This simple job just computes π to 2000 places and prints it out. It takes around 10s to complete.
In case you're wondering why Kubernetes offers both Jobs and CronJobs, the answer is that they do similar, but different, things: a Job runs a task once, until it completes, whereas a CronJob creates Jobs repeatedly on a schedule.
So, you'd typically create a Job if you need to run a specific operation (like executing a script that cleans up a database), whereas CronJobs are useful for regularly scheduled maintenance tasks (like performing periodic backups).
This example runs a database backup image every day at midnight.
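A minimal sketch of what such a CronJob could look like follows. It builds the manifest as a Python dict and prints it as JSON, which `kubectl` accepts alongside YAML. The object name `db-backup`, the image `example.com/db-backup:latest` and the `default` namespace are hypothetical placeholders, not values from a real setup.

```python
import json


def backup_cronjob_manifest(name, image, schedule="0 0 * * *", namespace="default"):
    """Return a batch/v1 CronJob manifest as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Five-field cron syntax: minute 0, hour 0, every day.
            "schedule": schedule,
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "containers": [{"name": name, "image": image}],
                            # Pods created by Jobs must use OnFailure or Never.
                            "restartPolicy": "OnFailure",
                        }
                    }
                }
            },
        },
    }


# Hypothetical name and image, used only for illustration.
manifest = backup_cronjob_manifest("db-backup", "example.com/db-backup:latest")
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

If the script were saved as `make_cronjob.py` (a hypothetical file name), its output could be piped straight into the cluster with `python make_cronjob.py | kubectl apply -f -`.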
The importance of monitoring Jobs and CronJobs – and why it's hard
Given that Jobs and CronJobs are often used to perform critical administration or maintenance tasks, it's important to have visibility into the tasks that you run using these Kubernetes features. You'll want to know if a backup that you scheduled via a CronJob fails, for instance, or if issues with one Job are causing another Job that depends on it to take longer than expected.
Unfortunately, achieving this visibility is not particularly easy. Although it's simple enough to define and run Jobs and CronJobs, it's harder to monitor them. The main reason, as we noted above, is that most Kubernetes monitoring tools aren't designed with Jobs and CronJobs in mind. They cater instead to objects associated with long-running workloads, like Deployments and StatefulSets.
This means not only that it's harder to get monitoring data related to Jobs and CronJobs, but also that answering relevant questions about them can be tricky. With objects like Deployments or StatefulSets, you typically want to know things like "do we have the expected number of ready Pods" or "how long does it take for Pods to become ready." Those are different sorts of questions from the ones you'd care about when dealing with Jobs and CronJobs. In the latter context, knowing which tasks are running, whether any have failed and how the failure of one task impacts other tasks is more important.
To put this another way, monitoring Jobs and CronJobs is less about understanding the ongoing state of Pods and their resource utilization. It's more about keeping track of individual operations that take place behind the scenes on a periodic basis.
Approaches to monitoring Jobs and CronJobs
Fortunately, there are a couple of viable approaches to monitoring Jobs and CronJobs.
Using Prometheus
One is to push metrics about Job and CronJob operations to Prometheus, typically through its Pushgateway. This strategy lets you keep track not just of simple success/failure outcomes, but also of performance and resource utilization.
The downside is that you have to write custom code (in Python, for example) to push the metrics. You must also explicitly configure the Pushgateway location and update it whenever it changes. So, there's a lot of work, in terms of both upfront effort and ongoing maintenance, if you want to use Prometheus to monitor your Jobs and CronJobs.
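To give a sense of the custom code involved, here is a minimal sketch that wraps a task, records whether it succeeded and how long it took, and pushes the results to a Pushgateway over HTTP using only the Python standard library. The metric names (`backup_job_success` and so on), the gateway URL and the job name are assumptions made for illustration. A real setup might prefer the official `prometheus_client` library over hand-built requests.

```python
import time
import urllib.parse
import urllib.request


def build_payload(metrics):
    """Render {name: value} as Prometheus text exposition format (gauges)."""
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


def push_metrics(gateway_url, job_name, metrics, timeout=5):
    """PUT metrics to <gateway_url>/metrics/job/<job_name>.

    PUT replaces all metrics previously pushed under the same job name.
    """
    quoted_job = urllib.parse.quote(job_name, safe="")
    request = urllib.request.Request(
        url=f"{gateway_url.rstrip('/')}/metrics/job/{quoted_job}",
        data=build_payload(metrics).encode("utf-8"),
        headers={"Content-Type": "text/plain; version=0.0.4"},
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def run_and_report(task, gateway_url, job_name, push=push_metrics):
    """Run task(), then push outcome metrics even if the task raised."""
    start = time.time()
    succeeded = False
    try:
        task()
        succeeded = True
    finally:
        # Hypothetical metric names; pick ones that match your conventions.
        metrics = {
            "backup_job_success": 1 if succeeded else 0,
            "backup_job_duration_seconds": round(time.time() - start, 3),
            "backup_job_last_run_timestamp_seconds": int(time.time()),
        }
        push(gateway_url, job_name, metrics)
    # Any exception from task() propagates, so the Job is still marked failed.
```

Inside the Job's container, the entry point would call something like `run_and_report(do_backup, "http://pushgateway:9091", "db-backup")`, where `do_backup` and the gateway address are placeholders. The hard-coded address is exactly the piece of configuration that has to be kept up to date by hand.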
Using kube-state-metrics
Alternatively, you can use kube-state-metrics, a straightforward service that listens to the Kubernetes API server, then generates metrics regarding the state of objects, including Jobs and CronJobs.
This approach lets you pull a variety of useful metrics, such as Job start and completion times and Job failures.
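As a rough illustration of what you can do with those metrics, the sketch below parses a few lines in the text format that kube-state-metrics exposes and extracts failed Jobs and Job durations. The metric names `kube_job_status_failed`, `kube_job_status_start_time` and `kube_job_status_completion_time` come from kube-state-metrics. The sample values, namespace and Job names are made up, and the parser is deliberately simplified (it ignores unlabelled samples and escaped characters in label values).

```python
import re

# Matches lines such as: kube_job_status_failed{namespace="ops",job_name="x"} 1
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')

# Made-up sample in the shape kube-state-metrics exposes on /metrics.
SAMPLE = """\
# TYPE kube_job_status_failed gauge
kube_job_status_failed{namespace="ops",job_name="db-backup-28391040"} 1
kube_job_status_failed{namespace="ops",job_name="db-backup-28389600"} 0
kube_job_status_start_time{namespace="ops",job_name="db-backup-28389600"} 1.703376e+09
kube_job_status_completion_time{namespace="ops",job_name="db-backup-28389600"} 1.70337604e+09
"""


def parse_samples(text):
    """Yield (metric_name, labels, value) for each labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match:
            labels = dict(LABEL.findall(match["labels"]))
            yield match["name"], labels, float(match["value"])


def failed_jobs(text):
    """Return sorted (namespace, job_name) pairs reporting failed Pods."""
    failed = set()
    for name, labels, value in parse_samples(text):
        if name == "kube_job_status_failed" and value > 0:
            failed.add((labels.get("namespace", ""), labels.get("job_name", "")))
    return sorted(failed)


def job_durations(text):
    """Return {(namespace, job_name): seconds} for Jobs with both timestamps."""
    starts, ends = {}, {}
    for name, labels, value in parse_samples(text):
        key = (labels.get("namespace", ""), labels.get("job_name", ""))
        if name == "kube_job_status_start_time":
            starts[key] = value
        elif name == "kube_job_status_completion_time":
            ends[key] = value
    return {key: ends[key] - starts[key] for key in starts if key in ends}


print(failed_jobs(SAMPLE))
print(job_durations(SAMPLE))
```

In practice these metrics would usually be scraped by Prometheus and queried there, rather than parsed by hand. The point of the sketch is only to show which questions the raw data can answer.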
But here again, you have to customize your monitoring tooling to display and analyze the right metrics. Few existing Kubernetes monitoring or observability platforms are built with Jobs and CronJobs in mind, so you can't simply turn them on and expect to stay on top of the most relevant metrics automatically.
Toward a better future for Jobs monitoring
While tracking Jobs and CronJobs in Kubernetes may not be as simple today as many admins would like, there's reason to hope it will improve going forward as teams make wider use of monitoring tools that are truly Kubernetes-native.
Kubernetes-native monitoring tools, like groundcover, make it possible to collect relevant data about Jobs and CronJobs – as well as any other type of Kubernetes object – using services, like kube-state-metrics, that are native to Kubernetes. This approach avoids the complex setup and management effort required to push metrics using custom code. It also typically leads to more efficient data collection, because collecting metrics in a K8s-native way generally consumes fewer resources.
Kubernetes-native monitoring of Jobs and CronJobs is already possible, as we showed above. What's needed in order for organizations to take full advantage of it, however, is broader recognition of the importance of making metrics related to Jobs and CronJobs first-class citizens within Kubernetes observability. Deployments, StatefulSets and other workload-centric objects are critical to monitor, too, but they're not the only things that matter within your Kubernetes cluster. If you use Jobs and CronJobs, you need continuous visibility into their operations as well.