Sometimes, a Kubernetes pod looks perfectly healthy from the outside; its status says it's running normally. But inside, the pod's containers could be failing, and you wouldn't know unless you ran a check to verify that those containers were operating as expected.

This is where liveness probes come in. As their name implies, liveness probes let you determine whether a container is “alive” – or more specifically, whether the application that the container hosts is working the way it’s supposed to.

By providing visibility into what’s actually happening inside a pod, probes play an important role in Kubernetes monitoring and observability. To help you make the most of this type of health check, this article explains how liveness probes work, the different types available, common probe issues, and best practices for working with Kubernetes liveness checks.

What is a Kubernetes liveness probe?

In Kubernetes, a liveness probe is an action that checks whether a container is up and running normally. Kubelet (the part of Kubernetes that manages containers on each node) uses liveness probes to determine whether it needs to restart any containers.

A diagram of a Kubernetes cluster showing a service managing two pods. It illustrates health checks (/health, /ready) and failures (Liveness fail, Readiness fail).

Most liveness probes work by issuing some type of request to a container (such as asking it to run a command or submitting an HTTP request) and monitoring the response. If the response is successful (meaning there is no error code or unexpected output), Kubelet assumes that the container is working normally. Otherwise, Kubelet restarts the container.

Liveness checks are important because situations can arise where a pod appears to be in a healthy state (based on output from commands like kubectl get pods) even though one or more containers within the pod are failing. This can happen because kubectl reports the status of pods, not the health of the applications running inside each container. To figure out what’s actually happening inside a pod, you need a liveness check, which probes each container directly – by running a command inside it or sending it a request.
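
For example, kubectl get pods can report a pod as Running even while a container inside it keeps failing its liveness checks; the only outward hint is the RESTARTS column (illustrative output, with a hypothetical pod name):

NAME                      READY   STATUS    RESTARTS      AGE
my-app-7d9c5b5f6d-x2k4p   1/1     Running   4 (30s ago)   10m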

Liveness probes vs. readiness probes vs. startup probes

| Probe type | Purpose |
|---|---|
| Startup probes | Check whether a container has started. |
| Readiness probes | Check whether a container is fully up and able to receive traffic. |
| Liveness probes | Check whether a container is still running normally. |

Liveness checks are one of three main types of probes that Kubernetes supports. The other two work as follows:

  • Startup probes check whether a container has started. These are the first type of probe that Kubernetes runs for a container.
  • Readiness probes check whether a container is ready to receive traffic. They run after startup probes, and they fail if a container is running but can’t respond to traffic. Liveness and readiness probes both check whether a container can handle a request, but the checks take place at different points in the container’s life cycle (see the combined sketch after this list).
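
As a point of reference, here’s a minimal sketch showing how all three probe types can sit side by side on one container (the container name, image, port, and endpoint paths are illustrative assumptions, not taken from a real workload):

containers:
  - name: my-app                  # hypothetical container
    image: my-app-image:latest    # hypothetical image
    startupProbe:
      httpGet:
        path: /healthz            # assumed health endpoint
        port: 8080
      failureThreshold: 30        # allow up to 30 x 5s = 150s to start
      periodSeconds: 5
    readinessProbe:
      httpGet:
        path: /ready              # assumed readiness endpoint
        port: 8080
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10

While a configured startup probe hasn’t yet succeeded, Kubernetes holds off the liveness and readiness probes; once it passes, the other two take over for the remainder of the container’s life cycle.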

Liveness probes are similar to startup probes and Kubernetes readiness probes in that all of these probes are a way of checking the status of containers inside a pod. However, whereas startup probes and readiness probes monitor the status of a container as it’s in the process of starting up and getting ready to receive traffic, liveness checks monitor the status of an application on a periodic basis once it is fully up and running.

This means liveness probes allow you to catch issues like error events that have caused a container to crash even though it started up successfully and was working normally for some period of time. Startup probes and readiness probes wouldn’t catch this issue.

Benefits of using Kubernetes liveness probes

| Benefit | Why it matters |
|---|---|
| Improved application availability | Probes help keep applications running by triggering automated restarts. |
| Efficient, scalable health checks | Probes can check container health quickly and with no manual effort on the part of admins. |
| Automated restarts | Probes automatically trigger container restarts, further reducing manual effort on the part of admins. |
| Customizability | Probes can be customized for each container. |

Because liveness probes offer an efficient means of checking on the internal status of a container, they provide several key benefits:

  • Improved application availability: Liveness probes help to detect containers that have failed, which in turn allows Kubelet to restart them and minimize application unavailability.
  • Efficient, scalable health checks: Since liveness probes run automatically, they are a quick and efficient way of monitoring the status of many containers within your Kubernetes cluster.
  • Automated restarts: Liveness checks also help to automate the process of restarting containers if they are not operating normally. This is a benefit because it means you don’t have to check the probe output and react manually; Kubernetes gets your container back up and running on its own.
  • Customizability: Liveness probes are easy to customize. You could configure checks to be more aggressive for a container that requires high availability, for example, while being more lax when checking less critical containers (see the sketch after this list).
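
To illustrate that last point, here are two hedged timing profiles (the values and endpoint are illustrative, not recommendations): a more aggressive probe for a high-availability container, and a more relaxed one for a less critical container:

# Aggressive: detects failure within roughly 15 seconds (3 x 5s)
livenessProbe:
  httpGet:
    path: /healthz      # assumed health endpoint
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
  timeoutSeconds: 1

# Relaxed: tolerates slow responses and transient hiccups
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 30
  failureThreshold: 5
  timeoutSeconds: 5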

Liveness probe limitations

While liveness probes are a handy way to monitor the status of containers, they have their limitations, too.

The biggest is that liveness probes provide very little, if any, insight into why a container is not working normally. They simply indicate that there’s a problem, and they prompt Kubelet to restart the container. This is helpful if your container experiences a rare or one-off failure. But in the case of more complex issues, like buggy code inside a container that causes it to fail continually, liveness probes won’t solve the problem. They’ll just generate repeated crash-and-restart loops.

Another limitation of liveness probes is that they don’t really assess the performance of a container. They check whether the container can respond to a command or request, but not how quickly it responds or whether it’s able to operate normally under heavy load. Nor do they collect Kubernetes metrics of any type to help assess a container’s performance level.

For these reasons, you shouldn’t treat liveness probes as a type of Kubernetes performance monitoring technique. Liveness probes don’t monitor or help manage performance as much as they track, in a simple yes-or-no fashion, whether containers are running normally.

Types of Kubernetes liveness probes

Kubernetes liveness probes can be categorized based on the type of request used to verify whether the container is operating normally. There are four options:

  • Command execution: The probe runs a command inside the container, such as attempting to create a file using touch or to read a file using cat. If the command exits with a code of 0 (the standard Linux exit code meaning no error occurred), Kubelet assumes the container is healthy.
  • TCP socket: The probe attempts to open a connection to a TCP port on the container. A successful connection means the probe succeeds. 
  • HTTP GET: The probe issues an HTTP GET request to a path on the container. A response with a status code in the range of 200-399 indicates that the request was successful.
  • gRPC: The probe uses the gRPC health checking protocol to query the container. To use this type of liveness probe, your application must implement the health checking protocol, and you must be running Kubernetes version 1.23 or later.

How to configure Kubernetes liveness probes: Examples

You can configure probes using YAML code that defines which type of check to run. The code also sets liveness probe parameters such as how often the check should run. The process is straightforward, although it varies a bit depending on which type of probe (a command probe, a TCP probe, etc.) you want to run.

As an example, here’s a simple YAML that defines a probe using the command method:

livenessProbe:
  exec:
    command:
      - cat
      - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
  successThreshold: 1
  timeoutSeconds: 2

This configures a probe that runs the command cat /tmp/healthy inside the container. It also:

  • Configures an initial delay of 5 seconds, which means Kubernetes will wait 5 seconds before running the first check.
  • Sets a liveness period of 10 seconds, which tells Kubernetes to run the probe every 10 seconds.
  • Sets a failure threshold of 3, meaning the container is considered failed (and gets restarted) after 3 consecutive probe failures.
  • Sets a success threshold of 1, so a single successful check marks the container healthy again (for liveness probes, this value must be 1).
  • Sets a timeout of 2 seconds, so any individual check that takes longer than 2 seconds counts as a failure.

As another example, here’s a probe that uses the TCP method:

livenessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
  successThreshold: 1
  timeoutSeconds: 2

This liveness probe configuration tells Kubernetes to attempt to connect to TCP port 8080. It also defines an initial delay of 5 seconds, runs the check every 10 seconds, and sets a failure threshold of 3.

Here’s an example of a probe that uses HTTP:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
  successThreshold: 1
  timeoutSeconds: 2

This probe issues an HTTP GET request to the path /healthz on port 8080, and it uses the same initial delay, probe period, and failure threshold as the prior examples.

Finally, here’s an example of a gRPC probe:

livenessProbe:
  grpc:
    port: 50051
    service: my.grpc.Service
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
  successThreshold: 1
  timeoutSeconds: 2

This issues a gRPC health check on port 50051, querying the health of the service named my.grpc.Service.

Advanced probe configurations

The examples above are basic liveness checks because each of them issues just one check. To run a more advanced probe, you can configure one that performs multiple checks in the same probe. It’s also possible to perform different types of checks within the same probe.

For example, consider the following probe:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: multi-step-liveness
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app-image:latest
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - /app/liveness-check.sh
            initialDelaySeconds: 10
            periodSeconds: 10
          volumeMounts:
            - name: script-volume
              mountPath: /app
      volumes:
        - name: script-volume
          configMap:
            name: liveness-script
            defaultMode: 0755  # ConfigMap files default to 0644; the script must be executable

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: liveness-script
data:
  liveness-check.sh: |
    #!/bin/sh

    # Check if the main process is running
    if ! pgrep my-app > /dev/null; then
      echo "Process not running"
      exit 1
    fi

    # Check if HTTP server is responding
    if ! curl -sf http://localhost:8080/healthz > /dev/null; then
      echo "App is not responding"
      exit 1
    fi

    # Check if lock file exists (indicating a stuck state)
    if [ -f "/tmp/app.lock" ]; then
      echo "App is stuck due to lock file"
      exit 1
    fi

    echo "Liveness check passed"
    exit 0

This runs a custom script (liveness-check.sh) that performs a multi-step health check using multiple techniques:

  • First, it uses the pgrep command to check whether the process my-app is running.
  • Next, it issues an HTTP request to the URL /healthz on port 8080.
  • Finally, it checks whether the file /tmp/app.lock exists within the container.

If each of the checks succeeds, the script exits with code 0.

How to test probes in Kubernetes

A diagram explaining Kubernetes liveness probes, showing how Kubelet checks if a container is alive. If the probe is OK, the container runs; if Not OK, the container restarts.

There is no built-in feature in Kubernetes for testing a probe to verify that it works as expected before it runs as an actual probe. However, you can manually test a probe by executing the command or request defined in the probe.

The simplest way to do this is to open a shell in the container you want to probe, and then manually execute the command or connection request you’ve configured for your probe. If the command or request succeeds, then you know the container will respond normally to the probe.
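
For example, to dry-run the command probe and the HTTP probe from the earlier examples, you could do something like this (pod and container names are placeholders, and the HTTP check assumes the container image ships with wget):

# Run the probe command inside the container; exit code 0 means the probe would pass
kubectl exec <pod-name> -c <container-name> -- cat /tmp/healthy
echo $?

# For an HTTP probe, request the endpoint from inside the container
kubectl exec <pod-name> -c <container-name> -- wget -qO- http://localhost:8080/healthz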

Testing a probe before deploying it can be useful because unexpected issues could mean that a liveness probe fails. For example, a probe that uses the touch command to create a file might fail because of file system permission settings inside the container, or a gRPC probe could fail because your container isn’t running the gRPC health check service on the right port.

How to check liveness probe logs in Kubernetes

You can also get feedback about liveness checks using logs.

If a liveness probe fails, Kubernetes will record the failure as an event associated with the container’s pod. You can view pod events by running:

kubectl describe pod <pod-name>
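
If the probe has been failing, the Events section at the bottom of the output will include entries along these lines (illustrative output; the exact message depends on the probe type):

Events:
  Type     Reason     Age                From     Message
  ----     ------     ----               ----     -------
  Warning  Unhealthy  10s (x3 over 30s)  kubelet  Liveness probe failed: cat: /tmp/healthy: No such file or directory
  Normal   Killing    10s                kubelet  Container my-app failed liveness probe, will be restarted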

Output for probes that succeed isn’t recorded anywhere by default. However, it will be recorded if you raise the Kubelet log level (its --v verbosity flag) to 4 or above. The way to change the Kubelet log level varies between Kubernetes distributions, but it typically involves editing the Kubelet’s startup configuration – for example, via a systemd drop-in file under the directory /etc/systemd/system/kubelet.service.d.
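
As a hedged example, on kubeadm-based clusters this often means a systemd drop-in that raises the kubelet’s --v verbosity flag (the file name here is arbitrary, and KUBELET_EXTRA_ARGS is a kubeadm convention that may differ on other distributions):

# /etc/systemd/system/kubelet.service.d/90-logging.conf (hypothetical file name)
[Service]
Environment="KUBELET_EXTRA_ARGS=--v=4"

# Apply the change:
#   systemctl daemon-reload && systemctl restart kubelet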

Common issues with Kubernetes liveness checks (and how to fix them)

Several issues may arise when executing probes on Kubernetes. Common challenges include the following.

Probes that fail unexpectedly

Sometimes, a probe fails even though a container is healthy. The most common reason is that the probe is misconfigured in one of the following ways:

  • The initial delay period is too short, causing the probe to fail because you’re not giving the container enough time to get ready before running the probe.
  • The command or request that you use in the probe fails because the container can’t successfully handle it. As mentioned above, this may be due to issues like file system permissions, or because the container doesn’t actually expose the port or service that you think is open.

The best way to debug failed probes is to open a shell in the container and attempt to run the probe command or request manually, then inspect the output. For example, if a touch command fails because of file system permission restrictions, you should see output on the CLI that mentions insufficient permissions. You can also check whether the command or request succeeds only if you wait a certain period before running it, which is a sign that the initial delay period needs to be extended to get the probe to work.
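
You can also scan for recent probe failures without inspecting pods one by one:

# List recent events whose reason is Unhealthy (covers failed liveness/readiness probes)
kubectl get events --field-selector reason=Unhealthy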

Startup delays

In some cases, probes run before your container has fully started up. The result may be that the probe fails because the container can’t yet respond to the probe’s command or request. 

The fix in this situation is typically to increase the initial delay period, giving the container more time to finish starting up.
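
For example, if a container takes about a minute to initialize, you might raise the delay like this (the value is illustrative):

livenessProbe:
  exec:
    command:
      - cat
      - /tmp/healthy
  initialDelaySeconds: 60   # wait 60s before the first liveness check
  periodSeconds: 10

For containers whose startup time varies widely, a startup probe is usually the more robust fix, since it delays liveness checks until the container has actually started rather than for a fixed interval.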

Frequent container restarts

If your container frequently restarts due to liveness checks, it’s likely that you’re running the probes too frequently, causing some probes to fail and trigger restarts. This is especially likely to happen if you issue requests so frequently that the container drops some of them, with the result that even though some probes succeed, others fail, and the failures cause Kubelet to restart the container.

In this case, try increasing the probe’s periodSeconds value in order to reduce the frequency of checks. You can also increase the failure threshold, which raises the number of consecutive times a check must fail before Kubelet restarts the container.
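
For example (illustrative values):

livenessProbe:
  httpGet:
    path: /healthz        # assumed health endpoint
    port: 8080
  periodSeconds: 30       # check every 30s instead of, say, every 5s
  failureThreshold: 5     # restart only after 5 consecutive failures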

Best practices for working with liveness in Kubernetes

To leverage liveness checks to maximum effect, consider the following best practices:

  • Set realistic probe intervals: Choose an interval that runs probes often enough to catch issues without a long detection delay, but not so often that checks fail spuriously or add no useful signal.
  • Use lightweight checks: Each probe command or request places a load on your container. For this reason, choose simple checks (like trying to create a file or checking whether it exists) to avoid overloading the container.
  • Test probes: As we mentioned, test probe commands or requests manually before deploying them automatically. Doing so helps you identify probes that will unexpectedly fail.
  • Update probes as applications change: When the code inside your container changes, you may need to update your probes, too, because the command or request you originally used may no longer work with the newer version of the container.
  • Use multi-step probes: By performing multiple checks, multi-step probes provide deeper validation that a container is functioning normally.
  • Combine probes with observability tools: Because probes don’t tell you why a container is not functioning normally, it’s important to use observability tools to gain the additional context you need to troubleshoot issues with failed containers.

Enhancing Kubernetes liveness probes with groundcover observability

That last note – about combining liveness probes with observability – is where groundcover comes in.

groundcover Kubernetes monitoring dashboard displaying logs, traces, CPU usage, and workload metrics with various charts, graphs, and performance indicators.

As a full-fledged observability platform for cloud-native environments, groundcover provides the detailed insights you need to debug container performance problems. With groundcover, you’ll know not just that a container has failed, but also why it has failed and what you can do to fix it.

Probes offer a first line of defense against application failures. But if you really want to fix container performance issues permanently, you need the insights only groundcover can deliver.

Liveness probes: A valuable tool

The bottom line: Although liveness probes are not a replacement for Kubernetes monitoring and observability tools, they are a valuable technique for helping to identify and resolve simple problems with containers. As such, they’re one essential ingredient – alongside startup probes, readiness probes and observability software – in a broader Kubernetes availability and performance optimization strategy.
