Exit Code 137: Causes & Best Practices to Prevent It

Memory is to containers what water is to people: a vital resource. Just as bad things start to happen if you don't drink water for a time, your containers will start experiencing errors if you don't supply them with enough memory.

Specifically, they'll probably experience exit code 137, which in most cases signals that a container in a Pod was killed because it ran out of memory (an OOMKilled error). When this happens, you need to get to the root of the problem so that you can get your Pod back up and running – and prevent OOM errors from recurring.

Keep reading for guidance as we explain everything you need to know about exit code 137 on Kubernetes.

What is exit code 137?

Exit code 137 indicates that a container was shut down by the Linux kill signal, known as SIGKILL or signal 9. Typically, code 137 happens due to a lack of sufficient memory, although it can also result from a failed health check. (We'll dive deeper into the causes of exit code 137 in the following section.)
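The number 137 comes from the shell convention of reporting a process killed by a signal as 128 plus the signal number, so SIGKILL (9) becomes 137. Here is a minimal sketch that you can run on any Linux machine, without a cluster, to see this for yourself:

```shell
# Start a throwaway child shell that sends SIGKILL (signal 9) to itself.
# The "|| status=$?" form captures the non-zero status without aborting
# scripts that run under "set -e".
status=0
sh -c 'kill -9 $$' || status=$?

echo "exit status: $status"                # 137
echo "signal number: $((status - 128))"    # 9
```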

When memory is the cause, the signal that results in code 137 isn't sent by Kubernetes itself. It's generated by the Linux kernel that runs on whichever node hosts a Pod. The Linux operating system kernel has the power to send a signal that tells workloads to shut down, and the kill signal is one such signal.

Common causes of exit code 137

Cause | Explanation | Common fix
Misconfigured memory limits and requests | Poorly configured resource quotas deprive some Pods of sufficient memory. | Adjust limit and/or request settings.
Memory leaks in the application | Buggy application code causes inefficient use of memory. | Optimize application code.
Node memory pressure | The cluster lacks sufficient memory to support all workloads. | Add nodes to the cluster.
Failed health check | A failed health check results in the kill signal. | Modify health check to remove commands that could trigger a kill signal.

Most of the causes of exit code 137 involve some type of memory issue, although the specific nature of the problem can vary. In addition, code 137 sometimes occurs for reasons not explicitly related to memory.

Here's a look at the four most common causes of exit code 137.

#1. Misconfigured memory limits and requests

In Kubernetes, a memory limit defines the maximum amount of memory that a container can use, and a memory request defines the amount that is reserved for the container when it is scheduled onto a node. (You can learn more by checking out our article on Kubernetes requests vs. limits.)

Misconfigured limits and requests can lead to exit code 137 in two main ways:

  • You set a memory limit that doesn't give a container enough memory to operate normally, and the system shuts it down as a result.
  • You set memory requests for some containers that are too high and deprive other containers of the memory they need to operate normally. This can happen because even if a container is not actively using all of the memory assigned to it, that memory will still be reserved for the container – which means other containers that may need the memory won't be able to use it.
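To make the two settings concrete, here is a minimal sketch that writes a Pod manifest with both a memory request and a memory limit. The Pod name, container name, image, and memory values are hypothetical placeholders; pick values based on what your application actually consumes.

```shell
# Write an illustrative Pod manifest to a local file.
# All names and the image are placeholders, and the sizes are examples only.
cat > memory-demo.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
spec:
  containers:
    - name: app
      image: registry.example.com/my-app:1.0
      resources:
        requests:
          memory: "128Mi"   # reserved for the container at scheduling time
        limits:
          memory: "256Mi"   # going above this gets the container OOM-killed
EOF

# Apply it to your cluster once the values suit your workload:
# kubectl apply -f memory-demo.yaml
```

Keeping the request close to typical usage and the limit above normal peaks is one way to avoid both problems described above.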

#2. Memory leaks in applications

Memory leaks cause applications to consume increasing amounts of memory. If a memory leak goes on for long enough, the application will run out of memory, causing the Linux operating system to send it the kill signal. If you graphed the application's memory usage, you'd see a series of spikes and crashes: memory steadily increases over time until it reaches a limit and the system kills the application, causing it to restart and repeat the process.

Leaks involving memory typically result from poorly written code that causes inefficient handling of memory by an application. Restarting an application often temporarily resolves a memory leak, but to fix it permanently, you'll need to debug your code and resolve the issue that causes inefficient use of memory.

#3. Node memory pressure

Node memory pressure occurs when the total memory available on a Kubernetes node runs short and the kubelet attempts to free up memory by evicting Pods from the node.

The ultimate fix for memory pressure issues is to ensure that your nodes have sufficient memory to support all of your workloads. In the short term, however, adjusting Kubernetes limits and requests, or using node selectors, affinity rules, or taints and tolerations to control which nodes host which Pods, can help to relieve node pressure.
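One way to check whether any node is currently reporting memory pressure is to read the MemoryPressure condition from each node's status. The sketch below wraps that query in a helper function; the function name is a hypothetical choice, and it assumes kubectl is configured for your cluster.

```shell
# Print each node's name followed by the status of its MemoryPressure
# condition. "True" means the kubelet considers the node short on memory.
node_memory_pressure() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'
}
```

Run `node_memory_pressure` to list the nodes. If the Metrics Server is installed, `kubectl top nodes` shows current memory usage per node as well.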

#4. Failed health check

Kubernetes allows you to configure various types of health checks. Health checks are designed to confirm that a container is operating normally by, for example, checking whether it responds to a specific type of request. In some cases, failed health checks result in code 137. For example, when a liveness probe fails, the kubelet restarts the container: it first asks the container to stop and, if the container doesn't exit within the termination grace period, sends the kill signal. This scenario doesn't necessarily involve memory shortages; any type of health check that triggers the kill signal could cause this exit code.

To prevent this from happening, ensure that your health checks don't include commands that might cause a kill signal – such as scripts that use the kill command to terminate processes.
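Probe timing also matters. Here is a minimal sketch of a liveness probe that gives the application time to start and time to shut down cleanly before a kill signal is sent. The names, image, /healthz path, port 8080, and all timing values are hypothetical and need tuning for your application.

```shell
# Write an illustrative Pod manifest with a liveness probe.
# Names, image, path, port, and timings are placeholders.
cat > liveness-demo.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-demo
spec:
  terminationGracePeriodSeconds: 30   # time to exit cleanly before SIGKILL
  containers:
    - name: app
      image: registry.example.com/my-app:1.0
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 15   # wait before the first check
        periodSeconds: 10         # how often to check
        timeoutSeconds: 2         # how long each check may take
        failureThreshold: 3       # consecutive failures before a restart
EOF

# kubectl apply -f liveness-demo.yaml
```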

Diagnosing and troubleshooting exit code 137 in Kubernetes

If you suspect that your clusters have encountered an issue involving code 137, you can diagnose and troubleshoot using these steps.

Inspect logs

First, confirm that your Pod has registered exit code 137. You can do this by describing the Pod with a command like:

kubectl describe pod pod-name

If the resulting output mentions code 137, you know your Pod has been terminated for that reason.
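If you prefer not to scan the full describe output, you can read the last termination state of each container directly from the Pod's status. A minimal sketch, wrapped in a helper function whose name is a hypothetical choice:

```shell
# last_termination POD_NAME
# Prints one line per container: name, last exit code, and last reason
# (for example 137 and OOMKilled). Fields are empty if the container
# has not been terminated before.
last_termination() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.exitCode}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}
```

For example, `last_termination pod-name` prints the values for every container in that Pod.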

Check events

After you've confirmed an exit code 137 occurrence, the first place to look to figure out why it happened is Kubernetes events. Do this by running:

kubectl get events

In the output, look for events that relate to memory issues, such as Pods that were OOMKilled. You can also look for mentions of Pod eviction, which can happen when node memory pressure issues cause Kubernetes to evict a Pod from a node.

Events typically won't tell you exactly why a Pod exited with code 137, but they'll provide additional context that gets you closer to the root cause.
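On a busy cluster the full event list can be long. The sketch below narrows it to a single Pod and sorts the events by creation time; the helper function name is a hypothetical choice, and you may need to add a namespace flag for Pods outside the current namespace.

```shell
# pod_events POD_NAME
# Lists only the events whose involved object has the given name,
# ordered by creation time.
pod_events() {
  kubectl get events \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.metadata.creationTimestamp
}
```

For example, `pod_events pod-name` shows the scheduling, kill, and eviction events for that Pod only.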

Examine resource quotas and limits

You can correlate events data with an assessment of resource quotas and Kubernetes limits that you've defined for containers within the Pod that terminated. You should also check resource configurations for any other containers that are hosted on the same node as the terminated Pod since high resource consumption by those containers could affect other containers or Pods.

The easiest way to check resource quotas and limits is to read the manifests that you used to configure your Pods. Look for the requests: and limits: definitions. You can also see the values in effect for each container under Limits and Requests in the output of kubectl describe pod.

If you find poorly configured limits and requests, it's reasonable to assume that they caused exit code 137 and that changing the settings will prevent the issue from recurring.
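If the manifests aren't at hand, the same settings can be read from the live Pod spec. A minimal sketch, wrapped in a helper function whose name is a hypothetical choice:

```shell
# pod_memory_settings POD_NAME
# Prints one line per container: name, memory request, memory limit.
# A field is empty when the setting is not defined for that container.
pod_memory_settings() {
  kubectl get pod "$1" -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests.memory}{"\t"}{.resources.limits.memory}{"\n"}{end}'
}
```

For example, `pod_memory_settings pod-name` makes it easy to spot a container with no limit or with an unusually large request.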

Analyze application code

If your resource limits and requests don't appear problematic, you should inspect the code of your application to look for issues, particularly those that might cause a memory leak.

A full guide to testing applications for memory leaks is beyond the scope of this article. But suffice it to say that in most cases, you'll want to deploy load testing tools that evaluate how your application handles memory under varying conditions. Code debuggers may also help to pinpoint the place within the source code that triggers the leak.

After fixing the root cause of a memory leak, rebuild your container images and redeploy your application to prevent future exit code 137 events.

Best practices for preventing exit code 137

Even better than troubleshooting exit code 137 is preventing the issue from occurring at all by taking steps to avoid it, such as the following.

Monitoring and alerting

Monitoring your Kubernetes cluster and setting up alerts is a basic best practice for preventing exit code 137 issues. Monitoring and alerts allow you to identify instances like a container's memory usage suddenly spiking, or the near-exhaustion of the total memory available on a node.

By identifying these issues before a Pod receives the kill signal, you can take steps to fix the issue without it resulting in Pod termination.
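The core of such an alert is a simple comparison between current usage and the configured limit. The sketch below shows that comparison in isolation; the function name and the default threshold of 80 percent are hypothetical choices, and a real setup would normally rely on a monitoring tool rather than a script.

```shell
# memory_alert USED_MIB LIMIT_MIB [THRESHOLD_PERCENT]
# Prints a warning and returns 1 when usage reaches the threshold
# (default: 80 percent of the limit); otherwise prints OK and returns 0.
memory_alert() {
  used=$1
  limit=$2
  threshold=${3:-80}
  percent=$(( used * 100 / limit ))   # integer percentage, rounded down
  if [ "$percent" -ge "$threshold" ]; then
    echo "WARN: ${percent}% of memory limit in use"
    return 1
  fi
  echo "OK: ${percent}% of memory limit in use"
}
```

For example, `memory_alert 230 256` warns that 89% of the limit is in use. The usage figure could come from `kubectl top pod`, which requires the Metrics Server to be installed.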

Properly set memory requests and limits

Ensuring that you set memory requests and limits effectively is another basic best practice for preventing issues involving exit code 137.

There is no one-size-fits-all approach to follow here because every application's memory usage needs are different. However, as a basic best practice, you should define memory requests and limits only after testing how much memory your application actually consumes.

Keep in mind, too, that there is no requirement to define requests and/or limits at all. Sometimes, it's better to let Kubernetes manage memory allocation dynamically, especially if you're confident that there is enough memory to go around and that you won't have any "noisy neighbor" containers that suck up more memory than they should, depriving their neighbors of sufficient resources. Be aware, though, that Pods with no requests or limits fall into the BestEffort quality-of-service class, which makes them the first candidates for eviction when a node runs short on memory.

Application optimization

Optimizing your application also helps prevent issues where it uses memory inefficiently and triggers code 137.

Optimization can include changing application runtime settings, such as turning off unnecessary features in order to reduce memory usage. In addition, your developers can optimize code within the application itself to ensure that it uses memory as efficiently as possible. Here again, a full discussion of how to optimize source code for memory usage is beyond the scope of this article, but there are plenty of software development tools out there designed for this purpose.

Configure horizontal Pod autoscaling

Horizontal Pod autoscaling changes the number of replicas of a given Pod dynamically. This feature allows you to scale a Pod up when necessary to handle a spike in workload, then scale it back down automatically afterward to free up memory.

Horizontal Pod autoscaling helps to prevent code 137 because it reduces the risk that you'll tie up memory by assigning it to Pods that don't actually need it. With horizontal autoscaling, memory is allocated dynamically based on actual requirements.
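As a rough sketch, a HorizontalPodAutoscaler that scales on memory could look like the manifest below. The Deployment name my-app, the replica counts, and the 70 percent target are hypothetical. It assumes that the Metrics Server is installed and that the target containers define memory requests, because utilization is measured relative to the request.

```shell
# Write an illustrative HorizontalPodAutoscaler manifest.
# The target Deployment, replica bounds, and utilization are placeholders.
cat > hpa-demo.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70   # percent of the memory request
EOF

# kubectl apply -f hpa-demo.yaml
```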

Configure vertical Pod autoscaling

Vertical Pod autoscaling can dynamically change the memory (along with other types of resources) allocated to a Pod. This means that instead of adding or removing Pod replicas (as you would with horizontal autoscaling), you can assign more or less memory to a specific Pod.

This is another way of helping to prevent code 137 by automatically assigning memory where it's needed most based on workload demand, which reduces the risk of tying up memory resources on Pods that don't have a need for them.
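Vertical Pod autoscaling is not built into Kubernetes by default; it is provided by the Vertical Pod Autoscaler add-on, which has to be installed in the cluster first. Assuming it is installed, a minimal sketch of a VerticalPodAutoscaler object could look like this. The Deployment name my-app and the memory bounds are hypothetical.

```shell
# Write an illustrative VerticalPodAutoscaler manifest.
# The target Deployment and the memory bounds are placeholders.
cat > vpa-demo.yaml <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # only record recommendations, don't change Pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          memory: "128Mi"
        maxAllowed:
          memory: "1Gi"
EOF

# kubectl apply -f vpa-demo.yaml
```

Starting in "Off" mode lets you review the recommended values before choosing an update mode that lets the autoscaler apply them.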

Other common exit codes and what they mean

Code | Explanation
0 | Success.
1 | Container experienced an error.
125 | Container execution command failed.
126 | Command inside the container failed to execute.
127 | Command inside the container was not found.
128 | Generic code used when code terminates without specifying its own exit code.

A guide to code 137 in Kubernetes wouldn't be complete if we didn't mention the other main termination codes you might experience. These include:

  • Exit code 0: Indicates success, meaning no errors occurred.
  • Exit code 1: Occurs when a container experiences an error, usually due to bugs inside the container code or a problem with a container image.
  • Exit code 125: Means that the command used to execute a container failed for some reason.
  • Exit code 126: Happens when a command inside a container fails to execute, typically due to issues like missing dependencies or scripting errors.
  • Exit code 127: A "command not found" error indicating that Kubernetes couldn't execute a command or binary, often due to an issue like missing binaries within the runtime environment.
  • Exit code 128: Happens when code inside a container causes an exit event without registering a specific exit code. Kubernetes uses the generic exit code of 128 in this case.
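The list above can be turned into a small lookup helper for scripts. This is a sketch only: the function name and the wording of the messages are hypothetical. It also treats codes from 129 to 192 as "128 plus a signal number", which is how 137 maps to signal 9.

```shell
# explain_exit_code CODE
# Prints a short description of a container exit code.
explain_exit_code() {
  code=$1
  case "$code" in
    0)   echo "success" ;;
    1)   echo "container experienced an error" ;;
    125) echo "container execution command failed" ;;
    126) echo "command inside the container failed to execute" ;;
    127) echo "command inside the container was not found" ;;
    128) echo "exited without a specific exit code" ;;
    *)
      # Codes 129-192 correspond to signals 1-64 (code minus 128).
      if [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
        echo "killed by signal $((code - 128))"
      else
        echo "unrecognized exit code"
      fi
      ;;
  esac
}
```

For example, `explain_exit_code 137` prints "killed by signal 9".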

We dive deeper into these exit codes in other articles. Check them out to gain perspective on what causes the various types of errors that a Pod may experience, and to understand how code 137 compares to other common types of exit events.

Troubleshooting Kubernetes errors with groundcover

No matter which exit code your containers generate, groundcover has you covered when something goes wrong with Kubernetes. Using eBPF, groundcover collects low-level data from all of your containers, Pods, and nodes to help you pinpoint the cause of performance issues, including exit 137 events.

Not only that, but groundcover also provides the monitoring and alerting features you need to get ahead of memory management issues in Kubernetes. This means you can detect problems like memory leaks and node resource pressure before they result in errors.

Exiting from code 137

In a perfect world – or a perfect Kubernetes cluster, at least – code 137 events would never happen. Making that world possible boils down to understanding how to manage memory effectively in Kubernetes, as well as how to detect memory issues proactively. With the right configurations and tools on your side, you can say goodbye to exit code 137.

Check out our Kubernetes Troubleshooting Guide for more errors.

FAQs

Here are answers to common questions about CrashLoopBackOff.

How do I delete a CrashLoopBackOff Pod?

To delete a Pod that is stuck in a CrashLoopBackOff, run:

kubectl delete pods pod-name

If the Pod won't delete – which can happen for various reasons, such as the Pod being bound to a persistent storage volume – you can run this command with the --force flag to force deletion. This tells Kubernetes to remove the Pod from the API immediately and bypass graceful deletion, so use it with care.

How do I fix CrashLoopBackOff without logs?

If you don't have Pod or container logs, you can troubleshoot CrashLoopBackOff using the command:

kubectl describe pod pod-name

The output will include information that allows you to confirm that a CrashLoopBackOff error has occurred. In addition, the output may provide clues about why the error occurred – such as a failure to pull the container image or connect to a certain resource.

If you're still not sure what's causing the error, you can use other troubleshooting methods – such as checking DNS settings and environment variables – to troubleshoot CrashLoopBackOff without having logs.

Once you determine the cause of the error, fixing it is as easy as resolving the issue. For example, if you have a misconfigured file, simply update the file.

How do I fix CrashLoopBackOff containers with unready status?

If a container experiences a CrashLoopBackOff and is in the unready state, it means that it failed a readiness probe – a type of health check Kubernetes uses to determine whether a container is ready to receive traffic.

In some cases, the cause of this issue is simply that the health check is misconfigured, and Kubernetes therefore deems the container unready even if there is not actually a problem. To determine whether this might be the root cause of your issue, check which command (or commands) are run as part of the readiness check. This is defined in the container spec of the YAML file for the Pod. Make sure the readiness checks are not attempting to connect to resources that don't actually exist.

If your readiness probe is properly configured, you can investigate further by running:

kubectl get events

This will show events related to the Pod, including information about changes to its status. You can use this data to figure out how far the Pod progressed before getting stuck in the unready status. For example, if its container images were pulled successfully, you'll see that.

You can also run the following command to get further information about the Pod's configuration:

kubectl describe pod pod-name

Checking Pod logs, too, may provide insights related to why it's unready.

For further guidance, check out our guide to Kubernetes readiness probes.
