Sometimes, small oversights can trigger big problems. Case in point: Exit code 127, a type of error that arises in containers or Kubernetes when the system can't execute a specified command.

Often, exit code 127 issues boil down to simple typos in commands. But they can be triggered by a variety of other problems as well – and until you get to the root cause of code 127 errors, affected containers won't run properly.

To provide guidance on identifying and fixing code 127 as part of a Kubernetes troubleshooting strategy, this article walks through what this type of error means, what the potential causes are, and how to diagnose and fix exit code 127.

What is exit code 127?

In the context of Kubernetes and other container-based application environments, exit code 127 is an exit code indicating that a command inside a container could not be found or executed.

For example, consider a container that fails to start because it can't successfully complete its startup command.

In this case, we deliberately created a container that would fail by following Docker's steps for building your first container image but making one small change: In the Dockerfile, we changed the CMD path to:

CMD ["node", "/src/index.js"]

Note the / character before the src/index.js string. This triggers a command-not-found problem because it specifies an incorrect path (/src/index.js instead of src/index.js) inside the image.
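
For context, the rest of the Dockerfile from that tutorial looks roughly like the following sketch (the node:18-alpine base image and the /app working directory are assumptions; the exact details depend on which version of Docker's tutorial you follow):

FROM node:18-alpine
WORKDIR /app
COPY . .
RUN yarn install --production
# The leading slash makes this an absolute path that doesn't exist in the image;
# the app is copied to /app, so the relative path "src/index.js" is what works.
CMD ["node", "/src/index.js"]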

As we note below, this is just one of many possible types of scenarios that could result in code 127.

Although it's common to hear people refer to exit code 127 as a type of Kubernetes or Docker error code, it's actually a generic Linux exit code generated by the shell – typically /bin/sh, the program responsible for interpreting and running commands on a Linux-based system – when it can't find a command. The operating system passes the code on to Kubernetes or Docker, which then records a 127 exit code in the exit status section when a container shuts down.

10 common causes of exit code 127 errors in Kubernetes

At a high level, all code 127 events stem from the same basic issue: The operating system couldn't execute a command it was asked to execute. But the specific causes of the problem can vary.

Here's a look at the ten most common situations that trigger exit code 127.

#1. Incorrect command or command path

Due to typos or oversights, the command or command paths defined in a container could be wrong, making it impossible to execute a command.

For instance, imagine your container's startup command chains two commands together with three ampersands (the & character) instead of two. Two ampersands in a row are valid syntax inside a Linux shell, but three are not, so the command will fail to run. In this case, a simple typo could cause code 127.

The same issue would occur if you specify a command path that doesn't exist. For example, imagine that you call the following:

/bin/some-app

But in reality, some-app is stored in the /usr/bin directory, not in /bin. In this case, too, the command would be unexecutable because the binary it calls doesn't exist at the path defined in the script.
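
A quick way to catch this kind of mistake is to ask the shell inside the image where the binary actually lives before relying on a hard-coded path. A rough sketch (your-image and some-app are placeholders):

# Prints the full path (e.g., /usr/bin/some-app) if the binary is on $PATH,
# and prints nothing if the shell can't find it
docker run --rm your-image sh -c 'command -v some-app'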

#2. Command or binary not installed

In other cases, exit code 127 could occur because the binary you're trying to call is simply not installed at all.

You might try to execute python, for example, in a container image that doesn't have any Python binary installed, resulting in a "command not found" error and causing a container to exit with code 127.
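
The fix in that case is simply to install the missing binary when building the image. As a minimal sketch, assuming an Alpine-based image and Python as the missing interpreter (app.py is a placeholder):

FROM alpine:3.19
# Install the interpreter the startup command depends on
RUN apk add --no-cache python3
COPY app.py /app/app.py
CMD ["python3", "/app/app.py"]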

#3. Missing dependencies

Missing dependencies could cause code 127 if a container tries to execute a binary that requires other binaries or libraries to be present on the system but they are not installed.

For example, imagine your container runs a command to compile code using GCC, the GNU Compiler Collection. GCC depends on a variety of other tools, like gawk and grep. If these are not installed, the GCC command might start to run, then fail because GCC can't find a dependency it needs.

In this case, the GCC command itself wouldn't trigger code 127, but the code would be triggered by a later command that GCC tries to execute.

Issues like this may seem rare because package managers generally install dependencies when installing an application. However, keep in mind that dependencies can be removed later as part of efforts to slim down a container image. They might also never be installed in the first place because of issues with the package manager, or because the container image includes binaries that were installed without a package manager.
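
If you suspect the missing pieces are shared libraries rather than other programs, you can list a binary's library dependencies from inside the container. A rough check, assuming ldd is available in the image (your-image is a placeholder; swap in whichever binary is failing):

# Entries marked "not found" are libraries the binary needs but the image lacks
docker run --rm your-image ldd /usr/bin/gcc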

#4. Missing shell interpreter

In some cases, commands may fail to execute properly due to missing or improperly configured shells. A "scratch" container (a type of minimalist container image) might not include a shell interpreter for executing a Bash script, for example, causing the script to fail even if the binaries it references are installed.

This type of issue could also arise if you have a shell installed in your container, but permissions configurations make it unusable.
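
A common variant of this problem is a script whose shebang points at a shell the image doesn't contain. For example, Alpine-based images ship BusyBox's /bin/sh but not Bash by default, so a hypothetical script like the one below fails with a "not found" error (exit code 127) even though its contents are valid:

#!/bin/bash
# entrypoint.sh - fails on images without /bin/bash, such as stock Alpine;
# changing the shebang to #!/bin/sh (or installing bash) resolves it
echo "starting app..."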

#5. Shell script syntax errors

To execute commands, the shell expects a certain syntax. Errors in syntax – such as a ( character in the wrong place or missing whitespace next to an operator – can result in commands the shell can't interpret properly, causing shell scripts to fail and producing an exit code 127 error.

It's worth noting, too, that shell syntax errors can arise if a container uses a different shell than the one developers expected when writing shell commands. Bash is the most commonly used shell on Linux, but a variety of other shells – like the C shell (csh) and tcsh – also exist. In general, these shells use similar syntax, but there are nuanced differences between them, so syntax that works in a Bash script may not work with csh, for example.
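
As a small illustration of how much spacing alone can matter, the POSIX test command requires a space after the opening bracket. Without it, the shell treats the bracket and the first operand as a single command name and reports a command-not-found error (exit code 127 for that command; the exact message varies by shell):

# Missing space after "[" – the shell looks for a command literally named "[1"
if [1 -eq 1]; then echo ok; fi
# sh: [1: not found

# Correct spacing
if [ 1 -eq 1 ]; then echo ok; fi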

#6. Insufficient permissions

The Linux permissions system can restrict which users are able to access a binary, as well as whether a binary is executable. Insufficient permissions in either of these areas could trigger exit code 127.

For example, you could have a binary installed at /usr/bin/some-app. But if the binary doesn't have executable permissions, it won't run.
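
If the binary exists but lacks the execute bit, adding it at build time is usually enough. A minimal sketch, assuming the binary is copied in from the build context (some-app is a placeholder):

FROM alpine:3.19
COPY some-app /usr/bin/some-app
# Without this, the file exists but can't be executed by the container's user
RUN chmod +x /usr/bin/some-app
CMD ["/usr/bin/some-app"]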

#7. Image compatibility

Image compatibility issues could cause code 127 events in situations where a container image is not compatible with the operating system or runtime used to execute it. This is relatively rare because most containers work with most Linux-based systems and runtimes. But it could occur if, for example, you are running a 32-bit system and trying to execute a container compiled for a 64-bit system.

#8. Volume mount problems

Exit 127 could happen if a command references data that isn't accessible because it's stored in a data volume that either is not mounted at all or whose permissions make it unavailable.

For example, imagine your container runs the command:

cp /cache/some-file /tmp/some-file

If /cache is mapped to a volume that is not mounted (rather than an internal directory that exists in a container), the command will result in an error.
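
To check whether the path a command references is actually backed by a mounted volume, you can inspect the Pod spec directly (pod-name is a placeholder):

# List the volume mounts declared for each container in the Pod
kubectl get pod pod-name -o jsonpath='{.spec.containers[*].volumeMounts}'

# Cross-check against the volumes the Pod actually defines
kubectl get pod pod-name -o jsonpath='{.spec.volumes}'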

#9. Environment variable issues

Occasionally, environment variables in your shell environment could create issues that cause code 127.

A common example is if the $PATH system environment variable (which is typically used to specify the paths to directories that host common applications on a Linux system) is either undefined or doesn't point to the right locations. In that case, calling a binary without specifying its complete path (e.g., calling sh instead of /usr/bin/sh) would cause a command to fail because the system wouldn't be able to find the binary.

For example, here's what happens if you try to call a binary (in this case, the cp command) when $PATH isn't usable.
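
A rough way to reproduce the behavior in any shell is to clear $PATH for a single command (the file paths below are placeholders, and the exact error message varies by shell):

# With an empty PATH, the shell can't locate cp even though it's installed
PATH="" cp /tmp/a /tmp/b
# sh: cp: not found
echo $?
# 127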

#10. Kubernetes RBAC policy configurations

Misconfigurations with Kubernetes RBAC policies could cause exit code 127 if they result in situations where the resources necessary to complete a command (like storage volumes) aren't accessible or the service account you're using lacks permission to execute a command.

This is rarer than permissions issues related to settings inside a container itself, but Kubernetes RBAC is still a potential source of code 127 errors.
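
If you suspect RBAC, you can check what a workload's service account is allowed to do by impersonating it with kubectl (the namespace, service account name, and resource below are placeholders):

# Returns "yes" or "no" based on the RBAC rules bound to the service account
kubectl auth can-i get persistentvolumeclaims \
  --as=system:serviceaccount:my-namespace:my-service-account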

How to diagnose exit code 127 in Kubernetes

When a container shuts down with code 127, Kubernetes doesn't automatically alert you. You need to dig a bit to confirm that this type of error occurred.

Here's guidance on where to look to diagnose exit code 127.

Check Kubernetes Pod logs

Start by checking the logs for the Pod that was hosting the container that shut down. You can do this using the command:

kubectl logs pod-name

If the logs mention exit code 127, you can confirm that a command inside the container failed to execute.
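
Keep in mind that if Kubernetes has already restarted the container, the relevant output may belong to the previous instance, which you can retrieve with the --previous flag:

kubectl logs pod-name --previous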

Check Pod description

The Pod description might also mention exit code 127. You can check using:

kubectl describe pod pod-name
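
In the output, look at the container's state section. The exact layout varies by Kubernetes version, but the relevant lines look roughly like this:

Last State:     Terminated
  Reason:       Error
  Exit Code:    127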

How to troubleshoot exit code 127

Once you've confirmed that an exit 127 event occurred, you can perform steps to gain additional context on why it occurred.

Test the container locally

Start by attempting to start the container locally using a command like:

docker container start container-name

If the container starts locally but not as part of your Kubernetes Pod, the issue most likely has to do with configuration settings in Kubernetes, not with the container itself. But if the container fails when run locally, it's a safe bet that the root of the problem lies with the container.
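
After a local run, you can also read the container's exit code directly to confirm you're reproducing the same failure (container-name is a placeholder):

# Prints the numeric exit code from the container's last run – 127 in this scenario
docker inspect --format '{{.State.ExitCode}}' container-name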

Examine Dockerfile

If you believe the problem is with the container, dig deeper by obtaining the Dockerfile that was used to build the container's image and opening it in a text editor. (Dockerfiles are often published alongside container images – many repositories on Docker Hub, for example, link to them – so you should be able to find the Dockerfile wherever you obtained your container image.)

In particular, check the following components of the Dockerfile:

  • The base image: The base image determines which binaries and directories exist by default in your container. Ensure that you're using a stable, up-to-date base image. You could also try downloading and running the base image itself; this would tell you whether the base image is able to execute, and, hence, whether the root of your problem lies with the base image or with resources your container adds to it.
  • Container startup paths and commands: Make sure there are no typos or missing paths in the startup commands defined within the Dockerfile.
  • The container shell: Based on the base image or the startup commands, you should be able to tell which shell the container uses (or determine that there is no shell installed at all). Make sure the shell is compatible with any commands the container executes.
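
If you're unsure which shell (if any) an image ships, a quick check is to try invoking one directly (your-image is a placeholder):

# Fails with an exec error if the image contains no /bin/sh at all
docker run --rm --entrypoint /bin/sh your-image -c 'echo shell is present'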

Check volumes and ConfigMaps

If you are able to execute the container locally, it's likely that the code 127 problem stems from a configuration issue with Kubernetes – including, potentially, a problem with storage volumes or ConfigMaps.

To inspect each of these types of resources, run the following commands for any volumes or ConfigMaps associated with the container:

kubectl describe pvc pvc-name
kubectl describe configmap config-name

How to fix code 127

Resolving code 127 requires fixing whatever the root cause of the issue turns out to be, based on the diagnosis and troubleshooting steps described above. Common resolutions include:

  • Correcting typos or syntax errors inside Dockerfiles, then rebuilding the container image.
  • Upgrading to a newer base image that provides the binaries or paths necessary to execute commands inside the container.
  • Modifying permissions settings inside the container to ensure commands execute properly.
  • Updating configuration settings in Kubernetes to address issues like inaccessible volumes or ConfigMaps.
  • Using an init container – meaning a container that starts up before another container – to create any resources or settings necessary to work around the code 127 error that affects another container, as sketched below. This is less ideal than fixing the issues within the failed container itself, but an init container can be a good workaround in situations where you can't modify a container image.
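
As a rough sketch of the init container workaround, the Pod below uses a BusyBox init container to create a directory the main container's startup command expects (all names, images, and paths are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  initContainers:
    - name: prepare-cache
      image: busybox:1.36
      # Create the directory the main container's command expects to find
      command: ["sh", "-c", "mkdir -p /cache/data"]
      volumeMounts:
        - name: cache
          mountPath: /cache
  containers:
    - name: my-app
      image: your-image
      volumeMounts:
        - name: cache
          mountPath: /cache
  volumes:
    - name: cache
      emptyDir: {}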

Tips for avoiding and managing exit code 127

Even better than resolving exit code 127 is avoiding the issue in the first place. The following steps can help in this regard:

  • When writing commands to run inside containers, use a linter (like ShellCheck) to scan for syntax issues or typos that could prevent commands from executing – see the example after this list.
  • Test images locally before deploying them on Kubernetes so that you can catch and isolate issues related to container images. Ideally, you'll test using the same operating system and container runtime that will host your container in production, to avoid potential incompatibility issues.
  • Ensure that the image tag you specify when pulling an image points to the latest stable version of your app. Outdated images might contain bugs that cause code 127 events.
  • Include commands in a container to resolve missing dependencies. In general, you can do this using a command such as apt-get install -f, which will automatically attempt to resolve dependencies for packages installed using apt.
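
For instance, running ShellCheck against a startup script before baking it into an image is a one-liner (entrypoint.sh is a placeholder for your script):

# Flags unquoted variables, bad test syntax, and other issues before they reach a container
shellcheck entrypoint.sh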

Kubernetes troubleshooting with groundcover

At groundcover, we can't guarantee that the containers you want to run will never fail to execute a command. But we can help you get to the root of the issue quickly when exit code 127 arises.

We do this by providing comprehensive visibility into all components of Kubernetes – containers, pods, nodes, the control plane, and more – based on all relevant Kubernetes metrics, logs, and other data sources. This means you can quickly identify problems like containers that have crashed and determine how widespread an outage is. With this information, you're primed to begin troubleshooting code 127 errors – or virtually any type of performance or availability problem that might arise in Kubernetes – efficiently and effectively.

Putting an end to 127 error events

The inability to run a command due to issues like typos or syntax errors is a common problem in Kubernetes and beyond. Fortunately, with the right diagnosis and troubleshooting process – and with help from effective Kubernetes monitoring tools – you can get to the root of problems like this, and stop letting exit code 127 get in the way of your applications' ability to run.

FAQs

Here are answers to common questions about CrashLoopBackOff, the state Kubernetes places containers in when they repeatedly exit with errors like code 127.

How do I delete a CrashLoopBackOff Pod?

To delete a Pod that is stuck in a CrashLoopBackOff, run:

kubectl delete pods pod-name

If the Pod won't delete – which can happen for various reasons, such as the Pod being bound to a persistent storage volume – you can run this command with the --force flag to force deletion. This tells Kubernetes to ignore errors and warnings when deleting the Pod.

How do I fix CrashLoopBackOff without logs?

If you don't have Pod or container logs, you can troubleshoot CrashLoopBackOff using the command:

kubectl describe pod pod-name

The output will include information that allows you to confirm that a CrashLoopBackOff error has occurred. In addition, the output may provide clues about why the error occurred – such as a failure to pull the container image or connect to a certain resource.

If you're still not sure what's causing the error, you can use the other troubleshooting methods described above – such as checking DNS settings and environment variables – to troubleshoot CrashLoopBackOff without having logs.

Once you determine the cause of the error, fixing it is a matter of resolving that specific issue. For example, if you have a misconfigured file, simply update the file.

How do I fix CrashLoopBackOff containers with unready status?

If a container experiences a CrashLoopBackOff and is in the unready state, it means that it failed a readiness probe – a type of health check Kubernetes uses to determine whether a container is ready to receive traffic.

In some cases, the cause of this issue is simply that the health check is misconfigured, and Kubernetes therefore deems the container unready even if there is not actually a problem. To determine whether this might be the root cause of your issue, check which command (or commands) are run as part of the readiness check. This is defined in the container spec of the YAML file for the Pod. Make sure the readiness checks are not attempting to connect to resources that don't actually exist.

If your readiness probe is properly configured, you can investigate further by running:

kubectl get events

This will show events related to the Pod, including information about changes to its status. You can use this data to figure out how far the Pod progressed before getting stuck in the unready status. For example, if its container images were pulled successfully, you'll see that.

You can also run the following command to get further information about the Pod's configuration:

kubectl describe pod pod-name

Checking Pod logs, too, may provide insights related to why it's unready.

For further guidance, check out our guide to Kubernetes readiness probes.
