Yechezkel Rabinovich • Jul 2, 2024

Kubernetes Requests and Limits: Learn How They Work

Explore how requests and limits work in Kubernetes, how they compare to each other, and how to use them to manage Kubernetes resources.


Last Updated: May 24, 2026

Kubernetes resource management is a little like children around candy: If you don't carefully limit how many resources a container can consume, it may go overboard – like a kid who consumes a family-sized bag of Skittles in one sitting.

On the other hand, depriving containers of access to the minimum volume of resources they need to run properly is akin to telling your kids they can never indulge in treats. They'll be miserable and underperforming (we're talking here about containers, not children, of course).

This is why setting appropriate Kubernetes requests and limits is so important. By understanding the role that requests and limits play in Kubernetes resource and performance management, and determining when and how to set requests and/or limits, you can ensure that each workload has the right amount of resources – no more and no less – to do its job.

Keep reading for a dive into how requests and limits work in Kubernetes, how they compare to each other, and how to use them to manage resources. Understanding Kubernetes resource management and its importance can be mission-critical to your organization. Let's begin by going over the basics of Kubernetes resource management.

The main purpose of Kubernetes is to orchestrate workloads across a cluster of servers. By default, Kubernetes has no way of knowing how much CPU, memory, or other resources each workload needs, nor does it try to control how much each workload can consume. If there's available CPU or memory on the node that hosts a Pod, Kubernetes will go ahead and let the Pod consume it.

This approach to resource management is fine as long as there's enough CPU and memory available to keep all Pods operating normally. But if one Pod or container tries to hog resources, or if total resource availability decreases due to issues like nodes being removed from a cluster, problems can ensue. Workloads may begin experiencing errors or taking a long time to handle requests because they don't get enough CPU time. Likewise, a lack of sufficient memory could lead to workloads being OOMKilled. Kubernetes may also evict Pods from nodes that run out of spare resources, forcing them to be rescheduled elsewhere, which could disrupt application availability.

Fortunately, Kubernetes offers features – requests and limits – to help admins get ahead of issues like these and ensure that resources are distributed in an optimal way across a cluster.

Kubernetes Requests vs Limits: What is a Kubernetes Request?

A Kubernetes request is the minimum amount of a resource that Kubernetes reserves for a specific container.

For example, if you set a memory request of 128 megabytes and a CPU request of 750 millicores (more on what a millicore means in a moment) for a container, Kubernetes would ensure that the container always has at least this much CPU and memory available to it.

The container can consume more resources, too – up to any limits you establish, as explained below. A memory or CPU request simply sets a minimum level of resource availability.

You define requests within the resources section of a container's spec. For example:
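A minimal sketch of such a spec in a Pod manifest. The Pod name `web`, the container name `app`, and the `nginx:1.27` image are placeholders chosen for illustration, not values from a real deployment:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                 # hypothetical Pod name
spec:
  containers:
    - name: app             # hypothetical container name
      image: nginx:1.27     # placeholder image
      resources:
        requests:
          memory: "128M"    # 128 megabytes (128Mi would mean 128 mebibytes)
          cpu: "750m"       # 750 millicores, i.e. 0.75 of one CPU unit
```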

This requests 128 megabytes of memory and 750 millicores of CPU.

Request units: What's a CPU millicore?

Measuring memory requests based on megabytes is simple enough, but the concept of millicores may be less familiar.

In Kubernetes, a CPU millicore is one-thousandth of what Kubernetes calls a CPU unit. A CPU unit is equivalent to one physical CPU core or one virtual core, depending on whether the node is a physical or virtual machine. CPU requests use these units to reserve a share of CPU time for a container.

The exact amount of computing power that these units represent can vary because different CPUs have different levels of capacity – so 1000 CPU millicores on one node might provide more or less processing power than 1000 millicores on a different node, unless each node has identical CPUs. Still, millicores provide an approximate way to compare and assign CPU capacity across workloads.
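As a quick illustration of the notation, Kubernetes accepts CPU quantities either in millicores or as a decimal number of CPU units, so the two fragments below should express the same request:

```yaml
# Fragment of a container's resources section, written two equivalent ways.
resources:
  requests:
    cpu: "750m"    # 750 millicores
---
resources:
  requests:
    cpu: "0.75"    # the same amount, written as a fraction of one CPU unit
```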

How CPU and memory requests influence Pod placement: The art of scheduling

When Kubernetes schedules Pods (meaning it decides which node should host a Pod), it factors in the requests defined for the Pod’s containers. The Kubernetes scheduler compares the Pod’s requests against each node’s allocatable capacity, minus the requests of Pods already assigned to that node, before placing the Pod. It won’t assign a Pod to a node that lacks enough unreserved resources to meet the Pod’s requests.

If no nodes are available that meet the requests, Kubernetes won’t schedule the Pod at all. The Pod will be stuck in the Pending state until a suitable node has enough available CPU and memory (or until you change the resource requests for the Pod).

This means that there's something of an art to balancing requests with Pod scheduling. You want to avoid situations where you set requests that are so high that your Pods either can't be scheduled at all, or that they consume available nodes inefficiently.

For instance, imagine that you have 10 nodes with 1 CPU unit each, and 10 Pods that you want to deploy across your nodes. If you were to set a request of 550 CPU millicores for each Pod, no node would be able to host more than one Pod at a time, because two Pods would together request 1100 millicores, more than the 1000 that a node provides. You’d likely underutilize your nodes, since each one would sit with 450 millicores (nearly half of its CPU) unallocated, assuming the Pods don’t consume significantly more resources than they request. (This is a simplistic scenario because in the real world, you probably wouldn’t have just one CPU per node, but you get the point.)

Practical considerations for setting effective requests

To avoid problematic scheduling scenarios, Kubernetes admins should set effective requests as part of sound Kubernetes resource management practices, based on the following:

  • Base requests on actual resource usage: Instead of merely guessing how many resources your workload will need, use Kubernetes monitoring tools to track its resource usage in an actual environment and base requests on actual usage patterns. Even if you only monitor in a dev/test environment, collecting Kubernetes metrics such as CPU and memory consumption will provide valuable insight into how many resources the workload uses when it operates – which may be different from what you expect it to use.
  • Consider each workload's purpose when setting requests: Critical workloads (like production apps) may warrant more generous requests than less important workloads.
  • Align requests with node resource availability: Consider the resources allocated on each node and how those allocations will impact the number of Pods that each node can host. Again, steer clear of scenarios where nodes are underutilized because requests make it impossible to schedule Pods across them efficiently.
  • Consider varying CPU capacity: Since the meaning of 1 CPU unit can vary across nodes, factor in the actual CPU capacity of each of your nodes when setting requests. Remember, too, that you can use node selectors or node affinity if you want to force a Pod to run on a specific node based on the type of CPU available on the node. (DaemonSets serve a different purpose: they run a copy of a Pod on every eligible node.)
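Steering a Pod toward nodes with a particular CPU type can be sketched with a `nodeSelector`, which restricts a Pod to nodes carrying a given label. The label `cpu-class: high-performance` is hypothetical, as are the Pod name and image, and the label would need to be applied to the relevant nodes first (for instance with `kubectl label nodes`):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-heavy-job                           # hypothetical name
spec:
  nodeSelector:
    cpu-class: high-performance                 # hypothetical node label
  containers:
    - name: worker
      image: registry.example.com/worker:1.0    # placeholder image
      resources:
        requests:
          cpu: "750m"
          memory: "128M"
```

If no node carries the label, the Pod stays Pending, so a selector like this is best paired with a check that matching nodes actually exist.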

What is a Kubernetes limit?

Now that we know how requests work, let's move on to their alter-ego: Kubernetes limits.

In Kubernetes, a limit is the maximum amount of resources of a given type that a container can consume. For example, if you set a CPU limit of 1500 CPU millicores and a memory limit of 1024 megabytes for a container, Kubernetes will not allow the container to consume more resources than these.

Like requests, limits are defined within the resources section of a container's spec. For example:
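A minimal sketch that uses the limit values mentioned above. As before, the Pod name, container name, and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                 # hypothetical Pod name
spec:
  containers:
    - name: app             # hypothetical container name
      image: nginx:1.27     # placeholder image
      resources:
        limits:
          memory: "1024M"   # 1024 megabytes
          cpu: "1500m"      # 1500 millicores, i.e. 1.5 CPU units
```

Note that when a container declares a limit for a resource but no request, Kubernetes uses the limit as the request, so this Pod would also reserve 1.5 CPU units and 1024 megabytes at scheduling time.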

How memory and CPU limits prevent resource overconsumption and maintain a stable cluster

Memory and CPU limits are important because they restrict access to shared resources when one container tries to consume so much that there isn’t enough left to support other containers. As we mentioned above, this can lead to situations where applications fail or underperform due to a lack of adequate resources. It can also trigger undesired eviction and rescheduling of Pods that share a node with a resource-hungry container.

On Linux, these limits are enforced through the container runtime and kernel controls (cgroups). A container that hits its CPU limit is throttled, meaning it has to wait for more CPU time rather than being stopped, while a container that exceeds its memory limit is liable to be terminated by the kernel’s out-of-memory (OOM) killer.

Of course, you don’t want to set memory and CPU limits that are so low that they deprive a container of the resources it needs to operate normally. Overly aggressive CPU limits cause throttling that degrades performance, and overly aggressive memory limits lead to OOM kills. Your goal should be to strike a healthy balance that gives a container access to a reasonable amount of resources, while still leaving sufficient resources available to other workloads.

Setting smart limits: Strategies for optimizing resource utilization

The following strategies can help to set appropriate limits:

  • Decide which workloads to prioritize: As with memory and CPU requests, some workloads may require different limits than others, based on how important they are. It makes more sense to set high limits for mission-critical workloads than for ones whose failure you can tolerate, because the gap between a workload's request and its limit determines how far it can burst.
  • Consider cluster scalability: If your cluster has autoscaling features enabled that allow it to add nodes quickly when total resource availability becomes constrained, there is less risk associated with setting high limits than there would be if your cluster’s total resource capacity is fixed and unscalable, because scaling can supply additional resources quickly.
  • Assign limit ranges: Limit ranges are a Kubernetes feature that constrains requests and limits for every container in a namespace, rather than one container at a time. A limit range can apply sensible default requests and limits to new containers that don’t declare their own. It also provides a safeguard against excessive limits for individual containers, since Kubernetes rejects Pods whose containers ask for more than the CPU or memory maximum defined for their namespace.
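A limit range of the kind described in the last point might look like the sketch below. The namespace `team-a`, the object name, and every value are illustrative assumptions rather than recommendations:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults   # hypothetical name
  namespace: team-a          # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:        # request applied when a container sets none
        cpu: "250m"
        memory: "128Mi"
      default:               # limit applied when a container sets none
        cpu: "500m"
        memory: "256Mi"
      min:                   # smallest values a container may declare
        cpu: "100m"
        memory: "64Mi"
      max:                   # largest values a container may declare
        cpu: "1"
        memory: "512Mi"
```

With this in place, a container in `team-a` that declares a CPU limit above 1 CPU unit would be rejected when its Pod is created, while a container that declares nothing would receive the defaults.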

Kubernetes resource requests vs. limits: Key differences

Kubernetes memory and CPU requests and limits are similar in that both are part of managing compute resources in a Kubernetes cluster. However, the key difference between them is that resource requests establish the minimum resources available to an application, while limits control the maximum.

It’s important to note, too, that you don’t have to use resource requests and limits together. You can use them independently by, for example, defining a CPU or memory request for a container but no limit. Doing so would make sense in scenarios where you want to ensure that a mission-critical workload always has a minimum amount of resources available, but you are not concerned about it over-consuming. Requests determine the resources reserved when a Pod is placed, while limits cap maximum usage at runtime.

Why both are essential: A harmonious balance

That said, it’s generally best to set both requests and limits. Even if you don’t expect a workload to consume an unreasonable amount of resources, issues like memory leaks or buggy code can lead to unanticipated resource consumption spikes. Limits offer a backstop that can prevent cluster performance or stability issues.

Likewise, setting requests is a smart move in most cases because it ensures that your containers always have enough resources to guarantee a minimum level of performance. Problems like failed nodes may unexpectedly decrease the resources available to containers, and requests can help ensure that containers are rescheduled in the most efficient way. Some teams relax or omit strict CPU limits so that bursty workloads can use spare CPU when demand briefly rises. Even then, requests still protect minimum availability.
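Setting both values together can be sketched as follows. The Deployment name, labels, image, and all numbers are assumptions made for illustration, and real values should come from observed usage:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                                      # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.0    # placeholder image
          resources:
            requests:            # reserved at scheduling time
              cpu: "250m"
              memory: "256Mi"
            limits:              # ceiling enforced at runtime
              cpu: "1"
              memory: "512Mi"
```

Here each replica is guaranteed a quarter of a CPU unit and 256 MiB of memory, and can burst up to one CPU unit and 512 MiB before throttling or an OOM kill comes into play.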

Common issues caused by request and limit misconfigurations

A variety of issues can arise if you misconfigure requests and limits in Kubernetes. The following table discusses the most common.

| Problem | Cause | Ways to fix |
| --- | --- | --- |
| Pods fail to schedule due to requests that are too high. | Insufficient resources are available to meet requests. | • Lower requests. • Add more nodes. |
| Pods evicted or OOMKilled. | High resource consumption leads to insufficient resource availability, causing rescheduling or killing of Pods. | • Set limits to prevent excess resource consumption by individual containers. • Set limit ranges to manage resource consumption across namespaces. • Add nodes. |
| Nodes overloaded or unresponsive. | Node CPU or memory resources are maxed out because containers are using too many resources. | • Set limits to restrict container resource consumption. • Use node selectors or node affinity to place resource-hungry containers on nodes that have more resources. |
| Applications experiencing performance degradation or latency issues. | Lack of sufficient resources causes application performance problems. | • Set requests that guarantee adequate resources. • Set limits to prevent applications from depriving their neighbors of sufficient resources. |
| Unexpected resource consumption spikes (that can't be explained by an increase in workload traffic). | Application bugs (such as memory leaks) trigger a surge in resource consumption. | • Set limits to restrict how many resources applications can use. • Identify and fix the underlying application issue that causes unanticipated resource consumption. |

Many of these issues can stem from other root causes, too, not just improper CPU or memory requests and limit settings. For example, there are a variety of reasons why a node might become unresponsive, such as network connectivity problems or failures within the node's operating system. But in many cases, even if the root cause of a problem is not CPU or memory requests or limit settings, you can use requests and limits to help mitigate the effects of the issue.

Best practices for setting Kubernetes requests and limits

The following best practices for working with requests and limits in Kubernetes can help avoid issues like those described above:

  • Align settings with workload priority levels: As we mentioned, it typically makes sense to allocate more resources to high-importance workloads, while being more conservative for lower-priority Pods.
  • Review and update requests and limits: Application needs evolve over time in response to changes like fluctuations in traffic. The requests and limits you initially set for an app may not be appropriate in the future. Periodically review actual resource consumption data and update settings as needed.
  • Manage limits at both the container and namespace level: Again, in addition to setting limits on a per-container basis, you can use memory and CPU limit ranges to manage resource consumption within a namespace. Using both features is a best practice because it establishes multiple layers of protection against excess resource consumption.
  • Use auto-scaling strategically: Autoscaling features (which are available in certain Kubernetes distributions and services) can automatically add nodes to clusters, thereby increasing resource availability. Autoscaling helps protect against failures when workloads need additional resources beyond the currently available CPU and memory, so it makes sense to take advantage of it for critical workloads. However, because adding nodes increases the cost of operating Kubernetes, you shouldn’t use auto-scaling as a substitute for controlling resource consumption via effective limits.

Getting more from Kubernetes with requests and limits

You don't have to configure requests and limits if you use Kubernetes, just as you don't have to let your kids indulge in the occasional candy bar. But requests and limits are handy features for ensuring that resource consumption trends remain healthy – that each workload has the resources it needs to do its job, but without overindulging to the point that problems arise for individual applications or your cluster as a whole.

Yechezkel Rabinovich
CTO
