Kubernetes resource management is a little like children around candy: If you don't carefully limit how many resources a container can consume, it may go overboard – like a kid who consumes a family-sized bag of Skittles in one sitting.
On the other hand, depriving containers of access to the minimum volume of resources they need to run properly is akin to telling your kids they can never indulge in treats. They'll be miserable and underperforming (we're talking here about containers, not children, of course).
This is why setting appropriate Kubernetes requests and limits is so important. By understanding the role that requests and limits play in Kubernetes resource and performance management, and determining when and how to set requests and/or limits, you can ensure that each workload has the right amount of resources – no more and no less – to do its job.
Keep reading for a dive into how requests and limits work in Kubernetes, how they compare to each other, and how to use them to manage resources. Understanding Kubernetes resource management and its importance can be mission-critical to your organization. Let's begin by going over the basics of Kubernetes resource management.
The main purpose of Kubernetes is to orchestrate workloads across a cluster of servers. By default, Kubernetes has no way of knowing how much CPU, memory, or other resources each workload needs, nor does it try to control how many resources each workload can access. If there's available CPU or memory on the node that hosts a Pod, Kubernetes will go ahead and let the Pod consume those resources if it wants.
This approach to resource management is fine as long as there's enough CPU and memory available to keep all Pods operating normally. But if one Pod or container tries to hog resources, or if total resource availability decreases due to issues like nodes being removed from a cluster, problems can ensue. Workloads may begin experiencing errors or taking a long time to handle requests because they're starved of CPU time. Likewise, a lack of sufficient memory could lead to issues like workloads being OOMKilled. Kubernetes may also repeatedly migrate Pods from one node to another as nodes run out of spare resources, which could lead to disruptions in application availability.
Fortunately, Kubernetes offers features – requests and limits – to help admins get ahead of issues like these and ensure that resources are distributed in an optimal way across a cluster.
What is a Kubernetes request?
A Kubernetes request is the minimum amount of resources that Kubernetes reserves for a specific container.
For example, if you set a memory request of 128 megabytes and a CPU request of 750 millicores (more on what a millicore means in a moment) for a container, Kubernetes would ensure that the container always has at least that much CPU and memory available to it.
The container can consume more resources, too – up to any limits you establish, as explained below. A memory or CPU request simply sets a minimum level of resource availability.
You define requests within the resources section of a container's spec. For example:
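A minimal Pod spec along these lines might look as follows (the Pod name, container name, and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app              # illustrative name
spec:
  containers:
  - name: demo-container
    image: nginx:1.25         # illustrative image
    resources:
      requests:
        memory: "128Mi"       # 128 megabytes of memory
        cpu: "750m"           # 750 millicores of CPU
```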
This requests 128 megabytes of memory and 750 millicores of CPU.
Request units: What's a CPU millicore?
Measuring memory requests based on megabytes is simple enough, but the concept of millicores may be less familiar.
In Kubernetes, a CPU millicore is one-thousandth of what Kubernetes calls a CPU unit. A CPU unit is one physical or virtual CPU.
The exact amount of computing power that these units represent can vary because different CPUs have different levels of capacity – so 1000 CPU millicores on one node might provide more or less processing power than 1000 millicores on a different node, unless each node has identical CPUs. Still, millicores provide an approximate way to compare and assign CPU capacity across workloads.
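In a container spec, CPU can be expressed either in millicores or as a decimal number of CPU units; the two forms shown in this sketch are equivalent:

```yaml
resources:
  requests:
    cpu: "750m"   # 750 millicores
    # Equivalent decimal form: cpu: "0.75" (three-quarters of one CPU unit)
```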
How CPU and memory requests influence Pod placement: The art of scheduling
When Kubernetes schedules Pods (meaning it decides which node should host a Pod), it factors in requests defined for the Pod. It won't assign a Pod to a node if the node lacks enough spare resources to meet the Pod's requests.
If no nodes are available that can meet the requests, Kubernetes won't schedule the Pod at all. The Pod will be stuck in the Pending state until a node with enough resources opens up (or until you change the resource requests for the Pod).
This means that there's something of an art to balancing requests with Pod scheduling. You want to avoid setting requests so high that your Pods either can't be scheduled at all, or are scheduled in a way that uses your nodes inefficiently.
For instance, imagine that you have 10 nodes with 1 CPU unit each, and 10 Pods that you want to deploy across your nodes. If you were to set a request of 550 CPU millicores for each Pod, no node would be able to host more than one Pod at a time, and you'd likely underutilize your nodes because some nodes might sit with nearly half of their resources unallocated, assuming the Pods don't consume significantly more resources than those assigned to them. (This is a simplistic scenario because in the real world, you probably wouldn't have just one CPU per node, but you get the point.)
Practical considerations for setting effective requests
To avoid problematic scheduling scenarios, Kubernetes admins should set effective requests based on practices like the following:
- Base requests on actual resource usage: Instead of merely guessing how many resources your workload will need, use Kubernetes monitoring tools to track its resource usage in an actual environment. Even if you only monitor in a dev/test environment, collecting Kubernetes metrics such as CPU and memory consumption will provide valuable insight into how many resources the workload uses when it operates – which may be different from what you expect it to use.
- Consider each workload's purpose when setting requests: Critical workloads (like production apps) may need more aggressive requests than less important workloads.
- Align requests with node resource availability: Consider how many resources each node has and how requests will impact the number of Pods that each node can host. Again, steer clear of scenarios where nodes are underutilized because requests make it impossible to schedule Pods across them efficiently.
- Consider varying CPU capacity: Since the meaning of 1 CPU unit can vary across nodes, factor in the actual CPU capacity of each of your nodes when setting requests. Remember, too, that you can use node selectors or affinity rules if you want to force a Pod to run on nodes with a specific type of CPU.
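As a sketch of that last point, a nodeSelector matches a Pod to nodes carrying a given label; the names and the label key/value below are illustrative (you would label your nodes accordingly first):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-pinned-pod        # illustrative name
spec:
  nodeSelector:
    cpu-type: high-frequency  # illustrative label applied to matching nodes
  containers:
  - name: app
    image: nginx:1.25         # illustrative image
```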
What is a Kubernetes limit?
Now that we know how requests work, let's move on to their alter-ego: Kubernetes limits.
In Kubernetes, a limit is the maximum amount of resources of a given type that a container can consume. For example, if you set a CPU limit of 1500 CPU millicores and a memory limit of 1024 megabytes for a container, Kubernetes will not allow the container to consume more resources than these.
Like requests, limits are defined in the resources section of a container's spec. For example:
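A sketch of a container spec that sets both requests and the limits described above (the Pod name, container name, and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app              # illustrative name
spec:
  containers:
  - name: demo-container
    image: nginx:1.25         # illustrative image
    resources:
      requests:
        memory: "128Mi"
        cpu: "750m"
      limits:
        memory: "1024Mi"      # 1024 megabytes of memory
        cpu: "1500m"          # 1500 millicores of CPU
```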
How memory and CPU limits prevent resource overconsumption and maintain a stable cluster
Memory and CPU limits are important because they help prevent a container from consuming so many resources that there aren't enough to support other containers. As we mentioned above, this can lead to situations where applications fail or underperform due to a lack of adequate resources. It can also trigger undesired rescheduling of Pods that share a node with a resource-hungry container.
Of course, you don't want to set memory and CPU limits that are so low that they deprive a container of the resources it needs to operate normally. Your goal should be to strike a healthy balance that gives a container access to a reasonable amount of resources, while still leaving sufficient resources available to other workloads.
Setting smart limits: Strategies for optimizing resource utilization
The following strategies can help you set appropriate limits:
- Decide which workloads to prioritize: As with memory and CPU requests, some workloads may require different limits than others, based on how important they are. It makes more sense to set high limits for mission-critical workloads, as compared to ones whose failure you can tolerate.
- Consider cluster scalability: If your cluster has autoscaling features enabled that allow it to add nodes quickly when total resource availability becomes constrained, there is less risk associated with setting high limits than there would be if your cluster's total resource capacity is fixed and unscalable.
- Assign limit ranges: A LimitRange is a Kubernetes feature that constrains the resource settings of individual containers within a namespace. It can cap the maximum limit any single container may set, and it can apply default requests and limits to containers that don't declare their own, providing a safeguard against excessively high per-container limits. (To cap a namespace's aggregate resource consumption, you can pair limit ranges with a ResourceQuota.)
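A minimal LimitRange sketch along those lines (the object name, namespace, and values are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-cpu-constraints   # illustrative name
  namespace: demo             # illustrative namespace
spec:
  limits:
  - type: Container
    max:                      # no container in this namespace may set limits above these
      cpu: "2"
      memory: "1Gi"
    default:                  # limits applied to containers that don't declare their own
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:           # requests applied to containers that don't declare their own
      cpu: "250m"
      memory: "128Mi"
```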
Kubernetes resource requests vs. limits: Key differences
Kubernetes memory and CPU requests and limits are similar in that both assist with the efficient management of resources. However, the key difference between them is that resource requests establish the minimum resources available to an application, while limits control the maximum.
It's important to note, too, that you don't have to use resource requests and limits together. You can use them independently by, for example, defining a CPU or memory request for a container but no limit. Doing so would make sense in scenarios where you want to ensure that a mission-critical workload always has a minimum amount of resources available, but you are not concerned about it over-consuming resources.
Why both are essential: A harmonious balance
That said, it's generally best to set both requests and limits. Even if you don't expect a workload to consume an unreasonable amount of resources, issues like memory leaks or buggy code can lead to unanticipated resource consumption spikes. Limits offer a backstop that can prevent cluster performance or stability issues.
Likewise, setting requests is a smart move in most cases because it guarantees your containers a baseline of resources, and with it a minimum level of performance. Problems like failed nodes may unexpectedly decrease the resources available to containers, and requests can help ensure that containers are rescheduled in the most efficient way.
Common issues caused by request and limit misconfigurations
A variety of issues can arise if you misconfigure requests and limits in Kubernetes. The following table summarizes the most common.

| Issue | Typical misconfiguration | Effect |
| --- | --- | --- |
| CPU throttling | CPU limits set too low | Workloads respond slowly or underperform |
| OOMKilled containers | Memory limits set too low | Containers are terminated when they exceed their memory limit |
| Pods stuck in Pending | Requests set higher than any node can satisfy | Workloads are never scheduled |
| Underutilized nodes | Requests sized so that Pods pack onto nodes inefficiently | Wasted capacity and higher costs |
| Unresponsive nodes | No limits on resource-hungry containers | A container exhausts a node's resources, destabilizing it |
Many of these issues can stem from other root causes, too, not just improper CPU or memory requests and limit settings. For example, there are a variety of reasons why a node might become unresponsive, such as network connectivity problems or failures within the node's operating system. But in many cases, even if the root cause of a problem is not CPU or memory requests or limit settings, you can use requests and limits to help mitigate the effects of the issue.
Best practices for setting Kubernetes requests and limits
The following best practices for working with requests and limits in Kubernetes can help avoid issues like those described above:
- Align settings with workload priority levels: As we mentioned, it typically makes sense to allocate more resources to high-importance workloads, while being more conservative for lower-priority Pods.
- Review and update requests and limits: Application needs evolve over time in response to changes like fluctuations in traffic, so the requests and limits you initially set for an app may not be appropriate in the future. Periodically review actual resource consumption data and update settings as needed.
- Manage limits at both the container and namespace level: Again, in addition to setting limits on a per-container basis, you can use memory and CPU limit ranges to manage resource consumption within a namespace. Using both features is a best practice because it helps establish multiple layers of protection against excess resource consumption.
- Use auto-scaling strategically: Autoscaling features (which are available in certain Kubernetes distributions and services) can automatically add nodes to clusters, thereby increasing resource availability. Autoscaling helps protect against failures due to resource overconsumption, so it makes sense to take advantage of it for critical workloads. However, because adding nodes increases the cost of operating Kubernetes, you shouldn't use auto-scaling as a substitute for controlling resource consumption via effective limits.
Getting more from Kubernetes with requests and limits
You don't have to configure requests and limits if you use Kubernetes, just as you don't have to let your kids indulge in the occasional candy bar. But requests and limits are handy features for ensuring that resource consumption trends remain healthy – that each workload has the resources it needs to do its job, but without overindulging to the point that problems arise for individual applications or your cluster as a whole.