Kubernetes nodes – meaning the servers that provide the core infrastructure for hosting Kubernetes clusters – are the backbone of any Kubernetes cluster. For that reason, configuring, securing, and optimizing nodes is a critical step toward maximizing workload performance and reliability.
To that end, here's a comprehensive guide to Kubernetes node management. This article explains how nodes work in Kubernetes, how they relate to other resources (like pods), how to configure and monitor nodes, best practices for node optimization, and more.
What is a Kubernetes node?
In Kubernetes, nodes are the servers that you use to build a Kubernetes cluster. There are two types of nodes:
- Control plane nodes, which host the Kubernetes control plane software, including the API server.
- Worker nodes, which host applications that you deploy on Kubernetes.
Note that nodes can serve as control plane nodes and worker nodes at the same time if they host both the control plane and applications. This type of node topology is common in single-node clusters, but it's rare in a production environment, where you typically want to segment worker nodes from the control plane because you don't want a buggy application to impact the API server or other control plane components.
Nodes can be either physical or virtual machines, and they can run any Linux-based operating system. You can also use Windows machines as worker nodes – in fact, you need to if you’re going to host Windows containers in Kubernetes – although Windows doesn't support control plane nodes.
This flexibility is part of what makes nodes in Kubernetes so powerful. You can spin up virtually any type of server to function as a node. The server's underlying configuration doesn't really matter; all that matters is that your node can join the cluster and is capable of hosting the core node components, like Kubelet, which we'll discuss later in this article.
Kubernetes nodes vs. pods
Nodes and pods are both important parts of a Kubernetes environment, but they're not the same thing.
A node, as we mentioned above, is a server that is joined to a Kubernetes cluster. It hosts either the control plane software that powers the API server and manages the cluster, or it hosts applications.
In contrast, a pod is one or more containers that power applications. In other words, pods are the actual workloads.
This means that pods run on top of nodes (specifically, worker nodes). You need to set up nodes before you can deploy pods.
The number of pods running on a node can vary. Some nodes might host just one pod, while others host significantly more, depending on how many CPU and memory resources the pods require and how many the node can offer.
Why are nodes important in a Kubernetes cluster?
The importance of nodes to Kubernetes is straightforward: Your cluster can't exist without nodes, because nodes are literally the ingredients out of which your cluster is constructed. The stronger and bigger your nodes are, the more powerful your cluster will be, in terms of total CPU, memory, and other resources.
Going further, it's worth noting as well that part of the power of Kubernetes derives from the fact that Kubernetes can deploy workloads across multiple node objects. That makes it possible to run applications in a distributed, scale-out environment where the failure of a single server is typically not a big deal. Without nodes, the whole concept of distributed, cloud native computing in Kubernetes wouldn't work.
For clarity's sake, we should mention that it's possible to have a Kubernetes cluster that consists of just a single node (which would function as both a control plane node and a worker node in that case). But since this setup would deprive you of the important benefits of being able to deploy workloads in a distributed environment, it's basically unheard of in production to run a single-node cluster. Single-node clusters can come in handy for testing purposes, but usually not for deploying applications in the real world.
How many nodes should you have in a cluster?
There are no universal rules for determining how many nodes to include in a Kubernetes cluster, but there are several factors to consider. In general, your goal when setting up nodes should be to ensure that you have enough nodes to meet performance and availability goals. At the same time, you don't want to deploy more nodes than necessary, since this leads to wasted money.
Here are the three main factors to weigh when deciding how many node objects to join to your cluster (a node object refers to an individual node, which Kubernetes treats as an "object" – meaning a resource it manages).
1. Performance
By and large, more nodes translate to better application performance. This is true for two reasons.
First, having more nodes means that you'll have more CPU and memory resources for applications to use (keep in mind, however, that node count alone doesn't necessarily correlate with CPU and memory availability because one node can have much more CPU and memory than another). A rule of thumb is to deploy enough nodes that no more than 70 percent of their total CPU and memory resources are being used at a given time. This provides a reasonable buffer in case resource requirements spike due to increased application traffic.
Second, having more nodes makes it easier to shift pods between nodes as a way of improving performance. For example, if a pod on one node becomes a "noisy neighbor" that is sucking up a lot of CPU and memory, you could migrate other pods on the node to different nodes, where their neighbors will not be so noisy – assuming you have enough nodes with spare resources available to support the relocated pods.
2. High availability
More nodes also translate to higher availability. The reason why is simple: The more nodes you have, the more you can afford to have fail without disrupting the availability of your applications.
It's important to keep in mind, however, that simply maximizing node count won't necessarily maximize availability. To do that, you'll want to ensure that you have multiple control plane nodes, which keep the API server and other control plane components running even if one of the nodes in this group fails. In addition, consider running multiple replicas of your pods spread across nodes – for example, via a Deployment with topology spread constraints, or a DaemonSet if you need a pod on every node. This keeps applications available in the event that one of the nodes hosting an app were to fail.
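For example, here's a minimal sketch (the web app name and label are hypothetical) of a Deployment that runs three replicas and asks the scheduler to spread them across distinct nodes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      # Spread replicas across nodes so a single node failure
      # takes down at most one replica.
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:1.25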
3. Physical machines vs. VMs
The ideal number of nodes for your cluster can vary depending on whether some or all of the nodes are physical machines rather than virtual machines (VMs).
If nodes are physical machines, they are likely to include more CPU and memory resources (because they're not dividing their resources among multiple VMs). Thus, clusters that consist of physical machines generally require a lower node count.
That said, a downside of creating clusters using only physical machines is that you're likely to end up with a greater number of pods residing on each node (since you have fewer total nodes). This can be problematic if the node fails due to issues like a bug in its operating system. Using VMs instead of physical machines helps to spread out this risk.
Keep in mind that it's possible to have a node topology where some nodes are physical machines and others are VMs. You can create a cluster with both types of nodes running simultaneously.
Kubernetes node components
Each worker node in Kubernetes is responsible for running several components. Let's look at them one by one.
Kubelet: The node agent
Kubelet is the agent that runs locally on each node and allows it to talk to the control plane. It registers the node with the cluster and serves as the go-between for the node and the rest of the cluster.
Kubelet is responsible for executing whichever workloads the control plane tells it to. It also tracks workload operations and reports on their status back to the control plane.
Container runtime
In order to run workloads – which are packaged as containers in most cases, unless you're doing something less orthodox like using KubeVirt to run VMs on top of Kubernetes – you need a container runtime. A container runtime is the software that actually executes containers.
Examples of popular container runtimes include containerd and CRI-O. Docker Engine was also widely used, but Kubernetes removed its built-in Docker support (the dockershim) in release 1.24, so using Docker Engine now requires the separate cri-dockerd adapter. The runtime you choose doesn't really matter in most cases as long as it implements the Container Runtime Interface (CRI), which Kubernetes requires – and all mainstream runtimes do.
So, while there's a lot to say about the differences between the various runtimes, we're not going to go down that rabbit hole in this article. We'll just say that you should choose a CRI-compliant runtime and move on with your life.
Kube-proxy: Network management
Kube-proxy maintains network proxy rules for your nodes. These rules allow Kubernetes to redirect traffic flowing to and from Kubernetes services that operate in the cluster.
It's possible in certain environments to use Kubernetes without kube-proxy, which can help optimize performance in some cases. But unless you're worried about eliminating every single unnecessary CPU cycle, you should just stick with kube-proxy, which is the simplest and time-tested way of managing network proxies.
Node management 101: Understanding node status and conditions
Now that you know what Kubernetes nodes do and why they matter, let's talk about how to manage them.
The first thing to know about managing nodes is that you can check their status using the kubectl describe node command:
kubectl describe nodes node-name
The output includes basic information about the state of your node, including:
- Addresses: Includes data about the node's network status, including IP addresses and hostname.
- Conditions: Describes the node's current state with optional additional information (which we'll cover in a bit more detail later).
- Capacity and Allocatable: Contains information about the node's total CPU, memory, and pod capacity (Capacity), along with the portion of those resources actually available for pods (Allocatable).
- Additional information: You'll also usually see the node name, node operating system information, and data about the node's kubelet instance.
For full details on node status reporting, check out the NodeStatus part of the Kubernetes documentation.
Assigning pods to nodes: Node selector vs. node affinity
By default, Kubernetes will automatically decide which pods should run on which nodes based on how many spare resources are available on various nodes. But it also allows you to assign pods to specific nodes or types of nodes if you wish.
There are two main ways to assign pods to nodes:
- Node selector: You assign a label to nodes. When you create a pod that you want to run on a specific node (or type of node), you set the pod's nodeSelector field to match that node's label. This is the simplest way to assign pods to nodes, but it is not very flexible because it only allows you to select nodes with specific labels. If no nodes match the nodeSelector you specify for a pod, Kubernetes won't schedule the pod at all, which means the application won't run.
- Affinity: When creating pods, you define node characteristics that have to be met or are preferred when scheduling a pod. This allows you to set "hard" conditions for assigning pods to nodes, while also providing the flexibility to configure additional conditions that Kubernetes should meet if possible, but ignore if not. Affinity matching helps avoid situations where a pod is never scheduled.
In general, affinity-based assignments are better because they offer more control and flexibility. For instance, you might use affinity if a pod should ideally be hosted by a node in a certain region, but you want the pod to run elsewhere if no node in that region is available.
The advantage of node selectors is that they are simpler to configure, and they're fine to use if the conditions you need to match are straightforward. For example, if you have an application that has to run on a certain version of Linux, you could use the nodeSelector field to match nodes that use that version.
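To make this concrete, here are minimal sketches of both approaches (the disktype=ssd label and the region value are hypothetical). First, a pod that uses a node selector:

apiVersion: v1
kind: Pod
metadata:
  name: selector-demo
spec:
  # Hard requirement: schedule only on nodes labeled disktype=ssd.
  nodeSelector:
    disktype: ssd
  containers:
    - name: app
      image: nginx:1.25

And here's the same hard requirement expressed with node affinity, plus a soft preference that the scheduler honors only if it can:

apiVersion: v1
kind: Pod
metadata:
  name: affinity-demo
spec:
  affinity:
    nodeAffinity:
      # Hard rule: the node must carry the disktype=ssd label.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]
      # Soft rule: prefer nodes in us-east-1, but schedule elsewhere if none fit.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: topology.kubernetes.io/region
                operator: In
                values: ["us-east-1"]
  containers:
    - name: app
      image: nginx:1.25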
Common node conditions
The information that you'll find in the node Conditions field describes the various states that your node might be in. Possible states include:
- Ready: The node is healthy, with no problems detected.
- DiskPressure: The node lacks sufficient storage resources.
- MemoryPressure: The node lacks sufficient memory resources.
- PIDPressure: There are too many processes on the node.
- NetworkUnavailable: The node is experiencing network issues.
Ideally, your nodes will always be in the Ready state. Other states don’t necessarily mean the node has failed or is about to fail, but they do indicate some type of problem that will eventually lead to failure if you don't manage it.
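If you want to see the conditions without the rest of the describe output, a jsonpath query (node-name is a placeholder) works too:

kubectl get node node-name -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'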
Troubleshooting Kubernetes node errors
Lots of things can go wrong with nodes. Here's a look at how to troubleshoot common node problems.
1. Check node status
If you run into performance issues on a Kubernetes node, your first step should be to check the node's status. A condition other than Ready is usually your best indication of what's wrong, since the condition type will tell you if the node is short on, say, memory or disk resources.
2. Look for anomalies in node status
If the Conditions field doesn't point you in a useful direction, look for other anomalies in the node status details. For example, check the addresses to make sure the node's network settings are properly configured.
3. Check kubelet logs
If you're still unsure what's wrong, your best bet is to look at the Kubelet logs for the node. The Kubelet logs location varies depending on the node operating system and how you configured Kubernetes, but you can more often than not find them in the /var/log directory of the node.
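On nodes where the kubelet runs as a systemd service – a common but not universal setup – you can usually read its logs with journalctl instead:

journalctl -u kubelet --since "1 hour ago"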
Kubernetes node error troubleshooting example: Node not ready
As an example of troubleshooting node problems, let's take a look at a common node failure scenario: a node that is stuck in the NotReady state for more than a couple of minutes.
You can confirm that a node is not ready by running kubectl get nodes. Normally, all nodes should be ready within a minute or two of joining a cluster. A node may sometimes become unready in the event that it reboots, but typically, that status only lasts a short time.
If a node is stuck in NotReady status for more than a short time, there are a few likely causes:
- The node lost its network connection to the control plane. Check network settings to make sure nothing changed that would have caused the node to disconnect.
- The node crashed and needs to be rebooted manually. Attempt to reboot the node in this case.
- The node crashed and keeps crashing on reboot attempts due to hardware-related node problems or a failure in the node's operating system. In this case, your best bet is to delete the node object from your cluster and (if necessary to supply adequate CPU and memory) replace it with a new node.
- The kubelet instance on the node is experiencing an error. Running kubeadm reset on the node, then rejoining the node object to the cluster, may help. This cleans up the local kubelet environment.
Working with Kubernetes nodes
Here's an overview of other common tasks you might want to perform with your Kubernetes nodes.
Adding and removing nodes from your cluster
Depending on your Kubernetes distribution and configuration, there may be multiple ways to join a node to a cluster or remove it. The most common method – and the one that should work on any Kubernetes environment – is to use kubeadm to set up a node, then join it to a cluster with:
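kubeadm join <control-plane-endpoint>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>

(The placeholders are yours to fill in; running kubeadm init on the control plane prints the exact join command, including the token and CA cert hash.)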
To remove a node, first "drain" it (which tells Kubernetes to migrate workloads hosted on the node to other nodes) with:
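kubectl drain node-name --ignore-daemonsets

(The --ignore-daemonsets flag is typically needed because DaemonSet-managed pods can't be evicted to other nodes; depending on your workloads, you may also need --delete-emptydir-data.)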
Then, remove it from the cluster with the delete command:
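kubectl delete node node-name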
If you use a managed Kubernetes solution, such as EKS or GKE, your Kubernetes provider may also offer a graphical interface and/or custom tools (like eksctl) to add and remove nodes.
Tainting and untainting nodes
Taints are properties that tell the scheduler to keep pods off a node unless those pods explicitly tolerate the taint. In other words, when you "taint" a node, you tell Kubernetes to treat it in a particular way – for example, to reserve it for certain workloads.
To taint a node, use:
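kubectl taint nodes node-name key=value:NoSchedule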
The key=value pair (a placeholder above for whatever key and value you choose), together with an effect – NoSchedule, PreferNoSchedule, or NoExecute – tells Kubernetes how to treat the tainted node.
To remove a taint, run the same command again, but add a - character following it:
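kubectl taint nodes node-name key=value:NoSchedule-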
Organizing nodes with labels and selectors
Like taints, labels are key-value pairs attached to a node. Unlike taints, labels don't repel pods on their own. Instead, they allow you to organize nodes, kind of like tagging resources in a public cloud. You can also use labels in conjunction with selectors or affinity rules to control which nodes Kubernetes schedules pods on.
Labels can be added using the kubectl label command, which should specify the node’s name and the label key-value you want to use.
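For example (disktype=ssd is a hypothetical label):

kubectl label nodes node-name disktype=ssd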
Selectors can be defined via the “nodeSelector” attribute for individual workloads.
Monitoring and scaling Kubernetes nodes
Kubernetes nodes should be monitored regularly if you want to prevent problems. The earlier you detect issues on your nodes, the better positioned you are to get ahead of them before they degrade application performance.
Kubernetes monitoring tools
There are a plethora of Kubernetes monitoring tools out in the world, and we won't try to describe them all here. We will say that many will find the following solutions helpful for meeting basic Kubernetes monitoring needs:
- The Metrics API: A native Kubernetes API, typically served by the metrics-server add-on, that provides high-level node metrics – chiefly CPU and memory usage. You can use these metrics to monitor the overall health of Kubernetes nodes (see the kubectl top example below).
- Prometheus: An open source monitoring system that can be used to monitor Kubernetes nodes. Prometheus provides a wide range of metrics that can be used to monitor node performance, including CPU usage, memory usage, network traffic, and disk usage.
- Grafana: An open source analytics and monitoring platform that can be used to visualize Prometheus metrics. Grafana provides a wide range of pre-built dashboards that can be used to monitor Kubernetes nodes.
All of these solutions are free, and all provide a simple means of collecting the core monitoring data you need to track the health and status of Kubernetes nodes.
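For a quick command-line view of node resource usage, you can also run kubectl top (assuming the metrics-server add-on is installed in your cluster):

kubectl top nodes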
Key node metrics to monitor
Which metrics matter most? If we were to make a list, it would include:
- Node CPU usage.
- Available disk space.
- Available memory.
- Network usage statistics.
If something's wrong with your node, there's a pretty good chance that anomalies in at least one of these metrics will help you pinpoint the cause quickly.
Node auto-scaling: Horizontal and vertical scaling strategies
One of the cool features of Kubernetes is the Horizontal Pod Autoscaler (HPA), which automatically scales the number of pods in a Kubernetes deployment based on CPU or memory usage. Note that HPA scales pods, not nodes: to scale the nodes themselves, pair it with a node autoscaler, such as the Cluster Autoscaler or a managed offering from your cloud provider, which adds nodes when pods can't be scheduled and removes underused ones. Together, these mechanisms help you avoid situations where nodes run out of sufficient resources for the pods assigned to them.
We should note that horizontal scaling differs from vertical scaling in that its strategy is based on replicating workloads across more instances, instead of supplying the existing workloads with more resources.
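As an illustration, here's a minimal HPA sketch (the web Deployment it targets is hypothetical) that scales between 3 and 10 replicas to keep average CPU utilization around 70 percent:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70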
Securing Kubernetes nodes
Just as an underperforming node can become the weakest link in your Kubernetes performance strategy, an insecure node can quickly turn into an open door for attackers to compromise your cluster.
Best practices for node security include:
- Regularly updating the nodes: Keep the nodes up-to-date with the latest security patches to prevent any vulnerabilities from being exploited.
- Limiting the use of privileged containers: Avoid running containers in privileged mode, which increases the risk that they can bypass security measures to gain access to the host operating system.
- Limiting node access: Restrict access to the nodes to authorized users and services, rather than letting anyone log in or access node resources.
- Using secure communication channels: Use secure communication channels such as SSH to access the nodes. Avoid using unencrypted methods like telnet.
- Using container images from trusted sources: Avoid using container images from untrusted sources, since they may contain malicious code that attackers could use to plant malware on a node and build a backdoor into it for themselves.
You should also follow all standard server security best practices. Avoid unnecessary user accounts on your nodes, uninstall unnecessary software (which, in addition to wasting resources, increases the attack surface of your nodes), and consider deploying kernel-hardening frameworks like SELinux or AppArmor to add an extra layer of protection against attacks.
Implementing network policies
NetworkPolicy is a Kubernetes feature that restricts network traffic to and from the pods running on your nodes. Network policies are another way to help prevent unauthorized access to the workloads your nodes host.
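As a sketch, here's a common hardening baseline: a default-deny policy that blocks all ingress traffic to pods in a namespace (note that your cluster's network plugin must support NetworkPolicy for it to be enforced):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  # An empty podSelector matches every pod in the namespace.
  podSelector: {}
  policyTypes:
    - Ingress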
Role-Based Access Control (RBAC)
RBAC is a Kubernetes feature that restricts access to Kubernetes resources based on user roles.
RBAC isn't just a node security feature; RBAC can help protect various other components of your cluster. But because you can use RBAC to ensure that only authorized users have access to Kubernetes nodes, it's one useful tool for protecting your infrastructure.
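For instance, here's a sketch of a read-only ClusterRole and binding for node information (the node-viewer name and the jane user are hypothetical):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-viewer
rules:
  # Allow read-only access to node objects, nothing more.
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-viewer-binding
subjects:
  - kind: User
    name: jane
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: node-viewer
  apiGroup: rbac.authorization.k8s.io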
How to optimize Kubernetes node performance
Since healthy nodes make for a healthy, high-performing Kubernetes cluster, optimizing node performance is an important step toward achieving the best tradeoff between performance and cost.
To that end, consider the following node performance optimization strategies:
- Provision nodes with minimalist operating systems: Extraneous libraries and services waste resources without adding value.
- Consider autoscaling: As mentioned above, node autoscaling can automatically adjust pods and node configurations to achieve the ideal balance between performance and node capacity. This helps optimize performance without overpaying.
- Monitor logs and metrics from all the nodes: Logs and metrics provide insight into node health, allowing you to get ahead of issues that could impede node performance.
- Consider bare-metal nodes: Nodes that are physical machines instead of VMs have some drawbacks, as mentioned above. However, they generally perform better because there is no virtualization layer sucking up additional resources.
- Configure requests and limits: Effective resource requests and limits help ensure optimal consumption of node resources by pods (see the sketch after this list).
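For instance, here's a sketch of a container spec with requests and limits (the values are illustrative, not recommendations):

apiVersion: v1
kind: Pod
metadata:
  name: limits-demo
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        # The scheduler uses requests to pick a node with enough spare capacity.
        requests:
          cpu: 250m
          memory: 256Mi
        # Limits cap what the container may consume on the node.
        limits:
          cpu: 500m
          memory: 512Mi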
No node left behind!
Nodes may feel like the most boring part of your Kubernetes clusters. They are the things that sit in the background and host your workloads, and until something goes wrong, you probably don't think much about them.
But when one of your nodes does break, suffers performance degradation, or experiences a security breach, your entire cluster can quickly fall apart if you don't manage the issue effectively. That's why it's essential to understand how nodes work in Kubernetes, manage their organization, monitor them continuously for problems, and secure them.
After all, your containerized applications will only work as well as the nodes that host them. Your goal should be to ensure that you let no nodes be left behind in your quest to optimize Kubernetes performance and security.