Minimalism. Yeah, big word I know, but bear with me here. Basically, it is all about owning only what adds value and meaning to your life, and removing the rest. It's about removing the clutter and using your time and energy for the things that remain, since we only have a certain amount of energy, time, and space in our lives.
The best software developers are minimalists in heart. Minimalism does not mean writing less code, but writing code that counts with elegant and tight structure, and does one thing well. Minimalism, in this sense, means designing systems that use the least hardware and software resources possible - because you can. Because that’s how you believe things should be.
Like many developers we implement this approach to every thing we design and build.
Our journey to building Murre starts when we wanted to expose K8s node resources and usage metrics for our customers. As a company building a comprehensive Kuberenetes application monitoring solution the need was clear. We wanted to monitor and export the resources that are part of the cluster’s infrastructure layer - CPU, memory, and disk usage on each node aggregated by each container.
Sounds simple right? Thing is, as a monitoring platform embedded in the heart of the Kubernetes cluster, we always bear the token of minimalism. We aim to be as lightweight as possible wherever we can - which also means we prefer to not install any 3rd party tools on the cluster, if we don’t absolutely have to.
So we’ve set course to get our hands on the K8s node metrics, BUT without having to install the famous Metrics Server.
The traditional way to K8s metrics monitoring
Very often we see customers struggling with monitoring the physical layer that has been abstracted by K8s (you know, the same old CPU, Memory, Disk, etc). This becomes especially important when you’ve got a noisy neighbors effect which means your pod might be using the expected resource allocation but still suffer from missing resources.
The Kubernetes ecosystem includes a few complementary add-ons for aggregating and exposing monitoring data from your Kubernetes cluster. The Metrics Server is one of these useful add-ons.
The Kubernetes Metrics Server is a cluster-wide aggregator of resource usage data. It collects metrics like CPU or memory consumption for containers or nodes, from the Summary API, exposed by Kubelet on each node. However, it is an add-on which means it’s not part of the out-of-the-box K8s and not deployed by default in standard managed Kubernetes platforms.
The Metrics Server is intended to be scalable and resource-efficient, requesting about 100m core CPU and 200 MiB of memory for a typical cluster of 100 nodes. It allows storing only near-real-time metrics in memory, supporting ad-hoc checks of CPU or memory usage, or for periodic querying by a monitoring service that retains data over longer timespans.
Though designed to scale well with your K8s cluster, it doesn’t come free of issues. Installing a 3rd party service running in your cluster means you have to maintain and troubleshoot it from time to time. It can crash, take up more resources than you intended, or fail to do its job. The Metrics Server known issues page is just a glimpse into what it means to take responsibility for another ever-running service in your already busy cluster.
And there’s one more point that concerned us. At groundcover we’ve built a distributed observability solution that can scale with the cluster and can take decisions at the edge. For example - we want to sample the HTTP span that caused a CPU spike in a relevant pod.
Going the Metrics Server route would (beyond the need to install and run it) also mean each of our edge units (running on each node) would have to query the central Metrics Server regularly to create such a trigger. That would force our distributed architecture to rely on a centralized point and would make things hard to scale.
Taking it to ultra-light
groundcover is all about offering a lightweight and frictionless approach to Kubernetes application monitoring, so we started looking for other ways to get the job done.
First, we started by measuring resources ourselves by operating Linux OS tools, like top, ps, etc directly on each node in the cluster. That solved one piece of the puzzle of enabling resource monitoring that didn’t require any prior installations or maintenance. However, it did introduce an efficiency issue. The outputs of tools like top and ps required parsing. They were also Linux and not K8s tools, so it required a second layer that made sense of process resources and turned into the containers resources our customers know and understand.
But, like any good answer to a problem, ours has been staring at us all along.
The Metrics Server is built in a centralized fashion, one per cluster. So how does it get all the metrics it needs from all the different nodes? A quick look at the code reveals the clear answer - by simply querying the Kubelet running on each node. Suddenly it all made sense, since K8s itself must also measure node resources as well for the Kubelet workflow to operate!
Kubelet who?
Kubelet is a process that runs on each node of a Kubernetes cluster and creates, destroys, or updates pods and their containers for the given node when instructed to do so. Basically, the Kubelet is the primary "node agent" that runs on each node and it works using PodSpecs (a YAML or JSON object that describes a pod). It is responsible for taking PodSpecs that are provided through various mechanisms (primarily through the apiserver) and ensures that the containers described in those PodSpecs are running and healthy.
In Kubernetes, scheduling, preemption and eviction are an important part of the cluster’s life. Scheduling refers to making sure that pods are matched to nodes so that the Kubelet can run them. Preemption is the process of terminating pods with lower priority so that pods with higher priority can schedule on nodes and eviction is the process of terminating one or more pods on nodes.
The Kubelet plays a critical role in these major scenarios. For example, a scenario called node-pressure eviction. Kubernetes constantly checks node resources, like disk pressure, CPU or Out of Memory (OOM). In case a resource (like CPU or memory) consumption in the node reaches a certain threshold, Kubelet will start evicting Pods in order to free up the resource.
This is exactly why the Kubelet must constantly use Kubernetes resource metrics to do its job, and expose these metrics to other services that might need them as well.
Great. So if the Kubelet is already exposing this API to the Metrics Server, it means we can use this API ourselves.
That’s two birds with one stone! We can use this API and query it directly without deploying a Metrics Server on the cluster, but we can also query it inside each node - without having to leave the node.
Oh, but it’s not documented anywhere…So we had to dig deeper.
A deep dive into the Kubelet source-code
We started to study the Kubelet sources to figure out how it attains the K8s resource metrics and how it exposes this data as APIs down the stream. What we found is that not only is Kubelet measuring the resource usage on each cluster node, but it also exposes the data using Prometheus formatted metrics - which we love!
The Kubelet API is not documented, but from its sources we found the endpoints. There are more endpoints that are not used for metrics / stats, but these are out of scope for this research. There’s definitely more gold there to be uncovered, but we’re going to focus on K8s metrics for now.
You can find some of the metrics definitions here pkg/kubelet/server/server.go:
Sign up for Updates
Keep up with all things cloud-native observability.