If you like the Microsoft Azure cloud and you like Kubernetes, you'll probably be interested in Azure Kubernetes Service (AKS), Microsoft's cloud-based, managed Kubernetes solution.

But what you might not like – at least at first – is the challenge of figuring out how to monitor AKS. While Microsoft provides some basic AKS monitoring tooling, it's not enough on its own to cover complex AKS performance management, cost management, and security needs.

The good news is that, with the right strategy and tools on your side, you can conquer AKS monitoring challenges. This article walks through the fundamentals of AKS monitoring, including how it works, why it's important, which types of metrics you can collect on AKS, and best practices for getting the most out of AKS monitoring tools and processes.

What is AKS?

Azure Kubernetes Service (AKS) is a container orchestration platform that is available as a managed service in the Microsoft Azure cloud. This means that you can deploy AKS without having to provision the underlying infrastructure or install Kubernetes on it yourself.

In addition, AKS can automate many of the tasks associated with managing a Kubernetes cluster, such as performing cluster upgrades and adding nodes to clusters to keep up with workload demand via autoscaling.

AKS is not the only way to use Kubernetes on the Azure cloud. You can also set up virtual machines using the Azure Virtual Machines service, and then provision them manually as Kubernetes nodes. However, AKS offers a simpler, more scalable way to deploy Kubernetes.

What is AKS monitoring?

Azure Kubernetes Service (AKS) monitoring is the process of collecting and interpreting data about the performance, cost-effectiveness, and security of AKS environments.

By monitoring AKS, you can identify challenges such as container failures, network connectivity issues, and overprovisioned Azure resources. When done well, AKS monitoring allows you to detect problems early on and respond to them proactively, before they turn into major failures.

The importance of AKS monitoring

If you're familiar with the concepts of monitoring and observability in general, you probably realize that AKS monitoring is important because it helps you manage workloads and their host environment effectively.

What those who are new to AKS don't always realize, however, is that although AKS is a managed service, it doesn't automatically alert you about or remediate most performance, cost-management, and security issues. It expects you to detect and solve those problems yourself.

For instance, if a Pod is struggling to perform adequately because of poorly configured Kubernetes limits, AKS is not going to tell you about the issue. Likewise, if you're paying for more node resources than your workloads need because you configured container requests that are too high, AKS won't alert you that you've overspent. It's on you to detect these issues via AKS monitoring.
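As a rough illustration of the second case, the sketch below flags pods whose observed CPU usage is far below what they request. The function name `flag_overprovisioned`, the input format, the pod names, and the 50% cut-off are all assumptions made for this example, not AKS features. In practice you would feed it figures gathered from your own monitoring data.

```shell
# Print pods whose CPU usage is below a given percentage of their CPU request.
# Reads lines of "<pod> <requested_millicores> <used_millicores>" on stdin.
# The 50% default is an arbitrary, hypothetical cut-off.
flag_overprovisioned() {
  awk -v threshold="${1:-50}" '
    $2 > 0 && ($3 * 100 / $2) < threshold {
      printf "%s uses %d%% of its CPU request\n", $1, $3 * 100 / $2
    }'
}

# Made-up sample values, for illustration only.
flag_overprovisioned 50 <<'EOF'
checkout-7d9f 1000 120
search-5c2a 500 430
EOF
```

With the sample input, only the first pod is reported, because it uses 12% of the CPU it requests while the second uses 86%.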

Azure Kubernetes Service (AKS) challenges

To offer more context on why AKS monitoring is important, let's look at some of the main challenges that arise when you use AKS (or, for that matter, any complex, Kubernetes-based service), which proper monitoring can help to manage.

Complexity

Like any Kubernetes service or distribution, AKS is a complex system that involves many moving parts – infrastructure, various control plane components, worker nodes, individual workloads, and more. Keeping track of the status of all of those components, as well as interactions between them, can be challenging – and again, AKS doesn't automatically manage this task for you.

Cost management

The major cost of using AKS stems from the underlying cloud infrastructure that your clusters consume. The more nodes you deploy, and the more CPU and memory you have allocated through those nodes, the higher your cloud bill will be.

Thus, if you want to keep AKS spending in check, you must ensure that your clusters include adequate resources to support your workloads, without operating more nodes (or nodes with higher resource allocations) than you need.

Here again, AKS doesn't automatically manage costs for you or tell you when you're overspending. You need to monitor your clusters and workloads yourself to keep costs in check.

Observability

Azure offers tooling to help with AKS monitoring and observability, as we explain below. But by default, the tooling is not turned on, and AKS itself doesn't provide much in the way of tools for tracking and analyzing AKS performance. It leaves it up to the user to implement a monitoring strategy, whether that means using the built-in Azure monitoring tools, a third-party observability solution, or both.

Interactions between AKS and other Azure services

AKS is tightly integrated with other Azure resources and services – such as Azure Virtual Machines, which provide the underlying infrastructure to support AKS clusters. This integration is part of what makes AKS easy to use because it essentially takes other Azure services, runs Kubernetes on top of them, and makes them available to customers in a way that requires little setup effort on the customer's part.

However, tight integration with other Azure resources also means that if something goes wrong in an AKS environment, it's not always clear whether the issue is the AKS service or another Azure service that AKS depends on. For example, if you experience issues connecting to your cluster, it could be because the cluster control plane is down or misconfigured. But it could also be network connectivity issues that are affecting Azure as a whole, not just AKS. Effective AKS monitoring helps you determine the difference.

Key AKS monitoring metric types and concepts

| Metric category | Purpose |
|-------------------------------------------|-----------------------------------------------------------|
| Platform metrics | Track AKS cluster and infrastructure resources. |
| Activity logs | Monitor changes to AKS configuration. |
| Resource logs | Monitor the state and health of control plane resources. |
| Container insights and Prometheus metrics | Monitor the status of containers. |

Now that we've discussed what AKS monitoring means and why it's important, let's talk about how you actually do it – starting with the key types of metrics available to monitor in AKS.

At a high level, AKS metrics fall into the following four categories.

Platform metrics

Platform metrics in AKS provide insight into the scale and scope of your overall AKS environment. For example, platform metrics include data points like the total number of clusters and virtual machines that exist in your environment.
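Platform metrics can also be read from the command line with `az monitor metrics list`. The helper below is a minimal sketch: the function name `show_node_cpu` is made up for this example, and it assumes `node_cpu_usage_percentage` is among the platform metrics your cluster exposes.

```shell
# Fetch the average node CPU percentage for a cluster in 5-minute buckets.
# $1 is the full Azure resource ID of the AKS cluster.
# Assumes the node_cpu_usage_percentage metric is available for the cluster.
show_node_cpu() {
  az monitor metrics list \
    --resource "$1" \
    --metric node_cpu_usage_percentage \
    --interval PT5M \
    --aggregation Average \
    --output table
}
```

Calling `show_node_cpu` with your cluster's resource ID prints one row per time bucket. Running it requires a signed-in Azure CLI session.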

Activity logs

Activity logs record events that involve changes to AKS configuration. They're similar in some respects to Kubernetes audit logs, although activity logs focus on configuration changes at the AKS service level, whereas Kubernetes auditing is designed primarily for providing insight into internal cluster events.
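To review these events outside the portal, something like the following could work. It is a sketch only: the function name `list_recent_changes`, the seven-day window, and the selected output fields are choices made for this example.

```shell
# List the past week of Azure-level operations recorded for a resource group.
# $1 is the resource group that contains the AKS cluster.
# The 7-day offset and the projected fields are illustrative choices.
list_recent_changes() {
  az monitor activity-log list \
    --resource-group "$1" \
    --offset 7d \
    --query "[].{time:eventTimestamp, operation:operationName.value, status:status.value, caller:caller}" \
    --output table
}
```

The output covers every resource in the group, so expect entries for related resources as well as for the cluster itself.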

Resource logs

Resource logs report information about the status and resource usage of control plane components, like the Kubernetes API server and the Scheduler. They also report some data related to the usage of external Azure resources, such as Azure Disk storage volumes, by AKS.

Resource logs are a key source of insight for detecting control plane failures and performance issues. In addition, resource logs may help determine whether any workloads are starved of adequate resources, as well as whether you're wasting money by allocating resources that workloads are not actually using.
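Resource logs are generally not collected until you create a diagnostic setting that routes them somewhere. The sketch below sends two control plane log categories to a Log Analytics workspace. The function name and the setting name `aks-control-plane-logs` are hypothetical, and which categories you enable is a judgment call.

```shell
# Route selected control plane log categories to a Log Analytics workspace.
# $1: AKS cluster resource ID, $2: Log Analytics workspace resource ID.
# "aks-control-plane-logs" is a hypothetical name for the diagnostic setting.
enable_control_plane_logs() {
  az monitor diagnostic-settings create \
    --name aks-control-plane-logs \
    --resource "$1" \
    --workspace "$2" \
    --logs '[{"category":"kube-apiserver","enabled":true},{"category":"kube-scheduler","enabled":true}]'
}
```

Because log ingestion is billed by volume, it may be worth starting with a small set of categories and widening it only when you need more detail.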

Container insights and Prometheus metrics

Container insights are Kubernetes metrics related to container performance, such as failed liveness probes. Container insights are designed to be exported to Azure Monitor, Azure's main built-in monitoring tool.

As an alternative, you can configure AKS to export container-related metrics to Prometheus. This approach gives you access to most of the same types of data as Container insights, but with a bit of added flexibility because you get greater ability to customize Prometheus and integrate it with other open source monitoring tools.

Native AKS monitoring tools

To help customers get started with AKS monitoring, Azure offers several "native" monitoring tools and services that support AKS. (By "native" we mean that these solutions are built into Azure and available by default, as opposed to tools you have to install manually.)

Azure Monitor

Azure Monitor is the primary native monitoring service for Azure. It allows you to collect and explore a variety of logs and metrics related to AKS and most other services hosted on Azure. You can also configure alerts so that Azure Monitor notifies you about various events or anomalies within monitoring data.

Managed Prometheus with Azure Monitor

In addition to using Azure Monitor in the "default" mode, you can collect and analyze AKS monitoring data using a managed Prometheus service that is built into Azure Monitor. This option provides more flexibility than traditional Azure Monitor, and you'll probably like it if you prefer working with Prometheus and other popular open source monitoring or data visualization tools, like Grafana.

Microsoft Defender for Containers

Microsoft Defender for Containers (which replaces a deprecated service called Microsoft Defender for Kubernetes) provides security monitoring and alerting for AKS environments. It can notify you about issues like misconfigurations that might expose your AKS workloads to attack and anomalous behavior that could be a sign of someone doing something nasty.

Getting started with monitoring your Azure Kubernetes cluster

By default, AKS doesn't automatically integrate with the monitoring tools described above. If you want to use them as the basis for AKS monitoring, you have to enable each integration explicitly.

Enable Container insights

To turn on Container insights, enable the monitoring add-on with a command like:

az aks enable-addons --addons monitoring --name "my-cluster" --resource-group "my-resource-group" --workspace-resource-id "/subscriptions/my-subscription/resourceGroups/my-resource-group/providers/Microsoft.OperationalInsights/workspaces/my-workspace"
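The workspace resource ID is long and easy to mistype, so it can help to assemble it from its parts. The helpers below are a small sketch with made-up function names (`workspace_id`, `enable_container_insights`). The ID layout mirrors the one in the command above.

```shell
# Build a Log Analytics workspace resource ID from its parts.
# $1: subscription ID, $2: resource group, $3: workspace name.
workspace_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.OperationalInsights/workspaces/%s\n' "$1" "$2" "$3"
}

# Enable Container insights on a cluster, sending data to that workspace.
# $1: cluster name, $2: cluster resource group, $3: workspace resource ID.
enable_container_insights() {
  az aks enable-addons --addons monitoring \
    --name "$1" --resource-group "$2" \
    --workspace-resource-id "$3"
}
```

For example, `enable_container_insights my-cluster my-resource-group "$(workspace_id my-subscription my-resource-group my-workspace)"` issues the same request as the one-line command.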

Enable Prometheus-based monitoring

To send AKS metrics to Azure's Prometheus-based managed service, pass the --enable-azure-monitor-metrics flag to az aks create (for a new cluster) or az aks update (for an existing one). For example, for an existing cluster:

az aks update --enable-azure-monitor-metrics --name <cluster-name> --resource-group <cluster-resource-group>

Configure AKS insights alerts

Once you've enabled integration between your AKS environment and Azure Monitor (either via Container insights or Prometheus), you can enable alerts by selecting Alerts from the left-hand pane of your cluster's page in the Azure portal.

There, you can create new alerts from scratch by clicking the Create button. You can also turn on automatically generated alerts by clicking Set up recommendations and choosing from the available options.
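Alerts can also be created from the command line, which makes them easier to keep in version control. The following is a minimal sketch: the function name `create_node_cpu_alert`, the `node_cpu_usage_percentage` metric, the 80% default, and the time windows are illustrative assumptions rather than recommended values.

```shell
# Create a metric alert that fires when average node CPU exceeds a threshold.
# $1: alert name, $2: resource group, $3: AKS cluster resource ID,
# $4: CPU percentage threshold (defaults to 80, an arbitrary example value).
create_node_cpu_alert() {
  az monitor metrics alert create \
    --name "$1" \
    --resource-group "$2" \
    --scopes "$3" \
    --condition "avg node_cpu_usage_percentage > ${4:-80}" \
    --window-size 5m \
    --evaluation-frequency 1m \
    --description "Average node CPU above ${4:-80}%"
}
```

As written, the alert has no action group attached, so it would only be visible in Azure Monitor. To receive notifications, you would likely add the `--action` option with an action group of your own.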

Enable Microsoft Defender for Containers

To connect Microsoft Defender for Containers to your AKS environment, sign into the Azure portal and search for Microsoft Defender for Cloud. Then click Environment settings, choose your Azure subscription and toggle the Containers plan to On.

Alerts are turned on automatically after you set up Defender. For details on working with Kubernetes alerts, refer to the Defender documentation.
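If you prefer the command line to the portal, the Containers plan can be switched on with `az security pricing create`. The function names below are made up for this sketch, and the commands act on whichever subscription your Azure CLI session currently targets.

```shell
# Turn on the Defender for Containers plan for the current subscription.
enable_defender_for_containers() {
  az security pricing create --name Containers --tier Standard
}

# Print the plan's current tier ("Standard" when enabled, "Free" when not).
defender_for_containers_tier() {
  az security pricing show --name Containers --query pricingTier --output tsv
}
```

Checking the tier before and after the change is a quick way to confirm the plan is on and that you are not paying for it unintentionally.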

AKS monitoring best practices

To get the most value out of AKS, consider the following best practices when monitoring AKS clusters and workloads.

Consider multiple monitoring tools and services

There are multiple tools and services available for monitoring AKS. They include solutions that are built into Azure, like Azure Monitor, as well as third-party tools, like groundcover.

There's no reason why you have to settle for just one solution. To get the greatest degree of flexibility, consider deploying multiple monitoring services. For example, Azure Monitor may be useful for baseline AKS alerting, while a third-party observability tool that offers more granular visibility and customizability is likely to come in handy when you need to dig deeper into complex AKS performance issues.

Just keep in mind that most AKS monitoring services (including Azure Monitor and Microsoft Defender) cost money, so be sure you're actually using each tool you're paying for. For example, don't enable Azure Monitor just because it's easy to do if you don't plan on using it for analytics or alerting.

Delineate performance, cost, and security metrics

AKS monitoring can serve three main goals – optimizing performance, optimizing cost, and mitigating security issues. Because each goal involves different types of data, it makes sense to devise a monitoring strategy that distinguishes between each type of monitoring and implements the proper data types and monitoring tools for it.

For example, you might decide to rely on Microsoft Defender and its built-in metrics and alerts for AKS security monitoring needs. But AKS performance monitoring is a separate affair, so you may choose to rely on a different solution for that purpose.

Monitor Azure as a whole alongside AKS

To differentiate between problems that affect the Azure cloud as a whole and problems that are specific to your AKS clusters, you should monitor Azure in general in addition to monitoring AKS. Without that broader view, it's hard to determine whether, for example, a networking issue stems from a misconfiguration within Kubernetes or from a networking failure that impacts an entire Azure region.

Customize alerts

If you use Azure's native monitoring tools to monitor AKS, don't restrict yourself to the alerts that the tools recommend by default. These auto-generated alerts are a good starting point. But Azure Monitor doesn't know your unique workload requirements or business priorities, so be prepared to customize the alerts based on your agenda.

Kubernetes monitoring with groundcover

No matter where you run Kubernetes – whether in the Azure cloud via AKS, or anywhere else – groundcover's Kubernetes monitoring and analytics capabilities help you get to the root of even the most complex Kubernetes performance and cost management challenges. Our state-of-the-art eBPF-based approach to monitoring means groundcover collects data hyper-efficiently. You also get full control over how you analyze, visualize and alert on monitoring data.

We're not here to knock solutions like Azure Monitor – which, again, are a great starting point for AKS monitoring. We also love Prometheus and Grafana just as much as Microsoft, and we see the value of having a Prometheus-based option for AKS monitoring in Azure Monitor.

But if you want a deeper level of visibility and control than you'll get from Azure's native monitoring services, consider groundcover. With groundcover, the sky's the limit when it comes to the types of metrics and logs you collect and what you can do with them. Plus, eBPF-based data collection can help minimize what you pay in observability costs and may result in a more cost-effective AKS monitoring solution than services like Azure Monitor, which charge based on the volume of data you ingest into them.

Living large with AKS

As one of the world's most popular managed Kubernetes services, Azure Kubernetes Service (AKS) is an excellent way to get up and running with Kubernetes without having to provision or manage the underlying infrastructure yourself. But AKS isn't magic, and it can't automatically detect and fix every performance, cost-management, or security issue that might arise in your clusters. That's why it's critical to implement AKS monitoring solutions that deliver the visibility you need to use AKS to maximum effect.
