K8s Logging at Scale: From Kubectl Logs Tail to the PLG Stack
Read how 'kubectl logs --tail' enables real-time log viewing for development. What are the factors to consider for production-scale Kubernetes environments?
Regardless of how long you've been working with Kubernetes-based systems, there's one problem-determination technique that we can guarantee is in your tool kit: log messages.
During the development process, viewing log output in real time using ad-hoc tools like the kubectl logs --tail command gives you an easy way to see what's going on in your system.
When you deploy your containers into a production environment, however, maintaining that level of observability can become a burden as you try to manage and analyze large volumes of messages from multiple containers executing in multiple pods. In a modern cloud-native environment, you have to collect, understand, and investigate millions of log entries from many different sources to figure out what's happening at runtime.
Legacy logging solutions simply can't keep up with the complex, distributed nature of modern production environments, so finding a simple, performant solution to manage this complexity is key to your ongoing Kubernetes logging efforts. Enter Loki by Grafana.
Kubernetes Logs: The Basics
Before diving into how to use Loki to work with logs in Kubernetes, let's go over how Kubernetes logging works and how to access logs using kubectl, the standard Kubernetes CLI tool.
Various resources in Kubernetes – such as nodes, pods, and containers – generate logs. The exact data recorded in a log varies depending on the type of log you're dealing with, but in general you'll find information about the following types of events in log entries:
- Changes to a resource's state.
- Errors and warnings.
- Authentication and authorization requests.
Using this information, you can gain important context about what's happening in Kubernetes. Although logs alone don't typically help you troubleshoot complex errors, correlating logs with other sources of data – such as metrics and traces – is an effective way to pinpoint the source of errors.
How to Access Logs in Kubernetes
The process for accessing logs generated in Kubernetes varies depending on which type of log you're trying to work with. Here's a look at how to access the most common types of logs.
Viewing Pod Logs with Kubectl
The simplest way to view container logs is to tell kubectl to display logs from your pods.
What Is Kubectl?
Kubectl is the command-line tool for administering Kubernetes. Most Kubernetes distributions use kubectl as the default method of deploying, managing, and collecting information about resources in Kubernetes. (Some distributions have alternative tools, or tools that offer complementary functionality – such as eksctl on EKS or oc on OpenShift – but their syntax is typically very similar to kubectl's.)
So, if you're running Kubernetes, you probably already have kubectl available. And you hopefully know a thing or two about how to use it. If you've managed to get by thus far using only the Kubernetes Dashboard to manage your cluster, more power to you, but you'll probably need to familiarize yourself with kubectl to enable advanced Kubernetes administration.
What Is the Kubectl Logs Command?
To view logs of a specified pod using the kubectl logs command, run:
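```bash
# Substitute your own pod name and namespace for the placeholders below
kubectl logs <pod-name> --namespace <namespace>
```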
Be sure to modify the kubectl logs command with the proper name of your pod, as well as the appropriate namespace. In addition, if the pod has multiple containers and you wish to view logs for only a certain container, use the -c container-name flag.
The output will display log entries that look something like the following (this is the top of a log file for a MySQL pod):
By default, the kubectl logs command simply dumps to the command line the contents of the logs generated by the container you selected. This is inconvenient if the log is hundreds or thousands of lines long. In that case, you may want to save it to a text file, which you can do by redirecting the output of the kubectl logs command to a file. For example, this command saves the logs of the pod named mysql-67f7987d45-q8n86 to a file located at /tmp/log-file:
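```bash
# Redirect the pod's entire log output to a file
kubectl logs mysql-67f7987d45-q8n86 > /tmp/log-file
```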
You can then open /tmp/log-file in a text editor to parse the log data more easily.
You can also use external commands like grep to help make sense of log data reported by the kubectl logs command. For instance, if you want to view only lines within a log file that mention errors, you could pipe the log into grep using a command like the following:
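```bash
# Show only the log lines that mention errors (case-insensitive match)
kubectl logs mysql-67f7987d45-q8n86 | grep -i error
```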
The Tail Flag in the Kubectl Logs Command
Instead of viewing the entire contents of the logs generated by your containers, you can tail logs by adding the --tail flag to the kubectl logs command. This displays only the most recent log lines, and combining it with the -f (--follow) flag streams new entries as they arrive.
For example, this command displays the 5 most recent lines from the log for the specified pod (which in this example is a pod named mysql-67f7987d45-q8n86):
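```bash
# Display the 5 most recent log lines for the pod
kubectl logs --tail=5 mysql-67f7987d45-q8n86

# Add -f (--follow) to keep streaming new lines as they arrive
kubectl logs --tail=5 -f mysql-67f7987d45-q8n86
```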
Viewing Node Logs
Unfortunately, the kubectl logs command doesn't easily support viewing logs from nodes – meaning the logs generated by the operating systems running on your host servers. To access node logs, you can SSH directly into the nodes, or use a log collection service that aggregates logs from your nodes.
That said, Kubernetes versions 1.27 and later offer a newer feature called node log query that makes it possible to access logs for certain services directly from nodes. But currently, this only works if you explicitly enable the NodeLogQuery feature gate.
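For instance, with the feature gate (and the related kubelet settings) enabled, you can query a node's service logs through the Kubernetes API; otherwise, reading the kubelet's journal over SSH is the traditional route. The node name and SSH details below are placeholders:

```bash
# Node log query (Kubernetes 1.27+, NodeLogQuery feature gate enabled on the kubelet)
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/logs/?query=kubelet"

# The traditional route: SSH to the node and read the kubelet's journal directly
ssh admin@<node-address>
journalctl -u kubelet
```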
Enter Grafana Loki
Now that we've told you how to access logs using kubectl, let's talk about why you shouldn't do this if you need to work with multiple pods or must manage log data at scale – and why you should instead use a solution like Loki.
You have almost certainly heard of Grafana, the company that has made its mark with open-source software that enables easy visualization of data from many different sources. In the Kubernetes world, Grafana may be best known as the visualization layer for Prometheus-based cluster metrics.
But things are changing. More recently, Grafana has been evolving into a full-blown observability vendor in its own right, with projects such as Loki, Tempo, and Mimir addressing the key observability requirements of logging, tracing, and metrics, respectively.
The Loki project in particular is squarely focused on the challenge of managing distributed, high-volume, high-velocity log data with a cloud-native architecture inspired by Prometheus (and in fact Loki touts itself as "like Prometheus, but for logs").
Loki has many advantages that make it a great fit for the challenges of modern environments:
- It's simple to set up and easy to operate.
- It indexes only metadata rather than full log messages, which keeps it lightweight.
- It works well with other cloud-native tools such as Kubernetes.
- It uses common object storage solutions like Amazon S3.
Available as either a self-managed open source version or a fully-managed service provided by Grafana Cloud, Loki forms the foundation of what is known as the "PLG" stack: Promtail for log stream acquisition, Loki for aggregation, storage and querying, and Grafana for visualization.
Bye ELK, Hey "PLG" Stack
Looking at the "PLG" stack, it's easy to see how the system was influenced by the design of Prometheus.
Promtail is an agent - provided as part of the Loki product - that is responsible for discovering and retrieving log data streams. It plays a role similar to Prometheus' own "scraper", and its scrape configuration uses the same syntax as Prometheus'. It essentially "tails" the Kubernetes control plane and pod log files and forwards them on to the core Loki system. (It's important to note that Loki supports many different agents provided by both Grafana and its developer community, which can make migration to a PLG-based solution much easier for users of daemons such as Fluentd or Logstash.)
Loki, of course, is the heart of the PLG stack and is specifically designed for handling log data. Loki's unique characteristics - which we'll talk about in much more detail in a bit - make it highly efficient and cost-effective at both ingesting and querying log data.
The Grafana dashboard and visualization tool rounds out the "PLG" suite, providing powerful features to enable analysis of application, pod, and cluster logs.
How Does Loki Work Under the Hood?
Architecture and Deployment Models
Architecturally, Loki is made up of five components:
- The distributor is a stateless component responsible for acquiring log data and forwarding it to the ingester. Distributors pre-process the data, check its validity, and ensure that it originates from a configured tenant, which helps the system scale and protects it from potential denial-of-service attacks. Grafana provides a great explanation here of how Promtail - the recommended agent - processes data.
- The ingester is the key component in the Loki architecture. Data received from distributors is written by the ingester to cloud-native long-term storage. Ingesters also collaborate with queriers to return in-memory data in response to read requests.
- Queriers are responsible for interpreting LogQL query requests and fetching the data either from ingesters or from long-term storage.
- The query frontend - an optional component - provides API endpoints that can be used to accelerate read processing. This component optimizes read processing by queuing read requests, splitting large requests into multiple smaller ones, and caching data.
- Like Prometheus, Loki supports alerting and recording features. These features are implemented in the ruler component, which continually evaluates a set of queries and takes a defined action based on the results, such as sending an alert or pre-computing metrics.
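To make the ruler concrete: its rule files look much like Prometheus rule files, except the expressions are written in LogQL. Here is a minimal, hypothetical sketch (the label names and threshold are made up for the example):

```yaml
groups:
  - name: example-log-alerts
    rules:
      - alert: HighLogErrorRate
        # Fire if the rate of "error" lines in the prod namespace stays above 0.05/s for 10 minutes
        expr: sum by (namespace) (rate({namespace="prod"} |= "error" [5m])) > 0.05
        for: 10m
        labels:
          severity: warning
```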
For scalability, all of these components can be distributed across systems as needed.
Loki can be deployed locally in one of two modes:
- A monolithic mode (the default), which runs all of Loki's components in a single process or Docker container. This is a good starting point for learning more about the product.
- A microservices deployment mode, which allows the Loki components to be distributed across multiple systems and provides high scalability.
An additional local deployment mode, called the "simple scalable" mode, is a good intermediate step when your requirements exceed the monolithic mode capabilities but do not warrant a large-scale microservices deployment. Of course, if you don't want to manage Loki at all then Grafana Cloud might be the option for you.
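Under the hood, the same Loki binary powers all of the self-managed modes; which components a given process runs is controlled by the -target flag (or the target setting in the configuration file). A minimal sketch, assuming a config file at /etc/loki/loki.yaml:

```bash
# Monolithic mode: run every Loki component in a single process (target "all" is the default)
loki -config.file=/etc/loki/loki.yaml -target=all

# Microservices mode: run one component per process or pod, e.g. a dedicated querier
loki -config.file=/etc/loki/loki.yaml -target=querier
```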
Key Features
Loki implements some amazing features that are specifically designed to distribute the load, protect the system from attack, and make use of efficient storage mechanisms.
Labels
Unlike many log processing systems, Loki does not perform full-text indexing on log data. Instead, it leverages a concept borrowed from Prometheus - labels - to extract and tag information from the log data, and then indexes only the labels themselves. This dramatically improves performance on both the write and read path, and - equally valuable in our mind - enables a consistent label taxonomy regardless of input source.
Since this is such a critical benefit of Loki, let's dig into an example from Loki's documentation. Let's say you have a Loki "scrape configuration" like the one below:
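```yaml
# A Promtail scrape configuration along the lines of the example in Loki's documentation
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          __path__: /var/log/syslog
```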
The labels section of this configuration is particularly important. In this section, the __path__ variable defines the log file to be read, and the job keyword defines a label (with the value syslog) that is attached to every log line read from that path. Using this configuration, Promtail will "tail" the log file and create a Loki "stream" of records identified by the job=syslog label. The index entries for the job label and the chunks containing the raw log data are then written to persistent storage.
This stream of records can be queried using a simple LogQL query:
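```
{job="syslog"}
```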
When processing the query, the Loki querier component will find the indexes that point to records with a job label of syslog, and then retrieve the records.
Cloud-Native Backend Storage
Because the raw log data itself is not indexed, Loki can improve the system's cost-effectiveness by leveraging cloud-native storage services - object stores such as Amazon S3 for the log data, and stores such as Amazon DynamoDB or Cassandra for the index - as the backend data repository. To improve query processing, Loki stores the data as "chunks" (the raw log data) and "indexes" (the normalized and indexed labels extracted from log records). Queriers use the much smaller indexes to find the requested chunked log data.
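To make that concrete, here is a minimal, hypothetical fragment of a Loki configuration that stores chunks in S3 and ships a BoltDB-based index to the same bucket. The keys reflect the Loki 2.x boltdb-shipper setup (newer releases favor a TSDB-based index), and the bucket name and paths are placeholders - check the documentation for your Loki version:

```yaml
schema_config:
  configs:
    - from: 2023-01-01
      store: boltdb-shipper    # how the index is built
      object_store: s3         # where chunks (and shipped index files) are stored
      schema: v12
      index:
        prefix: index_
        period: 24h

storage_config:
  aws:
    region: us-east-1
    bucketnames: my-loki-bucket      # placeholder bucket name
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    shared_store: s3
```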
LogQL
Loki implements a log query language called LogQL that borrows heavily from Prometheus' PromQL language. LogQL can be used both directly and via a Grafana front-end dashboard. Having a consistent query language for both logs and metrics flattens the learning curve and facilitates dynamic filtering and transformation.
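For example, a LogQL query can filter log lines and then aggregate them into a metric, much as PromQL does with time series. The label names in this sketch are hypothetical:

```
# Count "error" lines per app over the last 5 minutes
sum by (app) (count_over_time({namespace="prod"} |= "error" [5m]))
```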
Installing the "PLG" Stack on Your Kubernetes Cluster
Loki has several installation mechanisms: Tanka (which is used in Grafana's own Cloud deployments), Helm charts for both "simple scalable" and microservices deployments, a mechanism using Docker / Docker Compose, and downloadable binaries. If desired, you can also download the Loki source code from the GitHub repository and build the system locally. Grafana provides instructions here for each of these installation methods.
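As a quick sketch of the Helm route (chart names and default values change over time, so treat this as illustrative and consult Grafana's documentation for current instructions):

```bash
# Add Grafana's Helm chart repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install Loki into its own namespace (release and namespace names are placeholders)
helm install loki grafana/loki --namespace loki --create-namespace

# Optionally install Promtail and Grafana alongside it to complete the PLG stack
helm install promtail grafana/promtail --namespace loki
helm install grafana grafana/grafana --namespace loki
```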
Loki: A Better Kubernetes Log Management Solution
Live tailing Kubernetes application, pod, and cluster log files is an extremely helpful technique for tracking what's going on with your containers and applications in near real time. Grafana's Loki product takes that to the next level with capabilities inspired by the popular Prometheus metrics system, easy scalability for managing even highly complex environments, and some serious enhancements to make dealing with log files simpler than ever. If you're looking for a better Kubernetes log management solution, Loki is definitely worth a try.