Aviv Zohari, Founding Engineer
10 minute read, May 30th, 2023

Collecting telemetry data from modern applications has traditionally been a messy, tool-specific chore. Thanks to OpenTelemetry, that pain has become a thing of the past – or at least, it can be when you choose to take advantage of OpenTelemetry to streamline monitoring and observability. Keep reading for a detailed look at how OpenTelemetry works, which benefits it provides, and how to make the most of it as part of a modern observability strategy.

What is OpenTelemetry?

OpenTelemetry – or OTel for short – is a framework and set of tools that help engineers work with telemetry data. Specifically, OpenTelemetry provides APIs, software development kits (SDKs), and other tools that simplify the process of generating telemetry data within applications and then exporting that data to monitoring and observability tools.

OpenTelemetry is a big deal because it offers a standardized collection of tools and processes for working with telemetry data. When you use OpenTelemetry, you can collect and analyze telemetry data in a consistent way regardless of which applications you're managing or which monitoring and observability tools you are using.

The name OpenTelemetry reflects the concept of OpenTelemetry as an "open" solution (meaning one based on community-defined standards and open source code) for collecting "telemetry" data. In this context, telemetry refers to application performance or security information that is collected and analyzed by remote tools. Thus, OpenTelemetry provides an open means of collecting data from remote applications or services.

OpenTelemetry vs. OpenTracing

| Difference | OpenTelemetry | OpenTracing |
|---|---|---|
| History | Originated in 2019, based in part on OpenTracing. | Originated in 2016. |
| Focus | Metrics, logs, and tracing. | Only tracing. |

OpenTelemetry was predated by OpenTracing, a project launched in 2016 to provide a standardized way of implementing distributed tracing in applications. However, traces are only one type of data that is important for application observability; the other main sources are logs and metrics.

To provide broader functionality that covers all aspects of observability, the OpenTelemetry project emerged in 2019, based partly on OpenTracing. It was also based in part on OpenCensus, a collection of libraries for metrics collection and distributed tracing that originated at Google.

The main difference between OpenTelemetry and OpenTracing is that OpenTelemetry provides access to a wider set of tools. Instead of only supporting traces, as OpenTracing does, OpenTelemetry also lets you collect logs and metrics.

How does OpenTelemetry work?

OpenTelemetry works by providing software libraries, APIs, and SDKs that make it possible to expose application logs, metrics, and traces in a standardized way, and then collect and analyze them using any observability tool that supports the OpenTelemetry standards.

Typically, the process for putting OpenTelemetry to work is as follows:

  1. Developers add an OpenTelemetry library to their applications. This is called instrumentation because it "instruments" the application with code that tells it how to expose telemetry data in an OpenTelemetry-compliant way.
  2. Using the OpenTelemetry APIs or SDKs, observability tools ingest data from the applications they are monitoring.

The process may be more complex than this and could entail additional steps. For example, engineers may choose to transform data before it is ingested by observability tools in order to make the data easier to analyze or process. But the two steps above cover the core elements of exposing data to OpenTelemetry-compatible tools.
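To make these two steps concrete, here is a minimal sketch using the OpenTelemetry Python SDK (the opentelemetry-api and opentelemetry-sdk packages). The service and span names are illustrative, and the console exporter stands in for a real observability backend.

# Step 1: instrument the application with an OpenTelemetry library.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Register a tracer provider so the API calls below produce real spans.
provider = TracerProvider(resource=Resource.create({"service.name": "checkout"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout.instrumentation")

# Step 2: the configured exporter hands the data to an observability tool.
# Here spans are printed to the console; in production you would typically
# export them to a Collector or directly to a backend.
with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.items", 3)  # illustrative attribute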

What is telemetry data?

As noted above, telemetry data is any type of performance or security data that applications generate and expose to remote monitoring or observability tools. There are three main types of telemetry data: logs, metrics, and traces.

Logs

Logs record events that take place as an application operates. For example, log data might register user logins or application restarts.

Metrics

Metrics are measurements that reflect the health and performance of an application – such as how much CPU or memory it's consuming. Infrastructure resources (like servers) can also generate metrics.
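As a rough illustration, here is how a counter metric might be recorded with the OpenTelemetry Python SDK; the metric and attribute names are examples, not required values.

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader

# Export metrics on a fixed interval; the console exporter stands in for a real backend.
reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("payments.metrics")
request_counter = meter.create_counter(
    "http.server.requests", description="Number of handled requests"
)
request_counter.add(1, {"http.route": "/login"})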

Traces

Traces are records of application requests as they flow through a system. In the context of modern, distributed systems, the main purpose of traces is to understand how each microservice responds to a request in order to pinpoint the source of a performance bottleneck.
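For illustration, a hypothetical request handler might represent each step of a request as a nested span, so an observability backend can show where time is spent along the request path. This sketch assumes a tracer provider has already been configured, as in the earlier example.

from opentelemetry import trace

tracer = trace.get_tracer("gateway")

# Each nested span becomes one step of the same trace.
with tracer.start_as_current_span("GET /checkout"):
    with tracer.start_as_current_span("inventory-lookup"):
        pass  # call the inventory service here
    with tracer.start_as_current_span("charge-card"):
        pass  # call the payment service here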

What is OpenTelemetry used for?

The main purpose of OpenTelemetry is to provide a consistent, standardized way to expose and collect application monitoring and telemetry data.

To be clear, you certainly don't need OpenTelemetry to monitor applications. You could write your own instrumentation code that tells your applications how to generate telemetry data and expose it to your monitoring tools.

However, with OpenTelemetry, you get a set of instrumentation libraries that tell applications how to expose data, with minimal coding needed. Your developers can simply use the OpenTelemetry libraries, rather than writing their own code.

In addition, any monitoring and observability tool that supports OpenTelemetry can collect data from any application that uses OpenTelemetry instrumentation. This means you can switch monitoring and observability tools (or use multiple tools at once) without having to change the way your applications generate and expose data.

Components of OpenTelemetry

OpenTelemetry consists of five key components.

1. APIs

Developers use the OpenTelemetry APIs to instrument data collection in applications. The APIs provide the functionality necessary to capture logs, metrics, and traces. APIs are language-specific, so you need to use an API that supports whichever programming language your application uses.
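For example, in Python the API package alone is enough to instrument library or application code; the calls below are no-ops until an SDK wires in a real tracer provider. The function and attribute names are illustrative.

from opentelemetry import trace

tracer = trace.get_tracer("my.library")

def handle(job_id: str) -> None:
    # Produces a span only if the host application has configured the SDK;
    # otherwise this is a harmless no-op.
    with tracer.start_as_current_span("handle-job") as span:
        span.set_attribute("job.id", job_id)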

2. SDKs

OpenTelemetry software development kits (SDKs) are a way to configure and fine-tune how OpenTelemetry APIs work. They allow you, for example, to manage batching and sampling during data collection.
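As a sketch of what that configuration looks like in the Python SDK, the snippet below sets a sampler and a batching span processor; the ratio and exporter are illustrative choices, not defaults you must use.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

# Keep roughly half of all traces and export spans in batches.
provider = TracerProvider(sampler=TraceIdRatioBased(0.5))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)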

3. Collector

The Collector is an agent that collects monitoring and telemetry data from applications that have been instrumented using the OpenTelemetry APIs. From there, the Collector forwards the data to an observability tool for analysis. Essentially, the Collector is an intermediary that does the work of pulling data from applications and sending it to its destination or destinations.

4. Exporters

Exporters are a component of the OpenTelemetry Collector. They are responsible for pushing data on to observability tools or other backends. Other parts of the Collector handle different tasks, such as receiving data from applications and processing it.

In practice, some use the terms "collector" and "exporter" interchangeably. But technically, the Collector is an OpenTelemetry component that handles all aspects of moving data from applications to observability tools or other destinations, whereas exporters deal with the narrower work of moving the data to its destination.
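On the application side, sending data to a Collector usually just means pointing an OTLP exporter at it. The sketch below assumes the Python SDK and a Collector listening on the default OTLP/gRPC port; the endpoint is an example.

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider()
# The application exports spans to the Collector; the Collector's own
# exporters then forward the data to one or more observability backends.
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)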

5. Automatic instrumentation

Automatic instrumentation is a component of OpenTelemetry that provides pre-configured libraries or agents that can send telemetry data to OpenTelemetry-compatible tools. The purpose of auto-instrumentation is to let developers instrument applications with OpenTelemetry without having to make virtually any changes to the application code itself.

That said, auto-instrumentation sometimes offers limited control over which types of telemetry data you can collect.
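For example, assuming a Flask application and the opentelemetry-instrumentation-flask package, a single call instruments every request the app handles; no per-route tracing code is required.

from flask import Flask
from opentelemetry.instrumentation.flask import FlaskInstrumentor

app = Flask(__name__)

# Automatically creates a span for each incoming HTTP request.
FlaskInstrumentor().instrument_app(app)

@app.route("/health")
def health():
    return "ok"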

Benefits of OpenTelemetry

As we mentioned, you don't strictly need to use OpenTelemetry. But since the project's launch in 2019, OpenTelemetry has become massively popular because it offers several key benefits:

  • Consistency: With OpenTelemetry, developers enjoy a consistent means of instrumenting, exposing, and collecting telemetry data.
  • Flexibility: OpenTelemetry offers the flexibility to work with multiple observability tools without having to change the way applications expose data.
  • Simplified observability: By reducing the amount of custom code and configuration necessary to work with observability data, OpenTelemetry streamlines the observability process.
  • Broad observability coverage: Because OpenTelemetry allows you to collect virtually any type of log, metric, and trace, it enables a broad, holistic approach to observability. You don't have to settle for limited visibility due to limited data sources.
  • Easy setup: While you may need advanced expertise if you want to make extensive customizations to OpenTelemetry SDKs, OpenTelemetry requires few special skills to use. In many cases, developers can simply use auto-instrumentation to start generating data.

These benefits would be important for virtually any type of application. However, they're especially valuable in today's world of complex, distributed systems. The more individual application services you have to observe, the harder observability becomes if you have to instrument data generation manually within each service. In addition, by enabling a consistent approach to observability that works with virtually all tools and platforms, OpenTelemetry helps teams thrive in a multi-platform world where tool sets constantly change.

Challenges of OpenTelemetry

Although OpenTelemetry is a powerful solution for simplifying modern observability, it is not without its challenges. Key potential limitations of OpenTelemetry include:

  • Language dependencies: OpenTelemetry only works with certain languages. While it supports all of today's popular languages, you may not be able to use it in legacy apps, or in apps developed using more obscure languages.
  • Limited data support: OpenTelemetry can collect most metrics, logs, and traces, but support for some data types is limited. For instance, not all types of logs are fully supported out-of-the-box.
  • Focus on performance monitoring: Although it's possible to use OpenTelemetry to collect security data, the framework's main focus is on application performance monitoring and observability.
  • Performance overhead: The OpenTelemetry data collectors require CPU and memory to run, which means they place a non-negligible performance overhead on your systems and reduce the amount of resources available to your actual applications.

OpenTelemetry is valuable in many cases, but before committing to it, consider whether there are more resource-efficient solutions available. Make sure, too, that OpenTelemetry will fully support the languages and data types you need to work with.

OpenTelemetry best practices

To get the most out of OpenTelemetry, consider best practices like the following.

Use attributes

Attributes are key-value metadata describing where observability data originated – such as a process name or pod name. By including attributes in your observability data, you add context that can make it easier to interpret the data and pinpoint the source of issues.

That said, avoid extraneous attributes, which will bloat your data and can make it more challenging to sort through all of the information. For instance, you may not need to include namespace names as an attribute if you're also including a pod name, since you could identify the namespace based on the pod if you needed to.
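In practice, origin attributes are often set once as resource attributes so they are attached to every span and metric the SDK emits. The keys below follow OpenTelemetry semantic conventions; the values are placeholders.

from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

resource = Resource.create({
    "service.name": "checkout",
    "k8s.pod.name": "checkout-5f7c9-abcde",  # placeholder pod name
})
provider = TracerProvider(resource=resource)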

Establish reasonable cardinality

Cardinality is the number of unique values that a given metric's labels or attributes can take – and therefore the number of distinct series the metric produces. In general, higher cardinality is useful because it provides more context and granularity. However, if cardinality becomes too high, you can end up with so many label combinations that it becomes challenging to identify meaningful trends.

So, when configuring OpenTelemetry, strive to find a happy medium between cardinality that is too low to provide actionable insight, and cardinality that is too high and leads to irrelevant noise.
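As a sketch of what that balance looks like with the Python SDK (names illustrative): bounded attribute values such as a route or status code keep the number of distinct series manageable, while unbounded values such as a user ID explode it.

from opentelemetry import metrics

meter = metrics.get_meter("checkout.metrics")
latency = meter.create_histogram("http.server.duration", unit="ms")

# Bounded values: a handful of routes and status codes.
latency.record(42, {"http.route": "/checkout", "http.status_code": 200})

# Unbounded values create one series per user; put these on spans or logs instead.
# latency.record(42, {"user.id": "a91f3c0e"})  # avoid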

Correlate data

Part of the value of OpenTelemetry stems from its ability to support multiple data formats – logs, traces, and metrics. To leverage this ability to maximum effect, you should correlate data. Don't use OpenTelemetry to collect and analyze only logs or only metrics, for example; analyze this data side-by-side so you gain the greatest possible context.
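One simple way to correlate signals is to stamp log lines with the active trace ID so a backend can join logs and traces for the same request. This sketch uses the Python API with an illustrative logger and message.

import logging

from opentelemetry import trace

logger = logging.getLogger("checkout")

def charge_card(order_id: str) -> None:
    # Format the current trace ID the same way tracing backends display it.
    ctx = trace.get_current_span().get_span_context()
    trace_id = format(ctx.trace_id, "032x")
    logger.info("charging card order_id=%s trace_id=%s", order_id, trace_id)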

Consider batching

Batching is an optional feature in OpenTelemetry that allows you to export telemetry data in batches, rather than stream it immediately. It can also compress the data. Batching reduces network usage and the number of requests that observability tools need to handle. The tradeoff is that with batching, you don't get access to data in true real time.

Whether you should or shouldn't use batching depends on your use cases and priorities. If true real-time analysis is critical, avoid batching. If you want to reduce resource overhead, take advantage of batching.
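In the Python SDK, this choice comes down to which span processor you attach; both options below are valid, and the delay value is just an example.

from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
    SimpleSpanProcessor,
)

exporter = ConsoleSpanExporter()

# Immediate export: every span is pushed as soon as it ends.
realtime_processor = SimpleSpanProcessor(exporter)

# Batched export: spans are buffered and flushed periodically,
# trading freshness for fewer, larger requests.
batched_processor = BatchSpanProcessor(exporter, schedule_delay_millis=5000)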

Consider sampling

Along similar lines, sampling may or may not make sense based on your use cases and priorities.

Sampling lets you collect only some data, rather than collecting every data point. In this way, sampling reduces the load that OpenTelemetry places on systems, while also cutting down on the amount of data you need to store. But because sampling skips some data, you may end up missing important information that only appears periodically within logs, metrics, and traces.
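For example, head-based sampling in the Python SDK might keep roughly one in ten traces while respecting the parent span's decision, so sampled traces stay complete across services. The ratio is illustrative.

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Sample about 10% of new traces; child spans follow their parent's decision.
provider = TracerProvider(sampler=ParentBased(TraceIdRatioBased(0.1)))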

How to Monitor Kubernetes with OpenTelemetry

As an example of using OpenTelemetry in the real world, let's go over the process for monitoring Kubernetes with OpenTelemetry.

It's actually quite simple. There are just two basic steps.

1. Deploy OpenTelemetry Collector as a DaemonSet

First, you need to deploy the OpenTelemetry Collector as a DaemonSet in your cluster. This ensures that a Collector instance runs on every node, where it can monitor the node itself as well as any workloads hosted on that node – the foundation for Kubernetes application performance monitoring.

You can set up the Collector as a DaemonSet by first creating a collector.yaml file with the following contents:

mode: daemonset

image:
  repository: otel/opentelemetry-collector-k8s

presets:
  # enables the k8sattributesprocessor and adds it to the tracing data, metrics, and logs pipelines
  kubernetesAttributes:
    enabled: true
  # enables the kubeletstatsreceiver and adds it to the metrics pipelines
  kubeletMetrics:
    enabled: true
  # enables the filelogreceiver and adds it to the logs pipelines
  logsCollection:
    enabled: true
## The chart only includes the loggingexporter by default
## If you want to send your data somewhere you need to
## configure an exporter, such as the otlpexporter
# config:
#   exporters:
#     otlp:
#       endpoint: "<SOME BACKEND>"
#   service:
#     pipelines:
#       traces:
#         exporters: [ otlp ]
#       metrics:
#         exporters: [ otlp ]
#       logs:
#         exporters: [ otlp ]

Save the file, then install the Collector using Helm (this assumes the open-telemetry Helm chart repository has already been added):

helm install otel-collector open-telemetry/opentelemetry-collector --values /path/to/collector.yaml

2. Deploy the Collector as a Deployment

In addition to running Collector instances on each node, you need to deploy the Collector as a Deployment in your cluster so that it can collect cluster-level data – such as cluster metrics and Kubernetes events – from the control plane.

To do this, create a deployment.yaml file with the following contents:

mode: deployment

image:
  repository: otel/opentelemetry-collector-k8s

# We only want one of these collectors - any more and we'd produce duplicate data
replicaCount: 1

presets:
  # enables the k8sclusterreceiver and adds it to the metrics pipelines
  clusterMetrics:
    enabled: true
  # enables the k8sobjectsreceiver to collect events only and adds it to the logs pipelines
  kubernetesEvents:
    enabled: true
## The chart only includes the loggingexporter by default
## If you want to send your data somewhere you need to
## configure an exporter, such as the otlpexporter
# config:
#   exporters:
#     otlp:
#       endpoint: "<SOME BACKEND>"
#   service:
#     pipelines:
#       traces:
#         exporters: [ otlp ]
#       metrics:
#         exporters: [ otlp ]
#       logs:
#         exporters: [ otlp ]

Then install it with Helm:

helm install otel-collector-cluster open-telemetry/opentelemetry-collector --values /path/to/deployment.yaml

OpenTelemetry comparison with other observability tools

| Tool | Main differences from OpenTelemetry |
|---|---|
| Prometheus | Mainly supports metrics collection. |
| Grafana | Focuses on data visualization, not data collection. |
| Datadog | Provides a broad range of analytics features in addition to data collection. Supports OpenTelemetry but doesn't require it. |
| New Relic | Provides a broad range of analytics features in addition to data collection. Supports OpenTelemetry but doesn't require it. |
| Zipkin | Only supports distributed tracing. |

To understand why you may or may not want to use OpenTelemetry, consider how it compares to other popular observability tools:

  • OpenTelemetry vs. Prometheus: Prometheus is an open source monitoring tool. Compared to OpenTelemetry, Prometheus's main limitation is that it mostly only collects metrics, not traces or logs. That said, Prometheus is easier to set up and may work well if you only want to collect metrics.
  • OpenTelemetry vs. Grafana: Grafana is primarily a data visualization tool. It's not an alternative to OpenTelemetry as much as a way of visualizing data that OpenTelemetry collects. In many cases, you would use both OpenTelemetry and Grafana together – although you can also import data into Grafana using other data collection methods.
  • OpenTelemetry vs. Datadog: Datadog is a software monitoring and observability platform. It offers many features that OpenTelemetry doesn't support, such as tools to analyze data. That said, OpenTelemetry and Datadog overlap in the sense that both can collect observability data. Datadog supports collection via OpenTelemetry, but it also offers other collection options.
  • OpenTelemetry vs. New Relic: Like Datadog, New Relic is an observability platform that includes many tools and features beyond data collection. Also like Datadog, New Relic provides OpenTelemetry-based data collection options, as well as other methods.
  • OpenTelemetry vs. Zipkin: Zipkin is mainly a distributed tracing tool, whereas OpenTelemetry supports distributed tracing as well as logs and metrics. Consider Zipkin if your only goal is to generate traces, but OpenTelemetry is a better choice if you need end-to-end observability.

OpenTelemetry with groundcover

groundcover offers full support for ingesting and analyzing data collected using OpenTelemetry. This means that if you instrument observability using OpenTelemetry, you can send your data right to groundcover, without having to make any changes to your application code or implement any special configurations in groundcover.

That said, what makes groundcover stand out from your typical OpenTelemetry-compliant observability solution is that groundcover also supports eBPF – an alternative way to collect observability data. Unlike OpenTelemetry, eBPF uses kernel-level data collection, which is much more efficient.

So, with groundcover, you get the best of both worlds – by which we mean OpenTelemetry and eBPF.

OpenTelemetry and the future of observability

OpenTelemetry isn't perfect, but it solves many traditional problems in the realm of observability. Although there may come a day when eBPF fully replaces OpenTelemetry as a more efficient and standardized way to collect data, expect OpenTelemetry to remain a key part of observability strategies for the foreseeable future.
