Aviv Zohari, Founding Engineer
5 minutes read, December 10, 2024

If you use OpenTelemetry to collect telemetry data, there’s a good chance that metrics are one of the key types of data you’ll work with. Metrics enable a variety of insights into the status and health of applications and services. 

But how, exactly, can you collect metrics using OpenTelemetry? Which metrics types does OpenTelemetry support? What are the goals of OpenTelemetry's metrics system? And how can you make the most of OpenTelemetry metrics? Keep reading for answers to these questions.

What are OpenTelemetry metrics?

OpenTelemetry metrics are raw measurements of quantifiable data points that you can collect using OpenTelemetry, an open source framework for generating and collecting telemetry data. OpenTelemetry collects metrics data at runtime and allows you to export metrics into monitoring and observability tools that support the OpenTelemetry standard.

The main goal of OpenTelemetry's metrics is to standardize the generation and collection of metrics data, so that you can work with metrics in a variety of monitoring and observability tools rather than having to implement metrics collection differently for each tool.

Examples of the types of metrics information you could collect using OpenTelemetry include:

  • The CPU or memory usage of a server or process.
  • The total number of requests processed by an application since its launch.
  • The time it took to complete the processing of a request.
  • The average latency of requests over time.

These are just a handful of examples. In general, any type of data that you can quantify as a number or percentage can be collected in OpenTelemetry as a metric.

Importantly, metrics are only one of the types of data that OpenTelemetry supports. The other two main types are logs (meaning records of events that take place during the operation of an application, service, or infrastructure) and traces (which monitor requests as they flow through a distributed system). Typically, you’d want to collect metrics, logs, and traces, then correlate them to gain contextual insight into the health and status of your applications and services.

OpenTelemetry metric and metric instrument types (with examples)

| Metrics Instrument | Purpose | Example |
|---|---|---|
| Counter | Record increments to a value that can only increase. | Total number of requests received by an application. |
| Asynchronous Counter | Report, via a callback, the running total of a value that can only increase. | Total CPU time. |
| UpDownCounter | Record increments and decrements to a value that can increase or decrease. | Number of requests currently in flight. |
| Asynchronous UpDownCounter | Report, via a callback, the current total of a value that can increase or decrease. | Current number of open connections to a server. |
| Gauge | Record the current value of a measurement at a given point in time. | Current CPU usage. |
| Asynchronous Gauge | Report, via a callback, the current value of a measurement each time metrics are collected. | CPU usage at periodic points in time. |
| Histogram | Record individual values so that their distribution can be analyzed. | Durations of requests. |

The types of metrics you can collect using OpenTelemetry fall into seven basic categories. Each category aligns with what’s known in OpenTelemetry jargon as a metric instrument – which represents a distinct type of information.

Here’s a look at OpenTelemetry’s seven metric instruments and the types of metrics you could collect within each one.

1. Counter

The Counter instrument records increments to a value that can only go up, such as the total number of requests that an application has received. Your code adds to the Counter each time the event occurs, and the SDK keeps the running total.

2. Asynchronous Counter

Asynchronous Counter is similar to Counter in that it also measures data that increments over time. However, Asynchronous Counter is designed for situations where the running total already exists somewhere and you want to read it periodically, rather than recording every increment yourself.

For example, you could use an Asynchronous Counter to track the total CPU time used by a process. Each time metrics are collected, a callback reads the cumulative CPU time and reports that running total, so you don't add up samples yourself.

The Asynchronous Counter and other asynchronous instruments work using a callback function. You register the callback when you create the instrument, and the SDK invokes it whenever metrics are collected, so your application code never has to record the value explicitly.

3. UpDownCounter

Use UpDownCounter to track a value that can both increase and decrease and that your code updates as changes happen, such as the number of requests currently in flight or the number of items in a queue.

4. Asynchronous UpDownCounter

Like Asynchronous Counter, Asynchronous UpDownCounter reports its value through a callback at collection time rather than being updated by your code on every change. And like the “regular” UpDownCounter, Asynchronous UpDownCounter measures data that can both increase and decrease.

An example of a metric that you could collect using the Asynchronous UpDownCounter is the current number of open connections to a server, which a callback could read from the server each time metrics are collected.

5. Gauge

The Gauge metrics instrument records the current value of a measurement that isn't meaningful to add up, such as CPU usage. Your code sets the Gauge whenever it has a fresh reading, and the most recent value is what gets reported. Current CPU usage by a process is an example of a type of metric you could collect using this metrics instrument.

6. Asynchronous Gauge

Use Asynchronous Gauge to collect metrics that you can’t (or don’t want to) update from your own code on every change. For example, if you only want to check a server’s CPU usage each time metrics are collected (say, every five minutes) rather than reporting it as a continuous data stream, this metrics instrument would be appropriate.

7. Histogram

The Histogram metrics instrument records individual values, such as the durations of the requests that an application processes, and aggregates them into a distribution (typically a count, a sum, and a set of bucket counts). This is useful when you care about how values are spread, for example to calculate percentiles, rather than only a total or an average.

Examples of different OpenTelemetry metric data to track

To add further context to OpenTelemetry metrics and metrics instruments, here are some common examples of metrics you can track using OpenTelemetry and how you’d track them:

  • CPU usage: You could track CPU usage using the Gauge metrics instrument, which would show you the CPU usage by an application or infrastructure resources at any given point in time.
  • Error rate: You could use a Counter to measure the total errors experienced by an application over time.
  • HTTP response time: Use a Histogram to record the duration of each request. From the resulting distribution, your monitoring tool can derive averages and percentiles and show how response times shift over time.
  • Throughput: Use a Counter for the requests (or bytes) processed and let your monitoring tool compute the rate of change. If your application already calculates a current throughput figure, you could report that with a Gauge instead.
  • Network latency: As with response time, a Histogram is usually the best fit for individual latency measurements. A Gauge also works if you only need the most recent reading.
  • Database queries: To track total database queries over time, use a Counter that you increment on each query. If the database driver already exposes a running total, an Asynchronous Counter can report it.
  • Memory utilization: You can track memory utilization using a Gauge.

OpenTelemetry metrics data model

Diagram illustrating the process of sampling time-series data. A yellow line represents underlying continuous data. Events are marked as dots along the timeline, where sampling occurs.

The metrics instruments that we discussed above represent different ways of recording metrics using OpenTelemetry. To describe how those measurements are collected and moved into monitoring and observability tools, OpenTelemetry’s metrics data model defines three related concepts.

1. Events

Events are data that are measured at specific points in time, like the CPU usage of a process at a specific moment. In addition to reporting the value of the data itself, events can include contextual information, such as the name of the process whose CPU usage you are tracking.

2. Data streams

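A data stream is what events become once they have been aggregated over a period of time. For example, many individual request events recorded with the same attributes become a single data point with a start time, an end time, and a value. Streams of such data points are what OpenTelemetry transmits to monitoring tools, for instance via the OpenTelemetry Protocol (OTLP).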

3. Time series

A time series is how a monitoring backend typically stores the data: a sequence of timestamped values identified by a metric name and a set of attributes. A time series would be useful if, for example, you wanted to track request latency over time, because it would let you chart how the aggregated latency for a given endpoint changes from one collection interval to the next. Note that a time series holds aggregated values, not a record of each individual request; for per-request detail, you'd turn to traces.

OpenTelemetry metrics concepts

If you’ve read this far, you should have a solid understanding of which types of metric data you can collect in OpenTelemetry and how you can report them to monitoring and observability tools. But there are a few more metrics concepts in OpenTelemetry that we need to discuss to explain how everything fits together.

OpenTelemetry API

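The OpenTelemetry metrics API defines the interfaces (such as meters and instruments) that application and library code calls to record measurements. On its own, the API does nothing with those measurements; calls are no-ops until an SDK is configured.

SDK

OpenTelemetry’s metrics Software Development Kit (SDK) implements the OpenTelemetry metrics API. This means it provides the actual functionality necessary for applications to expose metrics data, including aggregation and export. The OpenTelemetry metrics API is separate from the SDK so that libraries can be instrumented against the API alone, while each application decides which SDK configuration and exporters to use.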

Exporters

As their name implies, Exporters export metrics data and send them to a destination (like a monitoring and observability tool). Thus, whereas the API and SDK are used to instrument metrics generation within applications, Exporters do the work of actually moving the metrics data once it has been generated.

How to start using OpenTelemetry metrics

To get started with metrics in OpenTelemetry, you’ll need:

  1. An application or service that you’ll use to generate metrics.
  2. A destination for your metrics, such as a monitoring tool (like Prometheus, to name one popular open source example).

The exact way to go about using metrics with OpenTelemetry varies depending on the type of application or service you’re working with and the types of metrics you want to collect. But as a basic example, imagine you have a Python app where you want to track HTTP requests using a Counter metrics instrument and send the data to Prometheus. To do this, you’d add code like the following to your app:

import time

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

# Set up the MeterProvider with a reader that exposes metrics to Prometheus
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter(__name__)

# Define a Counter metric
request_counter = meter.create_counter(
    name="http_requests_total",
    description="Total number of HTTP requests",
    unit="1",
)

# Simulate metric recording
def simulate_requests():
    for _ in range(10):
        request_counter.add(1, {"endpoint": "/home", "method": "GET"})

if __name__ == "__main__":
    # Start the HTTP server that Prometheus scrapes
    start_http_server(port=8000)
    simulate_requests()
    print("Metrics server running at http://localhost:8000/metrics")
    # The server runs in a background thread, so keep the process alive
    while True:
        time.sleep(1)

Key elements of the code include:

  • MeterProvider, which serves as the entry point for the OpenTelemetry metrics API.
  • A Counter, which tracks the number of HTTP requests in this example.
  • PrometheusMetricReader, which makes the collected metrics available in a format that Prometheus understands.
  • The start_http_server function, which opens a local HTTP server on port 8000, where metrics can be scraped.

With this setup, you can point a Prometheus server at http://localhost:8000/metrics to scrape your app’s HTTP request count.

Working with OpenTelemetry metrics

Because OpenTelemetry is a telemetry framework, not a data analytics or visualization tool, you can’t actually view or interpret your metrics using OpenTelemetry. If you want to work with OpenTelemetry metrics, you’ll need to use external tools.

Here are three common ways to work with OpenTelemetry metric data.

1. Visualization tools

Using a data visualization tool like Grafana, you could ingest OpenTelemetry metrics and use them to populate charts and graphs. The charts and graphs can display real-time data about whatever you are monitoring. They can also show you trends over time.

2. Analyzing collected data

Monitoring systems like Prometheus can help you analyze data by displaying trends and highlighting anomalies.

Diagram showing apps sending data to an OTLP receiver, processed by processors, and exported via Prometheus exporter to Prometheus, accessible at the '/metrics' endpoint.

3. Troubleshooting using metrics

Typically, if you want to troubleshoot a problem you’ve discovered via OpenTelemetry metrics, you’d want to correlate the metrics with other data – like logs and traces – to gain complete context about the issue. For example, if metrics data shows you that CPU usage for a process has spiked, knowing which log events correlate with the spike, and/or being able to trace requests flowing to the process, can help you get to the root of the problem quickly.

To perform this correlation, you’d use a comprehensive observability platform like groundcover, which lets you work with multiple types of data and analyze all of it through a central hub.

Best practices for using OpenTelemetry metrics

To use OpenTelemetry metrics to greatest effect, consider best practices like the following:

Use labels and attributes strategically

Attributes (called labels in some tools, such as Prometheus) add contextual information to metrics, helping to differentiate data across dimensions (e.g., region, user type, instance). Use them strategically by limiting the number of attributes and their possible values to avoid high cardinality, which can degrade performance, and by choosing meaningful attributes that reflect important facets of your system.

Standardize naming conventions for metrics

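Standardized names make metrics easy to understand and use across teams. Best practices include using clear, concise, and descriptive names (e.g., http_request_duration_seconds for HTTP request latency), as well as following a consistent format. If you follow Prometheus conventions, that means including units in metric names (e.g., _seconds, _bytes). OpenTelemetry's own semantic conventions instead use dot-separated names (e.g., http.server.request.duration) and carry the unit in the instrument's unit field. Whichever style you choose, apply it consistently.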

Integrate with logging and tracing

As we mentioned, metrics are most powerful when combined with logging and tracing to provide comprehensive observability. To integrate effectively, correlate metrics with trace identifiers or log metadata to enable seamless transitions between telemetry types.

Ensure metrics are accurate and relevant

Collect metrics that provide actionable insights and avoid excessive noise. To do this, validate that the metrics align with your business and operational goals, and verify the correctness of metric calculations regularly by, for example, ensuring that latency metrics capture accurate durations.

Regularly evaluate and improve telemetry pipelines

Your observability needs evolve as your systems change. Regular evaluation ensures your telemetry remains effective. To this end, refine your metrics periodically to align with current objectives. It may also help to monitor the performance and scalability of telemetry pipelines to prevent bottlenecks.

Limit metric cardinality

High cardinality occurs when labels have many unique values, leading to excessive resource consumption. Best practices to manage this include avoiding using highly dynamic or user-specific values (e.g., session IDs, IP addresses) as label values. Aggregating data can also help to reduce granularity without compromising critical insights.

OpenTelemetry metrics with groundcover

If you like OpenTelemetry and you like metrics, you’ll love groundcover, which makes it easy to import metrics data using OpenTelemetry.

groundcover dashboard showing a query visualized as a time-series chart, grouped by cluster, namespace, and workload. The chart displays metric trends over the last 15 minutes.

Once your metrics are inside groundcover, you can analyze, visualize, and customize them to your heart’s content, allowing you to make the very most of this key type of insight.

We’d be remiss if we didn’t mention that you don’t have to use OpenTelemetry to collect metrics using groundcover. You can also use eBPF, another metrics collection method that is more efficient. Read more in our article about OpenTelemetry and eBPF.

Optimizing performance with OpenTelemetry metrics

OpenTelemetry metrics may be only part of the observability puzzle, but they’re a critical one. Alongside other types of data – like logs and traces – OpenTelemetry metrics play a crucial role in helping teams understand what’s happening in complex, distributed systems.
