Enhanced Tracing Experience with Waterfall View

Eyal Cohen

Updated on: Dec 22, 2024

Published on: Dec 03, 2024


Kubernetes

groundcover

At groundcover, we're strong believers in dogfooding: we use our own product every day to debug our own issues. That means we feel the pain points just as much as our users do and, hopefully, before they do.

When an incident occurs, distributed traces are often our first stop. Our current toolkit includes flame charts, service maps, and tables. Our users attest to loving them, but we felt we could take them one step further with a clear and intuitive waterfall view that shows us the timeline and duration of traces flowing through the system - in a single view.

Visualizing distributed traces isn't just about making traces look pretty—it's about helping developers get to the root cause of an issue faster. Whether you're tracking down a system crash, a performance bottleneck, or debugging a header that dropped across distributed systems, the right visualization means less time investigating and more time actually fixing issues.

So, assuming you have better things to do than to stare at traces all day, read on to find out why waterfall views are a game-changer for visualization of distributed traces, and how our new waterfall visualization helps you solve problems faster.

What are distributed traces?

Feel free to skip ahead if you’ve got distributed traces down.

Historically, debugging distributed systems was like solving a puzzle with only half the pieces.

Engineers relied on logs across different services, metrics that told only part of the story, and a lot of guesswork. It was about as efficient as diagnosing a car problem by listening to separate reports from the mechanic, the driver, and the onboard computer.

Then came distributed traces.

Distributed traces reveal the complete journey of a request. Every hop across services, every database query, and every external function call is captured, showing you the full, end-to-end path of a request as it flows through your application.

Traces are the x-rays of your system. They reveal how requests flow through your services—like blood through veins—helping you pinpoint bottlenecks and anomalies with precision. Each trace is made up of spans—the individual operations that make up the request's journey. Each span tells you not just what happened but also when it started, how long it took, and whether it succeeded.
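
To make the span idea concrete, here is a minimal sketch of a trace as plain data. The `Span` fields, the service names, and the timings are all hypothetical and chosen for illustration; this is not groundcover's data model, and real tracing formats carry more fields than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    service: str
    operation: str
    start_ms: float           # start time, relative to the start of the trace
    duration_ms: float
    ok: bool = True           # whether the operation succeeded

    @property
    def end_ms(self) -> float:
        return self.start_ms + self.duration_ms


# A hypothetical three-span trace: frontend -> users-api -> postgres
trace = [
    Span("a", None, "frontend", "GET /api/users", 0.0, 120.0),
    Span("b", "a", "users-api", "GET /users", 10.0, 100.0),
    Span("c", "b", "postgres", "SELECT users", 30.0, 70.0, ok=False),
]


def children_by_parent(spans):
    """Group spans by parent_id, with each group sorted by start time."""
    groups = {}
    for span in sorted(spans, key=lambda s: s.start_ms):
        groups.setdefault(span.parent_id, []).append(span)
    return groups


tree = children_by_parent(trace)
root = tree[None][0]
failed = [s.operation for s in trace if not s.ok]
print(root.operation, root.end_ms, failed)
```

Grouping spans by parent is what turns a flat list into the hierarchy that every trace visualization is built on.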

If you want to dive deeper into the difference between distributed tracing and logs, how distributed tracing works, and the different types of distributed tracing, I recommend this post, written by CTO Yechezkel Rabinovich.

Visualizing Distributed Traces

Understanding distributed traces can be challenging due to their inherent complexity. Each visualization offers unique insights into your system's behavior, helping you understand different aspects of your distributed applications.

Let's explore the main visualization types and their specific strengths and weaknesses:

| Type | Pros | Cons |
|---|---|---|
| Flame Graphs | Let you view the big picture. Great at showing relationships between different levels. | Hard to figure out exactly when things happened or how long they took. |
| Service Maps | See how everything connects. Easily understand your architecture. | Won't help much when you're trying to figure out what went wrong with a specific request. |
| Data Tables | Give you all the details. Every timestamp, every attribute, everything. | Hard to understand what actually happened when you're scrolling through endless rows of data. |
| Waterfall Views | See everything on a timeline, making it super easy to see what happened when. Quickly spot bottlenecks and errors. | Can be overwhelming for traces with a large number of spans. |

While each visualization type serves its purpose, waterfall views stand out for debugging distributed systems, especially during critical incidents.

Why Waterfalls are Great

We care deeply about helping users understand the massive amount of telemetry data they send us. Distributed traces, in particular, present a unique visualization challenge.

Each trace may contain dozens, hundreds, or even thousands of spans across multiple hierarchy levels, with span durations ranging from nanoseconds to minutes.

While traditional visualization methods like flame charts and tables serve their purpose, they often fall short when you need quick insights, especially during intense war room incidents when every second counts.

That's where waterfall views shine. They transform complex trace data into an intuitive, chronological story. Each span flows naturally from left to right, revealing the complete story of a request's journey through your system.

Take error detection, for example. With our waterfall view, you can instantly spot where things went wrong:

The waterfall visualization shows you the critical information at a glance:

The span operation (like GET /api/users)
The service where the span was sampled
Status codes for quick error identification
A tree visualization showing the relationship between spans
Span duration right on the timeline
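
The elements listed above can be sketched as a tiny text-based waterfall. This is only an illustration of the idea, not how our UI is implemented: the `Span` fields, the `render_waterfall` helper, the `!` error marker, and the sample services are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    operation: str
    status: int               # HTTP-style status code (assumed for this sketch)
    start_ms: float
    duration_ms: float


def render_waterfall(spans, width=40):
    """Return one text row per span: tree-indented label, status, bar, duration."""
    trace_start = min(s.start_ms for s in spans)
    trace_end = max(s.start_ms + s.duration_ms for s in spans)
    total = max(trace_end - trace_start, 1e-9)  # avoid dividing by zero

    children = {}
    for s in sorted(spans, key=lambda s: s.start_ms):
        children.setdefault(s.parent_id, []).append(s)

    rows = []

    def visit(span, depth):
        # Horizontal position and bar length are proportional to time.
        offset = min(int((span.start_ms - trace_start) * width / total), width - 1)
        length = max(1, min(int(span.duration_ms * width / total), width - offset))
        bar = " " * offset + "#" * length
        label = "  " * depth + f"{span.service}: {span.operation}"
        flag = "!" if span.status >= 500 else " "  # mark server errors
        rows.append(
            f"{label:<34} {span.status}{flag} |{bar:<{width}}| {span.duration_ms:.0f}ms"
        )
        for child in children.get(span.span_id, []):
            visit(child, depth + 1)

    for root in children.get(None, []):
        visit(root, 0)
    return rows


# Hypothetical trace in which the billing call fails and the error bubbles up.
spans = [
    Span("a", None, "frontend", "GET /api/users", 500, 0.0, 120.0),
    Span("b", "a", "users-api", "GET /users", 500, 10.0, 100.0),
    Span("c", "b", "auth", "POST /verify", 200, 15.0, 10.0),
    Span("d", "b", "billing", "GET /plan", 503, 30.0, 70.0),
]

rows = render_waterfall(spans)
print("\n".join(rows))
```

Even in plain text, indentation gives you the tree, the bar position gives you the timeline, and the marker next to the status code points at the failing spans.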

We've also made the view fully responsive. You can adjust the information density, duration view, and span size based on your current debugging needs. The waterfall adapts to your investigation style.

Our trace drawer includes all the detailed span sections that provide crucial context, from request/response data to attributes and correlated logs, because sometimes you need to dig deeper to understand the whole story.

Use Cases for the Waterfall View

Theory aside, let's take a look at a real-life example of why waterfalls are so helpful. We recently discovered that one of our compression headers mysteriously vanished between our frontend and backend services.

In the past, debugging this would have meant jumping between different services' logs, or even searching for different spans - trying to piece together what happened. With the waterfall view, we could collapse unrelated spans to focus on the request flow, expand the relevant services, and examine headers at each hop.

The visualization showed that our Nginx proxy wasn't properly propagating the header. What could have been hours of debugging across multiple services turned into a few minutes of investigation. The ability to quickly navigate a single request's journey while comparing header values at each step made all the difference.
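
The hop-by-hop comparison we did by eye can also be expressed in a few lines. This is a hedged sketch rather than the exact investigation: the `find_header_drop` helper, the choice of `Content-Encoding` as the header, and the captured header values are all assumptions made up for illustration.

```python
def find_header_drop(hops, header):
    """Find the first pair of adjacent hops where a header disappears.

    hops: list of (service, request_headers) tuples in request order.
    Returns (last_service_that_saw_it, first_service_that_did_not), or None.
    """
    name = header.lower()  # HTTP header names are case-insensitive
    previous = None        # (service, header_present) for the prior hop
    for service, headers in hops:
        present = name in {key.lower() for key in headers}
        if previous is not None and previous[1] and not present:
            return previous[0], service
        previous = (service, present)
    return None


# Hypothetical request headers as observed on each span along the request path.
hops = [
    ("frontend", {"Content-Encoding": "gzip", "Accept": "application/json"}),
    ("nginx", {"content-encoding": "gzip", "Accept": "application/json"}),
    ("backend", {"Accept": "application/json"}),
]

dropped_between = find_header_drop(hops, "Content-Encoding")
print(dropped_between)
```

With this sample data the header is present when the request reaches the proxy but missing at the backend, which mirrors the pattern we saw in the waterfall.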

Other popular use cases to consider:

Identifying Performance Bottlenecks
With a chronological view of spans, it's easy to see which services or operations are slower than expected. You can pinpoint time-consuming steps, whether it's a database query, an external API call, or internal processing. A waterfall view can help you understand how each operation affects the total request time.
Error Analysis in Distributed Systems
When errors happen in distributed systems, knowing the sequence of events is key to success. You can follow the error's path and get the full context, including related spans that might have contributed to the issue.
Understanding Dependency Relationships
Distributed tracing helps map out how services depend on each other. Visualizing the call hierarchy lets you see which services rely on others and how those dependencies impact performance. This insight is crucial for spotting cascading failures or understanding which upstream or downstream services might need optimization.
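
Two of these use cases can be sketched directly on span data. The example below computes each span's self time (its duration minus the time covered by its direct children) to point at a bottleneck, and derives service-to-service dependency edges from parent/child links. The `Span` fields, both helper functions, and the sample trace are hypothetical, and self time is just one reasonable heuristic for finding slow operations.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    operation: str
    start_ms: float
    duration_ms: float


def self_times(spans):
    """Map span_id -> duration minus the time covered by direct children."""
    children = defaultdict(list)
    for s in spans:
        children[s.parent_id].append(s)

    result = {}
    for s in spans:
        start, end = s.start_ms, s.start_ms + s.duration_ms
        # Clip child intervals to the parent, then merge overlaps so that
        # concurrent children are not counted twice.
        intervals = sorted(
            (max(c.start_ms, start), min(c.start_ms + c.duration_ms, end))
            for c in children[s.span_id]
        )
        covered, cursor = 0.0, start
        for lo, hi in intervals:
            lo = max(lo, cursor)
            if hi > lo:
                covered += hi - lo
                cursor = hi
        result[s.span_id] = s.duration_ms - covered
    return result


def service_edges(spans):
    """Return (caller_service, callee_service) pairs seen in the trace."""
    by_id = {s.span_id: s for s in spans}
    edges = set()
    for s in spans:
        parent = by_id.get(s.parent_id)
        if parent is not None and parent.service != s.service:
            edges.add((parent.service, s.service))
    return edges


# Hypothetical trace with two overlapping database queries.
spans = [
    Span("a", None, "frontend", "GET /api/users", 0.0, 120.0),
    Span("b", "a", "users-api", "GET /users", 10.0, 100.0),
    Span("c", "b", "postgres", "SELECT users", 20.0, 30.0),
    Span("d", "b", "postgres", "SELECT plans", 40.0, 50.0),
]

times = self_times(spans)
slowest = max(times, key=times.get)
edges = service_edges(spans)
print(times, slowest, sorted(edges))
```

Ranking by self time rather than total duration keeps the root span from always looking like the culprit, since its duration includes everything beneath it.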

Conclusion

Distributed traces are powerful, but their real value comes from quickly understanding what they're telling us.

You want to see what matters when it matters most - and this is why a waterfall view can transform traces from complex data structures into clear, actionable insights.

We're bringing all the puzzle pieces of distributed system debugging together with a color-coded, chronological visualization that makes trace analysis feel complete, and this is just the beginning. We have some exciting features in the pipeline that will make the waterfall view even more powerful (who said AI?).

But first, we'd love to hear what you think! Try it out and let us know how it helps you debug workflows - or what pieces you'd like to see next. Join our Slack community to share your feedback and stay updated on what's coming next.
