They say that inflation is the “silent killer” of the economy. Hardly noticed, it can slowly gnaw at the purchasing power of our money until we wake up one day and realize that all is no longer well.
The same could be said about latency – just as inflation can slowly erode the value of money without much fanfare, latency can quietly torpedo application performance. Your applications may appear to be performing just fine, with no error codes and normal resource consumption levels, yet under the surface, latency issues may be wreaking havoc.
To make matters more complicated, latency issues are sometimes intermittent – your app might suffer high latency when serving some requests but not others, for example – which makes latency problems even tougher to track down.
That's the bad news. The good news is that, given the right tools and latency reduction techniques, identifying and mitigating latency issues is easy enough. Instead of waiting for your user experience to waste away silently due to slow response times, you can find latency issues proactively, figure out what's causing them and reduce latency before your users suffer.
In this article, we review common application latency causes, then discuss best practices for getting a handle on them so that you can deliver predictable, stable and high-throughput applications.
Making the invisible, visible: How to measure latency
Let's start by talking about how to monitor and measure latency.
As we mentioned, one of the reasons latency management is challenging is that tracking latency is not as simple as tracking things like error responses or application uptime. You can't evaluate latency until you identify a performance baseline for your environment – and that baseline can vary widely depending on which type of application(s) you are running.
A second challenge is that latency symptoms have a tendency to manifest in many places within your environment. This leads to nasty chain reactions, with a single latency problem echoing throughout a distributed app. In other cases, latency results from the confluence of multiple factors; for example, you might have several misbehaving application components that are minor individually, but that add up to degrade application response rates significantly.
So, to figure out when latency problems have arisen, you must know what your application's latency baseline is. Then, you need to monitor the app so you can catch deviations from the baseline. Finally, you must figure out what is triggering the latency – whether it's one problem that is rippling across your app, or a combination of multiple issues – and remediate the root cause.
You can simplify the latency identification process with a few simple tricks:
• Track latency continuously: This might seem obvious, but it's worth noting because it's all too easy to ignore latency until you detect another kind of issue (like an error). Don't make that mistake; track latency on an ongoing basis. You can measure it in terms of absolute response time (i.e., how many milliseconds it takes to complete a response) as well as with percentile metrics, which indicate the value below which a given percentage (the percentile) of measurements falls. For example, if your p50 latency is 80 milliseconds, then 50 percent of requests complete in 80 milliseconds or less (see the sketch after this list for one way to compute these values).
• Collect metrics from both applications and infrastructure: Sometimes, latency issues originate not in your application but in the infrastructure – so a problem with kube-proxy, for instance, might lead to high latency rates even if your app itself is fine.
• Focus on context: Context can be a silver bullet when it comes to pinning down latency issues. The more data you have about which types of services and requests are experiencing latency, when the issues started, how many application instances or infrastructure components they affect and so on, the faster you'll be able to figure out why they're happening and implement a fix.
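To make the percentile idea concrete, here's a minimal Python sketch (the function name and sample values are our own, hypothetical choices) that computes p50, p95 and p99 from a batch of recorded response times using only the standard library:

```python
from statistics import quantiles

def latency_percentiles(samples_ms):
    """Return (p50, p95, p99) for a list of response times in milliseconds."""
    # quantiles() with n=100 returns the 99 cut points that split the data
    # into 100 groups; index k-1 corresponds to the k-th percentile.
    cuts = quantiles(samples_ms, n=100)
    return cuts[49], cuts[94], cuts[98]

# Hypothetical sample: most requests are fast, a few are slow outliers.
samples = [12, 15, 14, 13, 200, 16, 12, 18, 14, 450, 13, 15]
p50, p95, p99 = latency_percentiles(samples)
print(f"p50={p50:.1f} ms  p95={p95:.1f} ms  p99={p99:.1f} ms")
```

In practice, you'd compute these over a sliding window of recent requests and compare them against your baseline, alerting when they drift.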
The point about context deserves particular emphasis because latency issues rarely affect just one part of your application. As we noted above, latency issues can spread like dominoes to other parts of the app or environment. A slowdown in handling one type of request will also decrease the response speed of any other type of request that depends on the first, for example. Meanwhile, unhandled requests pile up, which aggravates the problem even further. It's only by knowing the full scope of the issue that you can assess its complete impact and remediate it quickly.
It's also frequently the case that latency issues don't originate from a single source. They can stem from a confluence of problems that, had they happened in isolation, wouldn't trigger high latency, but that lead to slow responses when they all occur at once. For instance, an application might exhibit increased latency because it is waiting for another application that is being throttled due to node conditions (like a node running out of CPU while its node group has reached its scaling capacity). You would need to address each underlying issue to ensure that the latency problem doesn't keep coming back.
Latency reduction techniques
We'd love to tell you that we have the ultimate guide to reducing latency that will solve all of your apps' woes forever. But we can’t, because latency issues come in many forms and have many causes. The best way to reduce latency will vary from one scenario to the next, with differences in application type, architecture and hosting infrastructure being major factors in determining what causes latency and how to resolve it.
That said, it is possible to break latency problems into two main categories – frontend and backend – and identify strategies that, in general, are effective at reducing both types of latency.
Frontend latency mitigation: Compress, cache and CDN
Frontend latency problems are those that originate in the frontend components of an application.
Below are some of the most common causes of frontend latency issues:
• Uncompressed content: Compressing content is one of the simplest and most effective ways to reduce latency. Compression translates to less data that has to move over the network, which means lower latency, especially when your network bandwidth is nearing capacity. So, compress, compress, compress – and consider using tools like Google Lighthouse to identify places within your apps where you might not currently be taking full advantage of compression (see the sketch below).
• Uncached content: Caching is another dead-simple, yet highly effective, way to reduce latency. Caching allows you to avoid having to process the same requests repeatedly. You can therefore respond to requests faster, while also sucking up less memory and CPU to process each one. Although you might not implement caching when you first build your app, investing in a caching layer can help you take advantage of caching as the application scales.
• Not using a CDN: A Content Delivery Network, or CDN, is a network of hosting locations spread across a wide geographic area. CDNs can dramatically reduce latency by ensuring that an application instance is available in close geographic proximity to the users who issue requests to it. This saves your app from having to move data across thousands of miles of the planet to process each request. The distance matters because even on high-end fiber optic cables, you'll add about 5 microseconds (which is 0.005 milliseconds, in case you're wondering) of latency for each kilometer your data has to travel. Do the math and you'll find that moving network data all the way around the planet (whose circumference is about 40,000 kilometers) takes about 200 milliseconds. CDNs shorten the distances and cut those milliseconds significantly.
So, the simple way to reduce frontend latency boils down to what you might call the three Cs: Compress, cache and (use a) CDN.
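To make the first of those Cs concrete, here's a minimal Python sketch (the payload and function name are hypothetical) showing how much a repetitive text response shrinks under gzip before it ever crosses the network:

```python
import gzip
import json

def compress_response(payload: dict) -> bytes:
    """Serialize a response body and gzip it, as a web server or proxy would."""
    raw = json.dumps(payload).encode("utf-8")
    compressed = gzip.compress(raw)
    print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes "
          f"({100 * len(compressed) / len(raw):.0f}% of original)")
    return compressed

# Hypothetical API response: repetitive JSON compresses very well.
body = {"items": [{"id": i, "status": "ok", "region": "us-east-1"} for i in range(200)]}
compress_response(body)
```

In a real deployment, you'd usually let your web server, load balancer or CDN negotiate compression via the Content-Encoding header rather than compressing by hand; the point is simply that fewer bytes on the wire means fewer milliseconds in transit.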
Backend latency mitigation: Cache resources, use microservices and avoid blocking
Backend latency results from issues inside your application, such as one microservice failing to interact properly with another. More often than not, backend latency results from one of three main causes:
• Uncached resources: Just like frontend content, the data that your application processes in the backend should be cached where it makes sense in order to reduce processing times (and, as a bonus, decrease resource consumption). You might not implement much resource caching when you first develop your app, but building a caching layer is well worth the effort as the app scales: identify which data can be cached, cache it and serve it from the cache to speed up backend requests (see the first sketch after this list).
• Too many, or too few, microservices: Perhaps your application contains some microservices, but you could improve latency on the backend by breaking it down into even more microservices. Or, maybe you have too many microservices, which increases latency risks because it creates more places where holdups can occur. The point here is that to minimize latency, you need to strike the right balance between too many and too few microservices. The right number will vary from one app to the next, but your goal should be to deconstruct your app into enough microservices to mitigate latency, without going overboard to the point that microservices become a source of latency.
• Blocking procedures: In programming, blocking procedures are calls that "stop the world" until they complete. They can trigger high latency because they halt the processing of one request until another is complete. In the past, blocking procedures were often a necessary evil because they were the only way to prevent multiple services from trying to use the same resources at the same time. But modern programming languages and techniques, like coroutines and event-based patterns, make it easy to avoid blocking – and improve latency – in many cases (see the second sketch below).
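To illustrate the resource-caching point, here's a minimal sketch using Python's built-in functools.lru_cache; the expensive_lookup function and its 250-millisecond cost are invented for illustration:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1024)
def expensive_lookup(key: str) -> str:
    """Stand-in for a slow backend call (database query, remote API, etc.)."""
    time.sleep(0.25)  # simulate 250 ms of backend work
    return f"result-for-{key}"

start = time.perf_counter()
expensive_lookup("user:42")   # cache miss: pays the full 250 ms
first = time.perf_counter() - start

start = time.perf_counter()
expensive_lookup("user:42")   # cache hit: returns almost instantly
second = time.perf_counter() - start

print(f"first call: {first * 1000:.0f} ms, second call: {second * 1000:.1f} ms")
```

In a real service, the cache would often live in a shared layer such as Redis or Memcached so that every instance benefits, and you'd need an invalidation strategy for data that changes; the principle is the same, though: pay the slow path once, then serve repeats from memory.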
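And to illustrate the blocking point, here's a minimal sketch using Python's asyncio (the 200-millisecond call duration is hypothetical). Three blocking calls executed one after another take roughly the sum of their durations, while running them concurrently takes roughly the duration of the slowest one:

```python
import asyncio
import time

async def fetch(name: str) -> str:
    """Stand-in for an I/O-bound call (HTTP request, database query, etc.)."""
    await asyncio.sleep(0.2)  # simulate 200 ms of waiting on I/O
    return f"{name}: done"

async def main():
    start = time.perf_counter()
    for n in ("a", "b", "c"):  # one at a time: roughly 600 ms total
        await fetch(n)
    print(f"sequential: {time.perf_counter() - start:.2f} s")

    start = time.perf_counter()
    await asyncio.gather(*(fetch(n) for n in ("a", "b", "c")))  # roughly 200 ms total
    print(f"concurrent: {time.perf_counter() - start:.2f} s")

asyncio.run(main())
```

Goroutines in Go, coroutines in Kotlin and async/await in JavaScript offer the same escape from stop-the-world waits.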
Getting a handle on latency
Latency, like inflation, is a hard monster to tame. It has a wide variety of potential causes, and the go-to application performance management strategies that teams have traditionally relied on aren't always sufficient for uncovering latency.
But just because mitigating latency is hard doesn't make it impossible. After all, virtually every software developer or IT engineer has to fight latency, and experienced ones know that it can be beaten, if you have the right tools and techniques on your side.
A final word: We’d be doing you a disservice, dear reader who has boldly read thus far, if we didn't stress the importance of leveraging a monitoring and observability solution that delivers deep visibility into your applications as part of your mission to end latency. With groundcover, a one-line installation is all it takes to get rich observability data via eBPF in order to take control of latency – and of application performance optimization in general.