By storing data in memory, Redis key-value stores can read and write data much faster than databases that depend on conventional storage. That's part of the reason why many of the world's largest tech companies – such as Twitter, Snapchat, and Craigslist – depend on Redis Servers, and clouds like GCP and Azure offer hosted data stores that use Redis protocol.

Unfortunately, though, Redis key-value stores don't always work the way they should. You may run into issues like slow performance due to low hit rates and poorly sharded data. Problems like these must be identified and fixed. Otherwise, what's the point of paying for an in-memory key-value store if it's not living up to its full potential?

Monitoring Redis databases in order to troubleshoot performance and other problems isn't always as straightforward as you might like. But it's possible to do – especially with the help of tools like eBPF, which makes it possible to gather Redis monitoring insights in ways that would simply not have been possible using traditional approaches to Redis performance monitoring.

What is Redis?

Redis is a single-threaded, high-throughput, low-latency, in-memory key-value store. That's a long (and overly hyphenated) way of saying that Redis uses in-memory data storage to deliver performance that’s hard to achieve using conventional databases.

If you're into databases and data structures, you might also enjoy knowing that Redis supports multiple types of data structures – including hashmaps, lists, sets, counters, and more. That makes it a very flexible key-value store that is suited for many use cases, such as caching, pub/sub patterns, text search, graphs, atomic operations, and rate limits, just to name a few.

On top of all of this, Redis also supports atomic operations, either through the use of transactions or by evaluating custom Lua scripts on requests. 

To enhance performance and resiliency, Redis supports sharding, although that comes at the cost of losing atomicity (since single commands can't run across shards). 

Alternatively, you can operate using a master/replica model that allows you to run multiple Redis nodes (each with a full copy of the data), achieving the benefits of more compute and memory resources without losing atomicity.

Master/Replicas vs. Sharding

Oh, and in case you're wondering what happens to your in-memory data if your nodes shut down unexpectedly, Redis has a solution for that: You can configure persistence, either through periodic snapshots (RDB) or through an append-only file (AOF) whose fsync policy controls how often writes are flushed to disk.

Common Redis use cases

Redis lends itself to a variety of use cases. Here's a look at some of the most popular reasons for choosing Redis.

Primary database

At first glance, you might assume that because Redis stores data in-memory instead of writing it to disk, it's not a great solution for running a primary database. Data stored in memory is in some respects less reliable because if the system shuts down suddenly, the data will be lost permanently.

That said, most modern servers are very stable and don't typically shut down without warning, so this is not typically a major issue. Plus, by distributing data across multiple nodes, Redis can provide a degree of protection against unexpected data loss. It's also possible to back up Redis data to persistent disk storage if you want even more protection against the risk of data loss.

All of the above means that people can, and often do, use Redis instances as their main database. In this sense, Redis can serve as an alternative to popular traditional databases, like MySQL, which store data on disk instead of in memory – but which are also typically much slower than Redis precisely because they don't take advantage of in-memory data storage.

Data caching

In addition to speeding up data access by keeping data in memory, Redis can improve performance even more by storing frequently accessed data in the Redis cache. This means that rather than having to serve the data from scratch each time it's requested, Redis keeps the information in a location where it can be accessed within just milliseconds.

Caching enables a better user experience because it allows applications to fulfill requests faster. In addition, caching can reduce the CPU and memory usage associated with Redis database operations by eliminating the need to process repetitive requests fully.

Stream processing

The ability to serve data very quickly makes Redis instances a powerful solution for stream processing use cases – meaning ones where incoming data must be processed as quickly as it's created.

For instance, if you want to ingest user messages into a database where you can also begin analyzing them in real time, a Redis database could help you do that. Likewise, Redis could write notifications from monitoring or observability data to a data stream so that they'll be immediately available for processing.

Message queuing

Along similar lines, a Redis instance works well for message queuing use cases. These are situations where applications need to exchange messages for purposes like keeping in sync with one another or processing requests within a distributed system. Redis can support this type of use case by storing each message or task in a queue, and then distributing them as needed.

Text search

Another use case where Redis instances excel is text search. Here, the main advantage of Redis over other types of databases is that each Redis instance stores data in memory, making it much faster to parse. As a result, Redis is particularly useful for text search use cases involving very large volumes of information.

Session management

Session management is the process of tracking requests from the same application user during the time that they are using an app. Redis can handle this use case by storing data about each request in memory, and then making it available to the application so that the app can deliver a consistent experience to the user.

Using Redis for session management use cases tends to lead to better performance than using a traditional database, which can't read and write data as quickly.

Rate limiting

Rate limiting means restricting the number of requests that a system processes. Typically, the purpose of rate limiting is to prevent systems from crashing due to receiving an excess number of requests either as a result of malicious activity (e.g., Distributed Denial of Service attacks) or of application bugs. Rate limiting is often performed on a user-by-user basis, meaning that rate limiting restricts the number of requests a particular user can make within a given time frame.

To implement rate limiting using Redis, developers can track requests on a per-user basis based on user IDs or API keys. Every time a user issues a new request, the application increases the number of requests associated with that user using the INCR command. When the count surpasses the rate limiting threshold, the application blocks or ignores further requests from that user until the counter expires.
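
As a rough illustration of the counting logic, here is a minimal fixed-window sketch. It makes several assumptions: the counterStore interface, the in-memory memStore stand-in, the allow function, and the "ratelimit:" key prefix are all hypothetical names invented for this example. In a real setup, Incr and Expire would map to the Redis INCR and EXPIRE commands sent through your client library.

```go
package main

import (
	"fmt"
	"time"
)

// counterStore is a hypothetical interface covering the two Redis commands
// the limiter needs. A real client would send INCR and EXPIRE to the server.
type counterStore interface {
	Incr(key string) int64
	Expire(key string, ttl time.Duration)
}

// memStore is an in-memory stand-in for Redis so the sketch runs on its own.
// It ignores TTLs; Redis would delete the key when the window ends.
type memStore struct{ counts map[string]int64 }

func (m *memStore) Incr(key string) int64 {
	m.counts[key]++
	return m.counts[key]
}

func (m *memStore) Expire(key string, ttl time.Duration) {}

// allow reports whether the caller identified by apiKey is still under
// the limit for the current window.
func allow(s counterStore, apiKey string, limit int64, window time.Duration) bool {
	key := "ratelimit:" + apiKey
	n := s.Incr(key)
	if n == 1 {
		// First request in this window: start the countdown.
		s.Expire(key, window)
	}
	return n <= limit
}

func main() {
	store := &memStore{counts: map[string]int64{}}
	for i := 1; i <= 4; i++ {
		ok := allow(store, "key-123", 3, time.Minute)
		fmt.Printf("request %d allowed: %v\n", i, ok)
	}
}
```

Note that INCR followed by EXPIRE is two separate commands, so a client that fails between them could leave a counter without a TTL. Running both steps inside a single Lua script is one way to close that gap.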

Key Redis metrics to monitor

To gain comprehensive visibility into what's happening within your Redis clusters, you'll typically want to track the following metrics data:

Metric | What it means | Why it's important
CPU usage | How much CPU time Redis is consuming as a percentage of the total available CPU. | Excess CPU usage could be a sign of a bug or misconfiguration. It could also simply mean that you need to provision more nodes (or nodes with more CPU resources) to support your Redis cluster.
Memory usage | How much memory Redis is using as a percentage of the total available memory. | If Redis needs more memory than the server has available, the operating system may start swapping Redis data to disk, leading in most cases to a dramatic reduction in performance. Tracking memory usage helps you identify and get ahead of these issues.
Latency | The delay between when Redis receives a request and when it completes processing it. | High latency (which means that requests are taking a long time to process) is a sign of poor Redis performance. Latency problems can result from many potential causes, such as a lack of available CPU and memory, a poorly designed Redis cluster architecture, or inefficient requests.
Memory fragmentation ratio | The memory that the operating system has allocated to Redis (used_memory_rss) divided by the memory that Redis is using for its data (used_memory). | A ratio well above 1 means that memory is fragmented: Redis holds more physical memory than its data needs because related data is spread across non-contiguous sections of memory. A ratio below 1 typically means that the operating system has swapped part of Redis's memory to disk, which slows down reads and writes.
Cache hit ratio | Number of successful key lookups compared to the total number of lookup attempts. | If a high share of lookups miss, Redis is likely either running short of memory and evicting keys, or keys are expiring before they are read again. In general, aim for a cache hit ratio of at least 0.8.
Connected clients | Total number of Redis client connections. | Helps track how much overall load exists for Redis. While a high or low number of connected clients is not necessarily a problem, counting client connections provides context that you can use to troubleshoot other issues; for instance, if latency and client count increase at the same time, it's a reasonable conclusion that Redis is simply experiencing increased load, leading to higher latency.
Cluster infrastructure metrics | CPU usage and memory usage metrics data from the servers that operate as Redis nodes. | This data helps you track the health and stability of your underlying Redis infrastructure.

Common Redis issues

Although Redis can do lots of cool things, it can also run into a lot of problems – just like any database.

Low hit rate

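One common issue is what's known as a low hit rate. It means that a large share of reads miss the cache, often because keys have expired (for example, due to a TTL that is too short) or have been evicted, so your application has to fall back to a slower data source. You can check the hit rate using the Redis command line interface (CLI) INFO command: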

# redis-cli
127.0.0.1:6379> info

Response

keyspace_hits:21253
keyspace_misses:12153

With this data, you can calculate a hit rate for all the keys on your Redis server.
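
The hit rate is keyspace_hits divided by the sum of keyspace_hits and keyspace_misses. For the sample above, that is 21253 / (21253 + 12153), or roughly 0.64. The sketch below shows one way to compute it from raw INFO output; the parseInfo and hitRatio helper names are made up for this example, and the input string simply reuses the two counters shown above.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseInfo turns the "field:value" lines of INFO output into a map,
// skipping blank lines and "# Section" headers.
func parseInfo(raw string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if k, v, ok := strings.Cut(line, ":"); ok {
			fields[k] = v
		}
	}
	return fields
}

// hitRatio returns hits / (hits + misses), or 0 when there were no lookups.
func hitRatio(fields map[string]string) (float64, error) {
	hits, err := strconv.ParseFloat(fields["keyspace_hits"], 64)
	if err != nil {
		return 0, fmt.Errorf("keyspace_hits: %w", err)
	}
	misses, err := strconv.ParseFloat(fields["keyspace_misses"], 64)
	if err != nil {
		return 0, fmt.Errorf("keyspace_misses: %w", err)
	}
	if hits+misses == 0 {
		return 0, nil
	}
	return hits / (hits + misses), nil
}

func main() {
	raw := "# Stats\r\nkeyspace_hits:21253\r\nkeyspace_misses:12153\r\n"
	ratio, err := hitRatio(parseInfo(raw))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("hit ratio: %.2f\n", ratio)
}
```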

Large values

Large values in sorted sets, lists, and hashes are often a symptom of problems like incorrect cleanup logic or missing TTLs. To find the keys with the biggest values, run:

# redis-cli --bigkeys

Response

# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).
[00.00%] Biggest hash  found so far '"dae13887-931a-4f6d-b825-ee5abe1cd314"' with 1 fields
[00.00%] Biggest string found so far '"cart"' with 2 bytes
-------- summary -------
Sampled 21 keys in the keyspace!
Total key length in bytes is 724 (avg len 34.48)
Biggest hash found '"dae13887-931a-4f6d-b825-ee5abe1cd314"' has 1 fields
Biggest string found '"cart"' has 2 bytes
0 lists with 0 items (00.00% of keys, avg size 0.00)
20 hashs with 20 fields (95.24% of keys, avg size 1.00)
1 strings with 2 bytes (04.76% of keys, avg size 2.00)
0 streams with 0 entries (00.00% of keys, avg size 0.00)
0 sets with 0 members (00.00% of keys, avg size 0.00)
0 zsets with 0 members (00.00% of keys, avg size 0.00)

Large JSON keys

Using large JSON keys instead of Redis hashes is another common Redis issue. It happens when you use a single key to hold a JSON value as a string, causing lookups in your apps to be very inefficient.

A simple solution is to hold the data in a hash, so that you can look up a single field in O(1) complexity without fetching and parsing the whole JSON document.

Using lists instead of sets

It's easy to use lists in Redis via push/pop commands, but lists don't enforce uniqueness, so overusing them can lead to duplicate values. To avoid this issue, switch from lists to sets when you identify an unexpectedly large value and don't depend on the order of its items.

Redis cluster issues

Beyond the common Redis issues that we outlined above, you may also run into more complex Redis cluster issues.

Poorly sharded data

Redis clusters spread their data across many nodes. When you use a Redis cluster with a general-purpose hash instead of using multiple keys, your cluster can suffer a performance hit. This happens because the key is stored on a single node, and in a high-scale environment, the pressure will fall on that node instead of being distributed between all of the nodes in the cluster. The result is that the node becomes a performance bottleneck.

As a real-world example, consider a cluster that stores user data in a hash, where the key is the user ID. An authentication server that performs a lot of lookups on the user ID will place heavy pressure on the node that stores the key. A solution would be to spread the hashed data to multiple keys across nodes, letting Redis's sharding algorithm distribute the pressure.

MOVED errors

When performing multi-key operations in a single command – such as MGET, pipelines, and Lua script Evals – it’s very easy to forget that Redis hashes on every key and decides which shard in the cluster to place its value. This behavior raises the possibility of a MOVED error. The MOVED error is a response returned by one of the nodes telling you that the data is stored on a different node, and that it's the client’s responsibility to go to the relevant node and ask for the data.

For example, consider this code:

redis.Set(ctx, "userA:age", "30", -1)
redis.Set(ctx, "userB:age", "28", -1)
redis.Set(ctx, "userC:age", "40", -1)

pl := redis.Pipeline()
pl.Get(ctx, "userA:age")
pl.Get(ctx, "userB:age")
pl.Get(ctx, "userC:age")

Here, we perform pipelined requests on three individual keys. The requests execute on a single node, and if one of the keys is not on that specific node, the commands with the keys that are on that node will return a response while the others will return the MOVED error.

A quick fix is to use hash tags in the key structure. Simply adding curly brackets around the part of the key that we want to hash by causes the sharding algorithm to direct the values to the same node:

redis.Set(ctx, "userA:{age}", "30", -1)
redis.Set(ctx, "userB:{age}", "28", -1)
redis.Set(ctx, "userC:{age}", "21", -1)

pl := redis.Pipeline()
pl.Get(ctx, "userA:{age}")
pl.Get(ctx, "userB:{age}")
pl.Get(ctx, "userC:{age}")
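
To see why the curly brackets help, it can be useful to compute the hash slot yourself. Redis Cluster maps each key to one of 16384 slots by taking the CRC16 (XMODEM variant) of the key, or of the hash tag if one is present, modulo 16384. The sketch below reimplements that calculation for illustration only; the crc16, hashTag, and keySlot function names are our own, and in practice your client library does this for you.

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 implements the CRC16-CCITT (XMODEM) variant:
// polynomial 0x1021, initial value 0.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = (crc << 1) ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// hashTag returns the part of the key that gets hashed: the text between
// the first '{' and the next '}' when that text is non-empty, otherwise
// the whole key.
func hashTag(key string) string {
	start := strings.IndexByte(key, '{')
	if start < 0 {
		return key
	}
	end := strings.IndexByte(key[start+1:], '}')
	if end <= 0 { // no closing bracket, or an empty "{}" tag
		return key
	}
	return key[start+1 : start+1+end]
}

// keySlot maps a key to one of the 16384 cluster hash slots.
func keySlot(key string) int {
	return int(crc16([]byte(hashTag(key)))) % 16384
}

func main() {
	keys := []string{"userA:age", "userB:age", "userA:{age}", "userB:{age}"}
	for _, k := range keys {
		fmt.Printf("%-12s -> slot %d\n", k, keySlot(k))
	}
}
```

Keep in mind that tagging every key with {age} sends all of those keys to a single slot, which can recreate the hot-node problem described under poorly sharded data. Depending on your access pattern, a per-user tag such as {userA}:age may be a better fit, since it keeps one user's keys together while still spreading users across nodes.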

Multiple set/get operations

Since Redis executes commands on a single thread, it provides atomicity when executing a command – or at least, it should. But imagine a more complex scenario, where you’re using the Exists function to check if a key is set. If it is, we increment a counter:

exists, _ := redis.Exists(ctx, "userA").Result()
if exists > 0 {
  redis.Incr(ctx, "user_count")
}

In this instance, we lost atomicity. Our requests are separate, and by the time we get to the second command, the key that we were checking might not exist anymore.

We can solve that using a temporary Lua script that will ensure atomicity since the script is evaluated as a single request:

redis.Eval(ctx, `if redis.call('exists', KEYS[1]) > 0 then redis.call('incr', KEYS[2]) end `,
[]string{"userA", "user_count"})

We could also store our Lua script for future use using Redis's SCRIPT LOAD command, which stores the script on the Redis node and lets you trigger it by its SHA hash with the EVALSHA command:

scriptHash, _ := redis.ScriptLoad(ctx, `if redis.call('exists', KEYS[1]) > 0 then redis.call('incr', KEYS[2]) end `).Result()
redis.EvalSha(ctx, scriptHash, []string{"userA", "user_count"})

How to monitor Redis performance

Clearly, Redis issues come in many shapes and sizes and their solutions are equally varied. That's why Redis performance monitoring can be critical to your success.

Redis performance monitoring best practices:

Redis provides a few built-in commands to extract Redis metrics:

  • Info/cluster-info: Shows you raw information/counters about your Redis server/cluster, such as memory metrics, CPU usage, shards, and hits. The output of this command is designed to be easy to parse programmatically, so you can export it to Prometheus or your preferred monitoring tool.
  • Monitor: Shows all the commands that are being executed on the server at the time the command runs. It is a great tool to find out what’s going on, but in a real production environment, it's very difficult to correlate the command with a particular issue.
  • Slowlog: Shows the slowest commands on the server, which can help you find commands that should be optimized in order to improve performance.

If you want to track the output of these commands in real time without having to run the commands manually, you can configure the Grafana Redis data source with the Redis dashboard, which displays data based on these commands.

Hosted Redis solutions:

If you are using RedisLabs Cloud, Amazon ElastiCache, Google MemoryStore, Microsoft Azure Cache for Redis, or any other hosted Redis solution, you can use the cloud provider’s exposed metrics to get cache hit ratio, CPU, memory, evictions, and more.

Redis monitoring challenges

We mentioned that monitoring your Redis database is important, but we didn't say it's always easy. Au contraire, Redis monitoring can prove challenging for a variety of reasons:

  • Distributed architecture: Redis is a complex system with multiple components. As a result, there is no single set of data you can collect to monitor Redis instances effectively. Instead, you need to collect multiple memory metrics, CPU metrics, and other data from across all of your Redis nodes, while also tracking cluster-wide metrics (like connected client count).
  • Multiple points of failure: When something goes wrong in Redis, there are typically multiple potential causes. For example, high latency could stem from increased system load, lack of available memory, or poorly structured requests, to name just a few possibilities. To monitor and troubleshoot effectively, you need to be able to explore each potential root cause quickly. This requires the ability to correlate and analyze a variety of data points. 
  • Limited built-in monitoring functionality: Redis offers limited built-in features for monitoring. It exposes some metrics, which you can view through the Redis CLI tool. However, you'll need external monitoring tools to collect other types of data, like CPU and memory metrics from servers within a Redis cluster.
  • Monitoring tools increase system load: Achieving high performance is often the main reason why teams choose to use Redis in the first place. Traditional Redis monitoring tools can make it harder to achieve that goal. Why? Because conventional monitoring tools consume CPU and memory to do their jobs – which means they reduce the CPU and memory available to Redis, which in turn can reduce Redis performance levels.

Filling in the blanks: Getting more from Redis monitoring with eBPF

Redis monitoring becomes a lot easier when you have eBPF on your side. eBPF is a Linux kernel technology that allows you to monitor Redis (and virtually anything else running on a Linux server) in a hyper-efficient way while achieving very deep visibility into what's happening on your system.

With eBPF-based tools, you can get so much more than what basic monitoring can give you. You can build and extract meaningful details and contextualized data about your Redis nodes, track down any and all performance issues, and get an inside look into what’s happening in your cluster, all without writing a single line of code. Instead of working hard to add full tracing and monitoring to every part of your system, you can sit back, relax, and enjoy the benefits of Redis’s high-performance in-memory store without worrying about missing a single issue.

By running an eBPF program on a container/server/client, we can detect network data such as requests, responses, sources, destinations, DNS resolutions, request/response size, and a whole lot more. We can also detect process data such as stack traces or exceptions.

Let's emulate a Redis request and response according to Redis protocol spec.

Redis request response example:

Executed from pid: 1234
Request: “SET userA:age 30 -1”
Response: “+OK\r\n” (+ in the response indicates a simple string response, OK indicates success)

We also get the same source and destination (“localhost:6379”) and the time between the request and response (5ms).

If we parse this in our eBPF program, we can create an event that will look something like this:

PID: 1234
Source: localhost:6379
Destination: localhost:6379
Command: SET
Args: userA:age 30 -1
ResponseStatus: OK
Latency: 5ms
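
On the wire, the request above is not sent as a readable line. Clients encode commands as a RESP array of bulk strings, so the parsing step has to decode that framing first. Here is a simplified user-space sketch of what such parsing could look like. The event struct and the parseRequest, parseStatus, and buildEvent helpers are hypothetical names for this example, and an actual eBPF-based tool would capture the bytes in the kernel and handle many more cases (partial reads, pipelining, other reply types).

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// event holds a subset of the fields listed above; the struct is hypothetical.
type event struct {
	Command string
	Args    []string
	Status  string
}

// readLine reads one CRLF-terminated line and strips the terminator.
func readLine(r *bufio.Reader) (string, error) {
	line, err := r.ReadString('\n')
	if err != nil {
		return "", err
	}
	return strings.TrimRight(line, "\r\n"), nil
}

// parseRequest decodes a RESP array of bulk strings into its parts.
func parseRequest(raw string) ([]string, error) {
	r := bufio.NewReader(strings.NewReader(raw))
	header, err := readLine(r)
	if err != nil {
		return nil, err
	}
	if !strings.HasPrefix(header, "*") {
		return nil, fmt.Errorf("expected array header, got %q", header)
	}
	n, err := strconv.Atoi(header[1:])
	if err != nil || n < 1 {
		return nil, fmt.Errorf("bad array length %q", header)
	}
	parts := make([]string, 0, n)
	for i := 0; i < n; i++ {
		sizeLine, err := readLine(r)
		if err != nil {
			return nil, err
		}
		if !strings.HasPrefix(sizeLine, "$") {
			return nil, fmt.Errorf("expected bulk string, got %q", sizeLine)
		}
		size, err := strconv.Atoi(sizeLine[1:])
		if err != nil || size < 0 {
			return nil, fmt.Errorf("bad bulk length %q", sizeLine)
		}
		buf := make([]byte, size+2) // payload plus trailing CRLF
		if _, err := io.ReadFull(r, buf); err != nil {
			return nil, err
		}
		parts = append(parts, string(buf[:size]))
	}
	return parts, nil
}

// parseStatus classifies a reply by its first byte: '+' is a simple string
// (such as OK) and '-' is an error. Other reply types are lumped together.
func parseStatus(raw string) string {
	switch {
	case strings.HasPrefix(raw, "+"):
		return strings.TrimRight(raw[1:], "\r\n")
	case strings.HasPrefix(raw, "-"):
		return "ERROR"
	default:
		return "OTHER"
	}
}

// buildEvent combines a raw request and reply into one event.
func buildEvent(req, resp string) (event, error) {
	parts, err := parseRequest(req)
	if err != nil {
		return event{}, err
	}
	return event{
		Command: strings.ToUpper(parts[0]),
		Args:    parts[1:],
		Status:  parseStatus(resp),
	}, nil
}

func main() {
	req := "*4\r\n$3\r\nSET\r\n$9\r\nuserA:age\r\n$2\r\n30\r\n$2\r\n-1\r\n"
	ev, err := buildEvent(req, "+OK\r\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("Command: %s\nArgs: %s\nResponseStatus: %s\n",
		ev.Command, strings.Join(ev.Args, " "), ev.Status)
}
```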

We can then aggregate that data using the sources/destinations/response statuses/commands, and calculate success/error ratios, latency percentiles and so much more.
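
As a small illustration of that aggregation step, the sketch below groups events by command and computes an error ratio plus latency percentiles. Everything here is an assumption made for the example: the event and stats types, the "ERROR" status label, the sample latencies, and the choice of the nearest-rank percentile method. A production pipeline would more likely use streaming histograms than sort every latency.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// event is a trimmed-down, hypothetical version of the event shown above.
type event struct {
	Command string
	Status  string
	Latency time.Duration
}

// stats is the per-command summary produced by aggregate.
type stats struct {
	Count      int
	ErrorRatio float64
	P50, P99   time.Duration
}

// percentile uses the nearest-rank method on an already sorted slice.
func percentile(sorted []time.Duration, p float64) time.Duration {
	if len(sorted) == 0 {
		return 0
	}
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// aggregate groups events by command and summarizes each group.
func aggregate(events []event) map[string]stats {
	latencies := map[string][]time.Duration{}
	errors := map[string]int{}
	for _, e := range events {
		latencies[e.Command] = append(latencies[e.Command], e.Latency)
		if e.Status == "ERROR" {
			errors[e.Command]++
		}
	}
	out := make(map[string]stats, len(latencies))
	for cmd, l := range latencies {
		sort.Slice(l, func(i, j int) bool { return l[i] < l[j] })
		out[cmd] = stats{
			Count:      len(l),
			ErrorRatio: float64(errors[cmd]) / float64(len(l)),
			P50:        percentile(l, 50),
			P99:        percentile(l, 99),
		}
	}
	return out
}

func main() {
	events := []event{
		{Command: "SET", Status: "OK", Latency: 5 * time.Millisecond},
		{Command: "SET", Status: "OK", Latency: 2 * time.Millisecond},
		{Command: "SET", Status: "ERROR", Latency: 9 * time.Millisecond},
		{Command: "GET", Status: "OTHER", Latency: 1 * time.Millisecond},
	}
	for cmd, s := range aggregate(events) {
		fmt.Printf("%s: count=%d errors=%.2f p50=%v p99=%v\n",
			cmd, s.Count, s.ErrorRatio, s.P50, s.P99)
	}
}
```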

eBPF helps you to get more context: it allows you to run code in the Linux kernel on your Redis nodes to get low-level data about node performance and activity. Monitoring Redis clients using eBPF gives you the ability to transform every request into contextual data, and, with the right tools, into throughput and hit rate metrics, without compromising performance and without any code changes.

That means that you can, for example, track client requests to get the time, success/error, callers, commands, arguments, and beyond:

Screen capture source: app.groundcover.com 

We can then correlate the caller application’s data and create a span (an event that occurred during a timespan):

Once we have the span, we can enrich our span with the caller’s stack trace and pinpoint the exact call:

Frequently asked questions

How do you visualize Redis data?

A variety of tools are available to visualize Redis data. One popular option is Grafana, which can ingest data from Redis and display it through highly customizable charts and graphs. You can also use tools like Redis Insight, a desktop monitoring tool for Redis. And of course, groundcover offers a range of Redis data visualization features.

How can you check cached data in Redis?

The easiest way to check cached data in Redis is to run the MONITOR command in the Redis CLI. This command displays live feedback about how Redis is processing requests.

Thus, if you monitor a command that you expect to trigger a request to cached data, you'll be able to see whether Redis is actually pulling data from the cache.

What causes Redis CPU usage?

A variety of issues can cause high Redis CPU usage. Common culprits include:

  • A poorly designed cluster architecture in which certain nodes handle a majority of requests, leading to a high CPU load for those nodes.
  • Running CPU-intensive commands, like KEYS, MGET, or SMEMBERS. These commands typically consume a lot of CPU because they require Redis to look up a lot of information.
  • Running low on available memory. When this happens, Redis will begin evicting keys to free up memory. Key eviction consumes CPU.

If you've noticed high CPU usage in Redis, consider running the SLOWLOG GET command. This displays queries that are taking a long time to process – which often means that they are also consuming significant CPU. If you notice that certain types of queries are taking a long time, it's likely that they're contributing to a spike in CPU consumption.

When should you not use Redis?

While Redis is a versatile database that supports many use cases, it's not the right fit for every situation. In general, you should not use Redis in the following scenarios:

  • You have very large amounts (hundreds of gigabytes or terabytes) of data to store – so much that providing sufficient memory to handle all of it would be impossible or prohibitively expensive. In this case, using disk storage for your database makes more sense.
  • You plan to store all of your data on a single server. Although Redis could work in this type of use case, you won't be able to benefit from features like sharding, so it may make more sense to stick with a conventional database.
  • You have data that is structured in highly complex ways. For this type of use case, Redis's key-value architecture can make it challenging to structure data in the way you need.
  • Protecting against unexpected data loss is absolutely critical. As we mentioned above, Redis is typically not at major risk of losing data. However, at the end of the day, Redis provides less protection against possible data loss than databases that store information on disk, so Redis is not ideal if data durability is extremely important.

Have any questions we didn't cover here? Reach out on groundcover's Redis channel and ask away!
