Extra Meal 06 Usage Guidelines and Recommendations for Redis #

In today’s extra lesson, let’s talk about a more relaxed topic: guidelines and recommendations for using Redis, covering how to use key-value pairs, how to store business data, and how to use commands.

After all, high performance and memory savings are our two key goals, and only by using Redis in a disciplined way can we truly achieve them. If the previous lessons taught you how to use Redis, then today’s lesson aims to help you use it well and avoid mistakes as much as possible.

Alright, without further ado, let’s take a look at the guidelines for using key-value pairs.

Guidelines for using key-value pairs #

Regarding the guidelines for using key-value pairs, I would like to discuss two aspects:

  1. Naming conventions for keys: Only by following naming conventions can we provide keys that are readable and maintainable, facilitating daily management.
  2. Guidelines for designing values: this includes avoiding bigkeys, choosing efficient serialization and compression methods, using the integer object shared pool, and selecting appropriate data types.

Guideline 1: Naming conventions for keys #

By default, a Redis instance can support 16 databases, and we can store different business data in different databases. However, when using different databases, the client needs to switch databases using the SELECT command, which adds an extra operation.

In fact, we can reduce this operation by naming the keys properly. The specific approach is to use the business name as a prefix, followed by a colon separator, and then the specific business data name. This way, we can differentiate different business data based on the key prefix and avoid switching between multiple databases.

Let me give you a simple example to see how keys can be named.

For example, if we want to track the unique visitor (UV) count for a web page, we can name the key as follows, indicating that the data belongs to the unique-visitor business and is for page 1024.

uv:page:1024

There is one thing to note here. The key itself is a string, and the underlying data structure is SDS (Simple Dynamic String). The SDS structure includes metadata information such as string length and allocated space size. Starting from Redis 3.2, when the length of the key string increases, the metadata in SDS will also occupy more memory.

Therefore, when naming keys, we need to keep them reasonably short. Otherwise, a very long key not only consumes more memory for the string itself, but its SDS metadata also takes up extra space.

Roughly speaking, the longer the string, the larger the SDS header type Redis uses to record its length and allocated space, so the metadata overhead grows from just a few bytes for short strings to more than a dozen bytes for very long ones.

To reduce the memory space occupied by keys, I have a suggestion for you: You can use the initials of the corresponding English words for the business name or business data name (e.g., use “u” for “user” or “m” for “message”), or you can use abbreviations (such as “uv” for “unique visitor”).
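For instance, here is what the same counter might look like with a verbose key versus an abbreviated one (a small sketch; the key names and the value 10000 are made up for illustration):

# verbose key: readable, but every stored key pays for these extra bytes
SET unique_visitor:page:1024 10000

# abbreviated key: same structure and meaning, less memory per key
SET uv:page:1024 10000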

Guideline 2: Avoid using bigkeys #

Redis uses a single thread to read and write data, and read/write operations on bigkeys can block the thread and reduce Redis’ processing efficiency. Therefore, when using Redis, it is important to follow guidelines for value design, specifically avoiding bigkeys.

There are usually two scenarios where bigkeys occur.

  • Scenario 1: The value of the key-value pair is itself large, for example String-type data whose value is 1MB. To avoid String-type bigkeys, the application layer should try to keep String values below 10KB.
  • Scenario 2: The value of the key-value pair is a collection with a very large number of elements, for example a Hash with 1 million elements. To avoid collection-type bigkeys, my recommendation is to keep the number of elements in a collection below 10,000.

Of course, these recommendations are only meant to help you avoid bigkeys as far as possible. If the String-type data really is large at the application layer, we can reduce its size through data compression. If a collection really does contain a large number of elements, we can split the one large collection into multiple smaller ones for storage.
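For the collection case, here is a minimal sketch of the splitting idea (the key names and the modulo-100 bucketing rule are hypothetical, just to illustrate the approach):

# one huge Hash holding millions of users quickly becomes a bigkey
HSET user:profile 3000128 "{...}"

# instead, bucket users into 100 smaller Hashes, e.g. by user_id % 100 (3000128 % 100 = 28),
# so each Hash stays well below the 10,000-element recommendation
HSET user:profile:28 3000128 "{...}"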

Here, there is one more thing to note. Redis has 4 collection types: List, Hash, Set, and Sorted Set. When the number of elements in a collection is less than a certain threshold, it will use a memory-compact underlying data structure to save memory. For example, assuming the hash-max-ziplist-entries configuration item for a Hash collection is 1000, if the number of elements in the Hash collection does not exceed 1000, the data will be saved using ziplist.

Although compact data structures can save memory, they will also reduce the read and write performance to some extent. So, if the business application requires high-performance access rather than memory saving, you don’t need to deliberately control the number of collection elements as long as it doesn’t cause bigkey.
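If you want to observe this encoding switch yourself, here is a small sketch (the key name h:demo is made up; also note that Redis 7.x replaces ziplist with listpack, so the reported encoding name may differ):

CONFIG SET hash-max-ziplist-entries 1000
HSET h:demo field1 value1
OBJECT ENCODING h:demo
# returns "ziplist" (or "listpack" on newer versions) while the Hash is small;
# once it grows past 1000 fields, or a value exceeds hash-max-ziplist-value bytes,
# the encoding is converted to "hashtable"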

Guideline 3: Use efficient serialization and compression methods #

In order to save memory, besides using compact data structures, we can also reduce the size of the value itself by following two further practices: using efficient serialization methods and using compression methods.

In Redis, strings are stored as binary-safe byte arrays. Therefore, we can serialize the business data into binary data and write it to Redis.

However, different serialization methods have different effects on serialization speed and the memory space occupied after serialization. For example, protostuff and kryo are more efficient serialization methods compared to the built-in Java serialization method (java-build-in-serializer).

In addition, sometimes the business application may use XML and JSON formats saved as strings.

The advantage of doing this is that these two formats have good readability, making debugging easier, and they are supported by different programming languages.

The drawback is that XML and JSON format data occupy relatively large memory space. To avoid occupying too much memory space, I recommend using compression tools (such as snappy or gzip) to compress the data before writing it to Redis, which can save memory space.
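Keep in mind that Redis does not compress String values for you; the serialization and compression happen at the application layer before the write. Here is a rough sketch of how you might compare the two approaches (the key names are hypothetical, <compressed bytes> is a placeholder for the application-produced binary data, and MEMORY USAGE requires Redis 4.0 or later):

# a readable JSON string written as-is
SET order:1001:json "{\"user\":3000128,\"items\":[1,2,3]}"
MEMORY USAGE order:1001:json

# the same payload serialized and gzip-compressed by the application,
# then written as a binary-safe String under another key
SET order:1001:gz "<compressed bytes>"
MEMORY USAGE order:1001:gz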

Guideline 4: Use the integer object shared pool #

Integers are commonly used data types. Redis internally maintains 10,000 integer objects from 0 to 9999 and uses them as a shared pool.

In other words, if a key-value pair contains an integer within the range of 0 to 9999, Redis will not create a special integer object for this key-value pair but will reuse the integer object in the shared pool.

In this way, even if a large number of key-value pairs store integers within the range of 0 to 9999, each such integer object is stored only once in the Redis instance, which saves memory space.

Based on this feature, I suggest that you, under the premise of meeting the business data requirements, try to use integers as much as possible, as it can save instance memory.
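You can check whether a value is actually coming from the shared pool with OBJECT REFCOUNT; a small sketch (the key name is made up, and the exact refcount reported for shared objects varies by Redis version, some versions print INT_MAX):

SET counter 100
OBJECT REFCOUNT counter
# a very large refcount means the value is the shared integer object, not a private copy

SET counter 10000
OBJECT REFCOUNT counter
# 10000 falls outside the 0-9999 range, so the refcount drops back to 1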

When can’t the integer object shared pool be used? There are mainly two situations.

The first situation is when maxmemory is set in Redis and an LRU eviction policy is enabled (either the allkeys-lru or volatile-lru policy); in that case, the integer object shared pool cannot be used. This is because the LRU policy needs to track the access time of each key-value pair, and if different key-value pairs share the same integer object, the LRU policy cannot track them accurately.

The second situation is when the collection type data uses ziplist encoding and the collection elements are integers. In this case, the shared pool cannot be used because ziplist uses a compact memory structure, and determining the sharing of integer objects is inefficient.
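For reference, this is the kind of configuration under which the shared pool stops being used (a redis.conf sketch; the 4gb value is only an example):

maxmemory 4gb
maxmemory-policy allkeys-lru
# with maxmemory plus an LRU policy, Redis needs a per-object access time,
# so it no longer reuses the shared integer objects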

Alright, up to this point, we have learned the four guidelines related to using key-value pairs. The most direct benefit of following them is saving memory space. Next, let’s look at the guidelines to follow when actually storing data.

Data Storage Guidelines #

Guideline 1: Use Redis to Store Hot Data #

In order to provide high-performance access, Redis stores all data in memory.

Although Redis supports RDB snapshots and AOF logs for persistence, these mechanisms are designed to provide data reliability guarantees, not to expand data capacity. Moreover, memory itself is relatively expensive, so storing all business data in Redis would impose a significant memory cost burden.

Therefore, in practical applications of Redis, we usually use it as a cache to store hot data. This not only fully utilizes Redis’ high-performance features, but also allocates valuable memory resources to hot data in the service, as the saying goes, “use the best material where it is needed the most.”

Guideline 2: Store Different Business Data in Separate Instances #

Although we can use prefixes of keys to distinguish data from different businesses, if the data volume of all businesses is large and their access characteristics are different, storing these data in the same instance will result in mutual interference of data operations.

Imagine a scenario: a data-collection business uses Redis mainly for write operations, while a user-statistics business uses Redis mainly for read queries. If the data of these two businesses is stored together, their read and write operations will interfere with each other, which is bound to slow down business responses.

Therefore, I suggest you place different business data in different Redis instances. This way, you can avoid excessive memory usage of a single instance and prevent interference between operations of different businesses.

Guideline 3: Set Expiration Time when Saving Data #

For Redis, memory is a very valuable resource, and Redis is usually used to store hot data. Hot data generally has a time limit for usage.

Therefore, when saving data, I recommend you set an expiration time based on the duration of the data’s usage in the business. Otherwise, the data written to Redis will continue occupying memory. If the data continues to increase, it may reach the memory limit of the machine, causing a memory overflow and service crash.
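A minimal sketch of doing this (the key name and the one-day TTL of 86400 seconds are just examples):

# write the value and its time-to-live in a single command
SET uv:page:1024 10000 EX 86400

# or attach a TTL to an existing key, and check how long it has left
EXPIRE uv:page:1024 86400
TTL uv:page:1024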

Guideline 4: Control the Capacity of Redis Instances #

The memory size of a single Redis instance should not be too large. Based on my own experience, I recommend setting it between 2 and 6 GB. This way, whether it is generating RDB snapshots or synchronizing data in a master-slave cluster, the work can be completed quickly without blocking the processing of normal requests.

Command Usage Guidelines #

Finally, let’s take a look at the guidelines for using Redis commands.

Guideline 1: Disable Certain Commands in Production #

Redis is single-threaded and processes requests sequentially. If we execute commands that operate on a large amount of data and take a long time, they can heavily block the main thread and prevent other requests from being processed. There are three main commands to avoid:

  • KEYS - matches keys against a pattern and returns all matching keys. This command requires a full scan of Redis’ global hash table, which severely blocks the main thread.
  • FLUSHALL - deletes all data in the Redis instance. If the data volume is large, it can seriously block the main thread.
  • FLUSHDB - deletes all data in the current database. If the data volume is large, it will also block the Redis main thread.

Therefore, when using Redis in a production application, these commands need to be disabled. The specific approach is for the administrator to use the rename-command directive in the configuration file to rename them, making them unavailable to clients.
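Here is a redis.conf sketch of that renaming (renaming a command to an empty string disables it entirely; the alternative name in the last line is just an example):

# disable the dangerous commands outright
rename-command KEYS ""
rename-command FLUSHALL ""
rename-command FLUSHDB ""

# or rename one to a name only the administrator knows
# rename-command FLUSHALL "ops-flushall-only"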

Of course, you can also use other commands as alternatives to these three.

  • For the KEYS command, you can use the SCAN command instead, which returns the matching keys in batches through a cursor, avoiding blocking the main thread (both alternatives are sketched after this list).
  • For the FLUSHALL and FLUSHDB commands, you can add the ASYNC option to make these commands delete data asynchronously using a background thread, avoiding blocking the main thread.
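Here is a sketch of both alternatives (the key pattern and COUNT value are just examples, and the ASYNC option requires Redis 4.0 or later):

# iterate matching keys in batches instead of KEYS uv:page:*
SCAN 0 MATCH uv:page:* COUNT 100
# the reply carries the next cursor; keep calling SCAN with it until it returns 0

# let a background thread reclaim the memory instead of a blocking flush
FLUSHALL ASYNC
FLUSHDB ASYNC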

Guideline 2: Be Cautious with the MONITOR Command #

After executing the MONITOR command in Redis, it continuously outputs the monitored commands. Therefore, we usually use the results returned by the MONITOR command to check the execution status of commands.

However, the MONITOR command continuously writes the monitored content to the output buffer. If there are a lot of commands being executed online, the output buffer will quickly overflow, which can affect the performance of Redis and even cause service crashes.

Therefore, only when it is truly necessary to monitor command execution (for example, when Redis suddenly becomes slow and you want to see which commands clients are running) should you use MONITOR, and even then only briefly. Otherwise, I suggest not using the MONITOR command at all.

Guideline 3: Be Cautious with Full Scan Commands #

For collection-type data, if you want to obtain all the elements in a collection, it is generally not recommended to use full-scan commands (such as HGETALL for Hashes and SMEMBERS for Sets). If the collection holds a large number of elements, these operations scan the entire underlying data structure and block the Redis main thread.

If you want to obtain the full data of collection types, I have three suggestions for you.

  • The first suggestion is to use the SSCAN and HSCAN commands to return the collection’s data in batches, reducing the blocking of the main thread (see the sketch after this list).
  • The second suggestion is to divide and conquer: split one large Hash into multiple small ones. This splitting is done at the business layer, where you partition the business data by attributes such as time, location, or user ID, turning one large collection into many small ones. For example, when analyzing user access patterns, you can create a separate Hash for each day’s data.
  • The last suggestion is that, if a collection stores multiple attributes of a piece of business data and every query needs all of those attributes, you can serialize the attributes and store them as a single String value, then simply return that String instead of performing a full scan on a collection.
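Here is a sketch of the first suggestion (the key name and COUNT value are examples; COUNT is only a hint, not a hard limit):

# instead of HGETALL user:profile:big (one full scan), walk the Hash in batches
HSCAN user:profile:big 0 COUNT 100
# each reply returns a batch of field-value pairs plus the next cursor;
# repeat with that cursor until it comes back as 0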

Summary #

In this lesson, I introduced 11 guidelines for achieving high-performance access and saving memory in Redis applications, and categorized them into three types: mandatory, recommended, and advisory.

Let me explain these three types of guidelines:

  • Mandatory guidelines: failing to follow these will have a significant negative impact on your Redis application, such as degraded performance.
  • Recommended guidelines: following these can effectively improve performance, save memory, or make development and operations more convenient. You can apply them directly in practice.
  • Advisory guidelines: these are tied to concrete business scenarios; I am only offering suggestions based on my experience, and you need to weigh them against your own business needs.

One more thing: be sure to master these guidelines and truly apply them in your own Redis scenarios, so that Redis can serve you efficiently.

One Question for Each Lesson #

As usual, I have a small question for you: Do you follow any good usage guidelines when working with Redis in your daily applications?

Feel free to share your commonly used guidelines in the comments section. Let’s discuss and exchange ideas together. If you found today’s content helpful, please feel free to share it with your friends or colleagues. See you in the next lesson.