
22 Lectures 11-21: Post-lecture Thinking Problems, Answers, and Common Questions #

Our course has now been updated to Lecture 21. Today, we will have a Q&A session.

In the first half, I will explain the answers to the exercises after Lectures 11-21. While studying this content, you can compare your answers to see if there is anything you might have missed. Of course, some questions may not have a standard answer, so we can continue to discuss them.

In the second half, I will focus on how to troubleshoot slow query commands and bigkey issues, which many students are concerned about. I will provide detailed explanations in the hope of resolving your confusion.

Alright, let’s begin.

Answers to After-class Exercises #

Lecture 11 #

Question: In addition to the String type and Hash type, what other data type is suitable for storing images mentioned in Lecture 11?

Answer: In addition to String and Hash, we can also use the Sorted Set type for storage. The elements of a Sorted Set have member values and score values, and can be stored using secondary encoding similar to Hash. The specific approach is to use the first 7 digits of the image ID as the key of the Sorted Set, use the last 3 digits of the image ID as the member value, and use the ID of the image storage object as the score value.
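To make the split concrete, here is a minimal Python sketch of the secondary encoding; the 10-digit image ID and the object ID values are illustrative assumptions:

```python
def sorted_set_location(image_id: str, object_id: int):
    # Split a 10-digit image ID: the first 7 digits become the Sorted Set key,
    # the last 3 digits become the member, and the storage-object ID the score.
    key = image_id[:7]
    member = image_id[7:]
    score = object_id
    return key, member, score

# e.g. image 1101000060 stored in object 3302000080:
key, member, score = sorted_set_location("1101000060", 3302000080)
# key == "1101000", member == "060", score == 3302000080
```

The corresponding Redis write would then be `ZADD 1101000 3302000080 060`.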

When the number of elements in a Sorted Set is small, Redis uses a compressed list for storage, which saves memory. However, unlike Hash, data inserted into a Sorted Set must be sorted by score, so when the underlying structure is a compressed list, the insertion performance of Sorted Set is not as good as that of Hash. Therefore, although the Sorted Set type can be used for storage in the scenario described in this lecture, it is not the optimal choice.

Lecture 12 #

Question: In Lecture 12, I introduced four typical statistical patterns, namely aggregating statistics, sorting statistics, binary state statistics, and cardinality statistics, and the corresponding suitable set types. Have you encountered other statistical scenarios? What set types did you use?

Answer: @Hailalu mentioned a scenario in the comments: they used List+Lua to calculate the engagement rate of the most recent 200 customers. The specific approach is that each element in the List represents a customer: a value of 0 indicates engaged, and a value of 1 indicates not engaged. During data collection, the application writes an element representing each customer into the queue. When calculating the engagement rate, LRANGE key 0 -1 is used to retrieve all the elements, and the ratio of 0s to the total number of elements is the engagement rate.

This example needs to retrieve all the elements, but the data size is only 200, which is not large, so using a List is acceptable in practice. However, if the data size is large or there are other query requirements (such as querying the engagement status of an individual customer), List operations become too costly to be suitable; the Hash type can be considered instead.
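The List-based approach can be sketched in a few lines of Python; here a bounded deque stands in for the Redis List (the 0/1 encoding follows the scenario above, and the sample flags are made up):

```python
from collections import deque

# Keep only the most recent 200 customers, like LPUSH + LTRIM on a Redis List.
recent = deque(maxlen=200)

def record(engaged: bool):
    # As in the scenario above: 0 marks an engaged customer, 1 a non-engaged one.
    recent.append(0 if engaged else 1)

def engagement_rate() -> float:
    # Equivalent to LRANGE key 0 -1 followed by counting the zeros.
    return recent.count(0) / len(recent) if recent else 0.0

for flag in (True, True, False, True):
    record(flag)
# engagement_rate() == 0.75
```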

Lecture 13 #

Question: Have you used other data types of Redis in your daily practice?

Answer: In addition to the 5 basic data types and HyperLogLog, Bitmap, and GEO introduced in our course, Redis also has another data type called Bloom Filter. It has high query efficiency and is often used in caching scenarios to determine whether a piece of data exists in the cache. I will introduce it specifically in Lecture 25.

Lecture 14 #

Question: When using Sorted Set to store time series data, is there any potential risk if we use timestamps as scores and actual data as members? In addition, if you were the developer and maintainer of Redis, would you design aggregation calculations as an inherent feature of Sorted Set?

Answer: Similar to Set, Sorted Set deduplicates the elements in the collection: if a member value we insert is the same as an existing member value, the score of the original member is overwritten by the score of the newly written member, and identical member values are kept only once in the Sorted Set. For time series data, this deduplication behavior carries the risk of data loss. After all, multiple time series values within a certain period may well be identical. If we write values into a Sorted Set that are the same but generated at different times, the Sorted Set keeps only the one record with the latest writing time, and the data from the other time points is not saved.

For example, when recording the temperature of an IoT device, the readings for a device may all be 26 degrees Celsius throughout a morning. In the Sorted Set, we use the temperature value as the member and the timestamp as the score, and we use the ZADD command to write the temperature readings from different times during the morning into the Sorted Set. Since the member values are identical, only the score is updated to the latest timestamp, leaving a single temperature value under the latest timestamp (e.g. 12:00 pm). This clearly does not meet the requirement to save data from multiple time points.
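The overwrite behavior can be simulated in a few lines of Python, since a Sorted Set is keyed by member much like a dict is keyed by key (the timestamps below are illustrative):

```python
# A Sorted Set is keyed by member, so it behaves like a dict from member to
# score: writing an existing member only overwrites its score (ZADD semantics).
sorted_set = {}

def zadd(member, score):
    sorted_set[member] = score

# The same 26-degree reading taken at three different times during the morning:
for timestamp in (800, 1000, 1200):
    zadd(member="26", score=timestamp)

# Only one record survives, carrying the latest timestamp:
# sorted_set == {"26": 1200}
```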

Regarding whether to implement aggregation calculations as an inherent feature of Sorted Set, considering that Redis’s read-write functionality is executed by a single thread and consumes a significant amount of CPU resources during data read-write, implementing aggregation calculations in Sorted Set would further increase CPU resource consumption and affect normal data reading in Redis. Therefore, if I were a Redis developer and maintainer, I would not implement aggregation calculations as an inherent feature of Redis unless modifications were made to Redis’s thread model, such as using an additional thread pool in Redis for aggregation calculations.

Lecture 15 #

Question: If a message sent by a producer to a message queue needs to be read and processed by multiple consumers (e.g. a message is data collected from a business system and needs to be read and used for real-time calculation by consumer 1, and also needs to be read and stored in the distributed file system HDFS by consumer 2 for future historical queries), what Redis data type would you use to solve this problem?

Answer: Some students mentioned that the consumer group in the Streams data type can be used to consume producer data simultaneously, and that is correct. However, there is one thing to note. If only one consumer group is used, multiple consumers in the consumer group will be mutually exclusive when consuming messages. In other words, within a consumer group, a message can only be consumed by one consumer. We want the message to be read by both consumer 1 and consumer 2, which is a requirement for multiple consumers. Therefore, if the consumer group mode is used, consumer 1 and consumer 2 need to belong to different consumer groups so that they can consume messages simultaneously.

In addition, Redis implements the publish-subscribe feature based on dictionary and linked-list data structures, which can also be used to let multiple consumers consume the same message, thus meeting the requirements of the scenario in the question.

Lecture 16 #

Question: Are Redis write operations (e.g. SET, HSET, SADD, etc.) on the critical path?

Answer: Redis itself is an in-memory database, so write operations need to be completed in memory before returning. This means that if these write operations deal with a large data set, for example, 10,000 data points, the main thread needs to wait for all 10,000 data points to be written before continuing to execute subsequent commands. Therefore, Redis write operations are also on the critical path.

This question is asking you to differentiate between write operations targeting memory and write operations targeting disks. When a write operation needs to write data to a disk, generally, it only needs to write the data to the kernel buffer of the operating system. However, if we perform synchronous write operations, we must wait for the data to be written back to the disk. Therefore, write operations targeting disks generally would not be on the critical path.

I saw some students mention that whether a write operation is on the critical path can be determined based on the return value of the write command. If the return value is OK or if the client does not care about whether the write is successful, then the write operation is not on the critical path.

This idea is good, but it is important to note that the client usually sends a command and then blocks waiting for its result; only after the previous command returns will it send the next one. So even in scenarios where we do not care about the result of a write operation, the client still waits for the write to complete. Therefore, in such scenarios, we can modify the Redis client to be asynchronous: use separate threads to send those commands whose results we do not care about, instead of waiting for the results inside the client.

Lecture 17 #

Question: On a server with two CPU sockets (each socket has 8 physical cores), we deployed a Redis sharding cluster with 8 instances (all 8 instances are master nodes with no master-slave relationship). Now we have two options:

  1. Run 8 instances on the same CPU socket and bind them with 8 CPU cores;
  2. Run 4 instances on each of the two CPU sockets and bind them to the corresponding socket.

If you don’t consider the impact of network data reading, which solution would you choose?

Answer: It is recommended to use the second solution for two main reasons.

  1. Processes on the same CPU socket share L3 cache. If all 8 instances are deployed on the same socket, they will compete for the L3 cache, which will result in a lower L3 cache hit rate and affect performance.
  2. Processes on the same CPU socket use the memory directly attached to that socket. If all 8 instances share that socket's memory, they will compete for memory resources. If some instances store a large amount of data, the other instances may not have enough local memory and will have to request memory from the other socket, resulting in cross-socket memory access and performance degradation.

In addition, in a sharded cluster, different instances communicate and migrate data through the network and do not use shared memory for cross-instance data access. Therefore, even if different instances are deployed on different sockets, they will not perform cross-socket memory access and will not suffer the associated latency.

Lecture 18 #

Question: In Redis, besides the KEYS command, what other commands can be used to achieve fuzzy querying of keys in key-value pairs? Do these commands slow down Redis?

Answer: Redis provides the SCAN command, as well as SSCAN and HSCAN for collection types, which return a limited batch of elements per call according to the COUNT hint given at execution time. This avoids returning all matching data at once the way the KEYS command does, so it does not block Redis. Taking HSCAN as an example, we can execute the following command against the user hash to retrieve fields that start with 103, examining roughly 100 fields per call.

HSCAN user 0 MATCH "103*" COUNT 100
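To illustrate the cursor-based iteration, here is a simplified Python simulation; the hash contents are made up, and real HSCAN treats COUNT only as a hint rather than an exact batch size:

```python
import fnmatch

# A sample hash whose fields all start with "103", mimicking the user hash above.
user_hash = {str(1030 + i): "obj%d" % i for i in range(10)}

def hscan_sim(h, cursor, match, count):
    # Walk the hash in steps of `count` fields, returning only pattern matches.
    fields = sorted(h)
    chunk = fields[cursor:cursor + count]
    next_cursor = cursor + count if cursor + count < len(fields) else 0
    return next_cursor, {f: h[f] for f in chunk if fnmatch.fnmatch(f, match)}

cursor, found = 0, {}
while True:
    cursor, items = hscan_sim(user_hash, cursor, "103*", 4)
    found.update(items)
    if cursor == 0:  # a returned cursor of 0 means the iteration is complete
        break
# found now contains all 10 fields, collected over three calls instead of one
```

Because each call returns only a small batch, no single call holds up the server the way one big KEYS (or HGETALL) call would.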

Lecture 19 #

Question: Have you encountered a situation where Redis becomes slow? If so, how did you solve it?

Answer: @Kaito shared his checklist for investigating slow Redis issues in the comments section and even provided solutions, which is great. I summarized and refined the potential causes of a slow Redis that Kaito mentioned, and I'm sharing them with you:

  1. Using commands with high complexity or querying all data at once;
  2. Working with bigkeys;
  3. A large number of keys expiring at the same time;
  4. Memory reaching maxmemory;
  5. Clients using short connections with Redis;
  6. When the data volume of the Redis instance is large, whether it’s generating RDB or rewriting AOF, it can cause significant fork delay;
  7. AOF write policy is set to always, causing every operation to be synchronously written back to disk;
  8. Insufficient memory on the machine running the Redis instance, leading to swap usage, and Redis needs to read data from swap partition;
  9. Unreasonable CPU binding for processes;
  10. Transparent huge pages mechanism is enabled on the machine running the Redis instance;
  11. Network card overload.

Lecture 20 #

Question: We can use mem_fragmentation_ratio to determine whether Redis has severe memory fragmentation. The thresholds I provided are all greater than 1. I would like you to consider: what does it mean for Redis' memory usage if mem_fragmentation_ratio is less than 1? What impact does it have on Redis' performance and memory utilization?

Answer: If mem_fragmentation_ratio is less than 1, it means that the amount of memory allocated by the operating system to Redis is smaller than the requested size. In this case, the memory on the server running the Redis instance is not enough, and swapping may have occurred. As a result, the read and write performance of Redis will be affected because the Redis instance needs to read and write data from the swap partition on disk, which is slower.

Lecture 21 #

Question: When interacting with a Redis instance, should the client used in the application use a buffer? If so, does it affect the performance and memory usage of Redis?

Answer: The Redis client used in the application needs to temporarily store the requests to be sent in a buffer. There are two benefits to this.

On one hand, it allows the client to control the rate of sending requests, avoiding sending too many requests to the Redis instance at once, which can cause performance degradation due to high pressure. However, the client buffer will not be too large, so it does not have much impact on the memory usage of Redis.

On the other hand, when using Redis master-slave clusters, there is a certain amount of time needed for failover between the master and slave nodes. During this time, the master node cannot serve incoming requests. If the client has a buffer to temporarily store requests, the client can still receive requests from the application, avoiding directly returning an error indicating that the service is unavailable.

Representative Questions #

In the previous lessons, I focused on methods to avoid Redis becoming slow. The execution time of slow query commands and the time-consuming operations of big keys can cause Redis to be blocked. After learning these methods, many students understand the importance of avoiding Redis blocking, but they are not clear about how to troubleshoot the blocking commands and big keys specifically.

Therefore, next, I will explain in detail how to troubleshoot slow query commands and big keys.

Question 1: How to use the slow query log and latency monitor to troubleshoot slow operations?

In Lesson 18, I mentioned that we can use the Redis log (slow query log) and latency monitor to troubleshoot slow command operations. So, how can we use the slow query log and latency monitor?

Redis’s slow query log records command operations that exceed a certain threshold of execution time. When we notice that Redis responses become slow and request latency increases, we can search in the slow query log to determine which commands have long execution times.

Before using the slow query log, we need to set two parameters.

  • slowlog-log-slower-than: the threshold, in microseconds; command operations whose execution time exceeds this value are recorded in the slow query log.
  • slowlog-max-len: the maximum number of command records the slow query log can hold. The underlying implementation of the slow query log is a fixed-size first-in-first-out queue: once the number of recorded commands exceeds the queue length, the earliest record is deleted. By default, this value is 128. If there are many slow commands, the log will not be able to hold them all; if the value is too large, the log takes up extra memory. Therefore, it is generally recommended to set it to around 1000, which records enough slow commands for troubleshooting without consuming too much memory.
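The two parameters combine into a simple filter-then-append pipeline, which can be sketched in Python with a bounded deque standing in for the fixed-size queue (the entry values and the tiny queue length are made up for illustration):

```python
from collections import deque

# The slow log is a fixed-size first-in-first-out queue: once it is full,
# appending a new entry silently drops the oldest one.
SLOWLOG_MAX_LEN = 3  # the Redis default is 128; a tiny value for illustration
slowlog = deque(maxlen=SLOWLOG_MAX_LEN)

def log_if_slow(entry_id, command, duration_us, threshold_us=10000):
    # Mirrors slowlog-log-slower-than: record only commands over the threshold.
    if duration_us > threshold_us:
        slowlog.append((entry_id, command, duration_us))

log_if_slow(1, "GET key", 50)               # fast: not recorded
log_if_slow(2, "KEYS abc*", 20906)
log_if_slow(3, "HGETALL big", 15000)
log_if_slow(4, "SMEMBERS big", 12000)
log_if_slow(5, "LRANGE big 0 -1", 30000)    # evicts entry 2, the oldest
# [e[0] for e in slowlog] == [3, 4, 5]
```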

After setting the parameters, the slow query log will record command operations whose execution time exceeds the slowlog-log-slower-than threshold.

We can use the SLOWLOG GET command to view the recorded command operations in the slow query log. For example, by executing the following command, we can view the most recent slow query log information.

SLOWLOG GET 1
1) 1) (integer) 33           // The unique ID number of each log
   2) (integer) 1600990583   // The timestamp when the command was executed
   3) (integer) 20906        // The duration of command execution in microseconds
   4) 1) "keys"               // The specific command and parameters
      2) "abc*"
   5) "127.0.0.1:54793"      // The client's IP and port number
   6) ""                     // The name of the client, which is empty here

As shown above, the execution time of the command KEYS "abc*" is 20906 microseconds, approximately 20 milliseconds: it is indeed a slow command operation. If we want to view more entries, we can simply change the number parameter after SLOWLOG GET to the desired number of log entries.

After obtaining the slow query log, we can quickly determine which commands have relatively long execution times, and then provide feedback to the business department, asking the developers to avoid using these commands or reduce the amount of data being operated on to reduce the complexity of command execution.

In addition to the slow query log, starting from version 2.8.13, Redis also provides a latency monitor tool. This tool can be used to monitor the peak latency during Redis operation.

Similar to the slow query log, to use the latency monitor we first need to set a threshold for command execution time. When the actual execution time of a command exceeds this threshold, it is captured by the latency monitor. For example, we can set the monitoring threshold to 1000 milliseconds as follows (latency-monitor-threshold is specified in milliseconds):

config set latency-monitor-threshold 1000

After setting the parameters for the latency monitor, we can use the latency latest command to view the latest and maximum latency that exceeds the threshold. It is demonstrated as follows:

latency latest
1) 1) "command"
   2) (integer) 1600991500    // The timestamp of the latest latency event
   3) (integer) 2500           // The latest latency that exceeds the threshold
   4) (integer) 10100          // The maximum latency that exceeds the threshold

Question 2: How to troubleshoot big keys in Redis?

When applying Redis, we should try to avoid using big keys because the Redis main thread will be blocked when operating on big keys. So, once big keys are used in the business application, how can we troubleshoot them?

When running redis-cli, we can add the --bigkeys option to perform a statistical analysis of key-value pair sizes across the entire database. For example, it counts the number and average size of key-value pairs for each data type. In addition, after the scan finishes it outputs the biggest key of each data type: for the String type, the byte length of the biggest key's value; for collection types, the number of elements in the biggest key, as shown below:

./redis-cli --bigkeys

-------- summary -------
Sampled 32 keys in the keyspace!
Total key length in bytes is 184 (avg len 5.75)

// The biggest bigkey in each data type
Biggest list found 'product1' has 8 items
Biggest hash found 'dtemp' has 5 fields
Biggest string found 'page2' has 28 bytes
Biggest stream found 'mqstream' has 4 entries
Biggest set found 'userid' has 5 members
Biggest zset found 'device:temperature' has 6 members

// The total number and percentage of keys for each data type, along with average size
4 lists with 15 items (12.50% of keys, avg size 3.75)
5 hashs with 14 fields (15.62% of keys, avg size 2.80)
10 strings with 68 bytes (31.25% of keys, avg size 6.80)
1 streams with 4 entries (03.12% of keys, avg size 4.00)
7 sets with 19 members (21.88% of keys, avg size 2.71)
5 zsets with 17 members (15.62% of keys, avg size 3.40)

However, there is one thing to note when using the --bigkeys option. This tool finds bigkeys by scanning the database, so it affects the performance of the Redis instance while it runs. If you are using a master-slave cluster, I suggest executing this command on a slave node, because running it on the master will block the master. If you don't have a slave node, I have two suggestions: first, run the scan during a low-traffic period of the Redis instance to avoid impacting normal operation; second, use the -i parameter to control the scan interval so a long scan does not degrade the instance's performance. For example, with the following command, redis-cli pauses for 0.1 seconds (100 milliseconds) after every 100 SCAN requests.

./redis-cli --bigkeys -i 0.1

Of course, the built-in --bigkeys option has two drawbacks:

  1. This method can only return the biggest bigkey in each data type, and cannot obtain the top N bigkeys by size.
  2. For collection types, this method only counts the number of elements in the biggest key, not its actual memory usage. However, a collection with many elements does not necessarily occupy more memory: each element may occupy very little memory, so even with many elements the overall memory consumption can be small.

Therefore, if we want to count the top N bigkeys by memory usage in each data type, we can develop a program to do the statistics.

Here's a basic development idea: use the SCAN command to iterate over the database, then use the TYPE command to get the type of each returned key. For String keys, we can directly use the STRLEN command to get the byte length of the value, which approximates the memory it occupies.

For set data type, there are two methods to obtain the memory size.

If you can know the average size of elements in the set in advance from the business layer, you can use the following commands to get the number of elements in the set and then multiply it by the average size to get the memory usage of the set:

  • List type: LLEN command;
  • Hash type: HLEN command;
  • Set type: SCARD command;
  • Sorted Set type: ZCARD command;

If you cannot know the size of elements written into the collection in advance, you can use the MEMORY USAGE command (available in Redis 4.0 and above) to query the memory occupied by a key-value pair. For example, executing the following command returns the memory occupied by the collection-type key user:info.

MEMORY USAGE user:info
(integer) 315663239

In this way, the program you develop can compute the top N keys by memory usage within each data type.
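The idea can be sketched in Python as follows; here the keyspace dict stands in for the results of the SCAN + TYPE + MEMORY USAGE calls against a live instance, and all key names and byte sizes are made-up sample data:

```python
import heapq

# Simulated result of walking the keyspace with SCAN + TYPE + MEMORY USAGE.
keyspace = {
    "user:info":          ("hash",   315663239),
    "page2":              ("string", 28),
    "session:9":          ("string", 2048),
    "mqstream":           ("stream", 4096),
    "product1":           ("list",   120),
    "device:temperature": ("zset",   96),
}

def top_n_bigkeys(n):
    # Group keys by type, then take the N largest by memory usage per type.
    by_type = {}
    for key, (ktype, nbytes) in keyspace.items():
        by_type.setdefault(ktype, []).append((nbytes, key))
    return {t: heapq.nlargest(n, sizes) for t, sizes in by_type.items()}

tops = top_n_bigkeys(1)
# tops["string"] == [(2048, "session:9")]
# tops["hash"]   == [(315663239, "user:info")]
```

Using a heap (heapq.nlargest) keeps the per-type cost low even when a type has millions of keys, since only the current top N candidates are retained.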

Summary #

From Lecture 11 to Lecture 21, we have covered a lot of key points, and they are quite detailed. In fact, we can divide them into two major parts: a variety of data structures, and how to avoid Redis performance degradation.

I hope that this Q&A session can help you deepen your understanding of the previous content. Through this session, you should also see that the post-lecture questions are a good way to summarize the key points and expand your thinking. So, in the upcoming courses, I hope you can leave more comments to discuss your ideas. This will further consolidate your knowledge and gain even more by exchanging ideas with other classmates. Alright, that’s all for this lecture. See you in the next one.