29 Case Study: Resolving Severe Delays in Redis Responses #
Hello, I’m Ni Pengfei.
In the previous section, we analyzed a case of product search based on MySQL. Let’s review it.
When accessing the product search interface, we found that the response was particularly slow. Through analysis of system resources such as CPU, memory, and disk I/O, we found that there was a disk I/O bottleneck at that time, and it was caused by the case application.
Next, using pidstat, we found that the culprit was the mysqld process. We then used strace and lsof to find out the files that mysqld was reading. Based on the file names and paths, we found the database and table that mysqld was operating on. Based on this information, we guessed that it was a slow query problem caused by the lack of an index.
To verify this guess, we went to the MySQL command line terminal and used a database analysis tool to find that the fields accessed by the case application indeed lacked an index. Since the guess was correct, the problem naturally resolved after adding the index.
From this case, you can see that MySQL's MyISAM engine relies mainly on the system cache to accelerate disk I/O. However, if other applications are running on the system, it is hard for the MyISAM engine to make full use of the system cache: the cache may be occupied by other applications or even flushed out.
Therefore, I generally do not recommend relying solely on system cache for performance optimization of applications. It is preferable to allocate memory within the application and build a fully self-controlled cache, or use third-party caching applications like Memcached, Redis, etc.
Redis is one of the most commonly used key-value stores, often serving as a database, cache, or message broker. Redis keeps data in memory; however, to ensure that data survives a server failure, we usually need to configure persistence for it, and persistence can introduce disk I/O performance problems.
Today, I will analyze a case of using Redis as a cache with you. This is also an application based on Python Flask, which provides an interface for querying the cache. However, the response time of the interface is relatively long and cannot meet the requirements of the online system.
I would like to thank Dong Guoxing, a senior backend engineer at Ctrip Systems Development Department, for providing today’s case.
Case Preparation #
This case is still based on Ubuntu 18.04, which is also applicable to other Linux systems. The environment I used for this case is as follows:
- Machine configuration: 2 CPUs, 8GB memory
- Pre-installed tools: docker, sysstat, git, make, etc. You can install them by running apt install docker.io sysstat.
Today’s case consists of two parts: a Python application and Redis. The Python application is based on Flask and uses Redis to manage application caching. It provides three HTTP interfaces:
- /: Returns "hello redis".
- /init/: Inserts a specified number of cache entries; if the number is not specified, the default is 5000.
  - Cache keys are UUIDs.
  - Cache values are good, bad, or normal.
- /get_cache/: Queries the cached data for a specified value and returns the processing time. The type_name parameter only supports good, bad, and normal (i.e., it returns the list of keys whose value matches).
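To make the data model concrete, here is a minimal sketch of the key/value scheme that the /init/ interface implies. The function name and the use of random.choice are my assumptions; only the "uuid:" key prefix (visible later in the strace output) and the three value labels come from the case description.

```python
import random
import uuid

def generate_cache_entries(count=5000):
    """Build a dict of cache entries matching the /init/ scheme:
    keys are 'uuid:<uuid4>' strings, values are one of three labels."""
    entries = {}
    for _ in range(count):
        entries["uuid:" + str(uuid.uuid4())] = random.choice(["good", "bad", "normal"])
    return entries

entries = generate_cache_entries(100)
```

In the real application these pairs would then be written to Redis with SET, one call per key.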
Since the case involves multiple components, I have packaged them into two Docker images and pushed them to GitHub. This way, you only need to run a few commands to start everything.
For this case, you will need two virtual machines. One of them will be used as the target machine for the case analysis, running the Flask application with the IP address 192.168.0.10. The other one will act as the client, making requests to the cache query interface. I have created a diagram to illustrate their relationship.
Next, open two terminals and SSH into these two virtual machines respectively. Install the tools mentioned earlier on the first virtual machine.
As usual, all commands in this case are assumed to be run as the root user. If you are logged in as a regular user, run sudo su root to switch to the root user.
With that, the preparation work is complete. Next, we will proceed to the operation.
Case Study #
First, in the first terminal, execute the following command to run the target application for this case study. Normally, you should see the following output:
# Note that the random string below is the container ID, which will be different each time you run it, and you don't need to pay attention to it
$ docker run --name=redis -itd -p 10000:80 feisky/redis-server
ec41cb9e4dd5cb7079e1d9f72b7cee7de67278dbd3bd0956b4c0846bff211803
$ docker run --name=app --network=container:redis -itd feisky/redis-app
2c54eb252d0552448320d9155a2618b799a1e71d7289ec7277a61e72a9de5fd0
Then, run the docker ps command to verify that both containers are running (Up status):
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2c54eb252d05 feisky/redis-app "python /app.py" 48 seconds ago Up 47 seconds app
ec41cb9e4dd5 feisky/redis-server "docker-entrypoint.s…" 49 seconds ago Up 48 seconds 6379/tcp, 0.0.0.0:10000->80/tcp redis
Today’s application is listening on port 10000, so you can access the three mentioned interfaces through http://192.168.0.10:10000.
For example, switch to the second terminal and use the curl tool to access the application homepage. If you see the output hello redis, the application has started normally:
$ curl http://192.168.0.10:10000/
hello redis
Next, continue in the second terminal and execute the curl command below to call the application's /init interface, which initializes the Redis cache by inserting 5000 entries. This process can be slow; for example, it took more than ten minutes on my machine. After waiting patiently for a while, you will see the following output:
# Insert 5000 entries in this case; adjust the number to your disk type when reproducing, e.g. increase it on SSD and decrease it on HDD.
$ curl http://192.168.0.10:10000/init/5000
{"elapsed_seconds":30.26814079284668,"keys_initialized":5000}
Continue with the next command to access the cache query interface of the application. If everything is normal, you will see the following output:
$ curl http://192.168.0.10:10000/get_cache
{"count":1677,"data":["d97662fa-06ac-11e9-92c7-0242ac110002",...],"elapsed_seconds":10.545469760894775,"type":"good"}
We can see that this interface call takes 10 seconds! Such a long response time obviously does not meet the requirements of the actual application.
What could be the problem? We still need to use the performance tools and principles we learned earlier to find this bottleneck.
But don't worry: to keep client requests going throughout the analysis, put the curl command into a loop before starting the performance analysis. Continue in the second terminal with the command below:
$ while true; do curl http://192.168.0.10:10000/get_cache; done
Next, go back to the first terminal and look for the cause of the slow response.
As in the last few cases, the symptom is a slow response, so we naturally suspect a system resource bottleneck first. Observing the usage of CPU, memory, and disk I/O is certainly a sound starting point.
First, in the first terminal, execute the top command to analyze the system's CPU usage:
$ top
top - 12:46:18 up 11 days, 8:49, 1 user, load average: 1.36, 1.36, 1.04
Tasks: 137 total, 1 running, 79 sleeping, 0 stopped, 0 zombie
%Cpu0 : 6.0 us, 2.7 sy, 0.0 ni, 5.7 id, 84.7 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu1 : 1.0 us, 3.0 sy, 0.0 ni, 94.7 id, 0.0 wa, 0.0 hi, 1.3 si, 0.0 st
KiB Mem : 8169300 total, 7342244 free, 432912 used, 394144 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 7478748 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9181 root 20 0 193004 27304 8716 S 8.6 0.3 0:07.15 python
9085 systemd+ 20 0 28352 9760 1860 D 5.0 0.1 0:04.34 redis-server
368 root 20 0 0 0 0 D 1.0 0.0 0:33.88 jbd2/sda1-8
149 root 0 -20 0 0 0 I 0.3 0.0 0:10.63 kworker/0:1H
1549 root 20 0 236716 24576 9864 S 0.3 0.3 91:37.30 python3
From the output of top, you can see that the iowait of CPU0 is quite high, reaching 84.7%, while no single process uses much CPU: the highest are python at 8.6% and redis-server at 5.0%. As for memory, out of 8GB total there is still more than 7GB free, so memory does not seem to be a problem either.
Taking into account the information from top, the most suspicious culprit is iowait. Therefore, the next step is to continue analyzing whether it is an I/O problem.
In the first terminal, press Ctrl+C to stop the top command; then, execute the following iostat command to check for any I/O performance issues:
$ iostat -d -x 1
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
...
sda 0.00 492.00 0.00 2672.00 0.00 176.00 0.00 26.35 0.00 1.76 0.00 0.00 5.43 0.00 0.00
From the output of iostat, the disk sda has a write rate (wkB/s) of about 2.6MB per second, while the I/O utilization (%util) is 0. So although there are I/O operations, they are not causing a disk I/O bottleneck.
So CPU and memory usage are fine, and there is no I/O bottleneck either. Does that mean we have run out of directions for further analysis?
In situations like this, it’s always helpful to reflect and consider if any useful clues have been overlooked. You can think for yourself for a moment and focus on three directions: the analyzed application case, system principles, and performance tools. Recall their characteristics, look for abnormal phenomena, and continue to move forward.
Recalling today's case: the problem is slow querying of data cached in Redis. For a query, the corresponding I/O should be disk reads, yet what iostat just showed us was write operations. Although the I/O itself is not a bottleneck, these disk writes are strange: why would a query workload write to disk at all?

To find out which process is writing to the disk, we turn to our old friend pidstat. Execute the following pidstat command in terminal one to observe the I/O of each process:
$ pidstat -d 1
12:49:35 UID PID kB_rd/s kB_wr/s kB_ccwr/s iodelay Command
12:49:36 0 368 0.00 16.00 0.00 86 jbd2/sda1-8
12:49:36 100 9085 0.00 636.00 0.00 1 redis-server
From the output of pidstat, the process doing the most I/O is redis-server with PID 9085, and it is indeed redis-server that is writing to the disk.
Of course, just finding the process that reads and writes to the disk is not enough. We still need to use the combination of strace and lsof to see what redis-server is actually writing.
Next, still in terminal one, execute the strace command and attach to the redis-server process, PID 9085:

# -f follows child processes and threads, -T shows the duration of each system call, and -tt prints the timestamp of each call
$ strace -f -T -tt -p 9085
[pid 9085] 14:20:16.826131 epoll_pwait(5, [{EPOLLIN, {u32=8, u64=8}}], 10128, 65, NULL, 8) = 1 <0.000055>
[pid 9085] 14:20:16.826301 read(8, "*2\r\n$3\r\nGET\r\n$41\r\nuuid:5b2e76cc-"..., 16384) = 61 <0.000071>
[pid 9085] 14:20:16.826477 read(3, 0x7fff366a5747, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000063>
[pid 9085] 14:20:16.826645 write(8, "$3\r\nbad\r\n", 9) = 9 <0.000173>
[pid 9085] 14:20:16.826907 epoll_pwait(5, [{EPOLLIN, {u32=8, u64=8}}], 10128, 65, NULL, 8) = 1 <0.000032>
[pid 9085] 14:20:16.827030 read(8, "*2\r\n$3\r\nGET\r\n$41\r\nuuid:55862ada-"..., 16384) = 61 <0.000044>
[pid 9085] 14:20:16.827149 read(3, 0x7fff366a5747, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000043>
[pid 9085] 14:20:16.827285 write(8, "$3\r\nbad\r\n", 9) = 9 <0.000141>
[pid 9085] 14:20:16.827514 epoll_pwait(5, [{EPOLLIN, {u32=8, u64=8}}], 10128, 64, NULL, 8) = 1 <0.000049>
[pid 9085] 14:20:16.827641 read(8, "*2\r\n$3\r\nGET\r\n$41\r\nuuid:53522908-"..., 16384) = 61 <0.000043>
[pid 9085] 14:20:16.827784 read(3, 0x7fff366a5747, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000034>
[pid 9085] 14:20:16.827945 write(8, "$4\r\ngood\r\n", 10) = 10 <0.000288>
[pid 9085] 14:20:16.828339 epoll_pwait(5, [{EPOLLIN, {u32=8, u64=8}}], 10128, 63, NULL, 8) = 1 <0.000057>
[pid 9085] 14:20:16.828486 read(8, "*3\r\n$4\r\nSADD\r\n$4\r\ngood\r\n$36\r\n535"..., 16384) = 67 <0.000040>
[pid 9085] 14:20:16.828623 read(3, 0x7fff366a5747, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000052>
[pid 9085] 14:20:16.828760 write(7, "*3\r\n$4\r\nSADD\r\n$4\r\ngood\r\n$36\r\n535"..., 67) = 67 <0.000060>
[pid 9085] 14:20:16.828970 fdatasync(7) = 0 <0.005415>
[pid 9085] 14:20:16.834493 write(8, ":1\r\n", 4) = 4 <0.000250>
After observing for a while, did you notice anything interesting?
In fact, from the system calls, it can be seen that epoll_pwait, read, write, and fdatasync system calls are quite frequent. Therefore, the disk writes observed earlier should be caused by write or fdatasync.
Next, let's run the lsof command to find out the targets of these system calls:

$ lsof -p 9085
redis-ser 9085 systemd-network    3r  FIFO     0,12      0t0 15447970 pipe
redis-ser 9085 systemd-network    4w  FIFO     0,12      0t0 15447970 pipe
redis-ser 9085 systemd-network    5u  a_inode  0,13        0    10179 [eventpoll]
redis-ser 9085 systemd-network    6u  sock      0,9      0t0 15447972 protocol: TCP
redis-ser 9085 systemd-network    7w  REG       8,1  8830146  2838532 /data/appendonly.aof
redis-ser 9085 systemd-network    8u  sock      0,9      0t0 15448709 protocol: TCP

Now you can see that descriptor 3 is a pipe, descriptor 5 is an eventpoll, descriptor 7 is a regular file, and descriptor 8 is a TCP socket. Combining this with the disk writes we observed, only the regular file on descriptor 7 can generate disk writes, and the file it operates on is /data/appendonly.aof. The corresponding system calls are write and fdatasync.
If you are familiar with Redis persistence configuration, seeing this file path together with the fdatasync system calls should make you think of the appendonly and appendfsync options in Redis' persistence configuration. The excessive disk writes are most likely caused by an improper setting of these two options.
Next, let’s verify this guess. We can use Redis’ command-line tool to query the configuration of these two options.
Continuing in terminal one, run the following command to query the configuration of appendonly and appendfsync:
$ docker exec -it redis redis-cli config get 'append*'
1) "appendfsync"
2) "always"
3) "appendonly"
4) "yes"
From this result, you can see that appendfsync is set to always and appendonly is set to yes. You can find the detailed meanings of these two options in the Redis persistence documentation; here is a brief introduction.
Redis provides two methods of data persistence: snapshots and append-only files.
Snapshots generate periodic snapshots of the data and save them to disk files according to the specified time interval. To avoid blocking the main process, Redis forks a child process to handle the snapshot saving. This method provides good performance and is much better for backup and recovery compared to append-only files.
However, it also has its disadvantages. When dealing with large amounts of data, forking the child process requires a considerable amount of memory and the data saving process is time-consuming. Therefore, you need to set a longer time interval to cope with this, such as at least 5 minutes. This way, if a failure occurs, you would only lose a few minutes of data.
Append-only files persistently store Redis write operations by appending records to the end of the file, making it a safer persistence method.
In addition, Redis provides the appendfsync option to set the fsync policy, which determines when written data is flushed to disk. The available values are always, everysec, and no.
- always means that fsync is called on every write operation; this is the safest option.
- everysec means that fsync is called once every second, so even in the worst case only 1 second of data is lost.
- no means that flushing is left entirely to the operating system.
Recalling the configuration we just queried: appendfsync is set to always, which means fsync is called on every write operation and creates significant disk I/O pressure.
Of course, you can also use strace to observe this system call directly. For example, filter for fdatasync with the -e option and you will get the following result:
$ strace -f -p 9085 -T -tt -e fdatasync
strace: Process 9085 attached with 4 threads
[pid 9085] 14:22:52.013547 fdatasync(7) = 0 <0.007112>
[pid 9085] 14:22:52.022467 fdatasync(7) = 0 <0.008572>
[pid 9085] 14:22:52.032223 fdatasync(7) = 0 <0.006769>
...
[pid 9085] 14:22:52.139629 fdatasync(7) = 0 <0.008183>
From this, you can see a fdatasync call approximately every 10ms, with each call itself taking 7-8ms.
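A quick back-of-envelope calculation shows why this hurts. With appendfsync set to always, every write blocks on its own fdatasync, so the sync latency alone caps write throughput. This is a simplification that ignores any batching or overlap, but it illustrates the order of magnitude:

```python
def max_sync_writes_per_second(fdatasync_ms):
    # If every write must wait for its own fdatasync, throughput is
    # bounded by 1 second divided by the per-call sync latency.
    return 1000.0 / fdatasync_ms

cap = max_sync_writes_per_second(8.0)  # ~125 writes/s with the ~8ms calls observed above
```

Roughly 125 synchronous writes per second is a tiny budget for a server that is otherwise capable of tens of thousands of operations per second from memory.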
Either way, our guess is confirmed: the configuration is indeed unreasonable. We have now identified both the file Redis is writing and the reason for the heavy I/O.
However, back to the original question: why are there disk writes during a query? Shouldn't queries only read data? To answer this, we need to take another look at the result of strace -f -T -tt -p 9085.
read(8, "*2\r\n$3\r\nGET\r\n$41\r\nuuid:53522908-"..., 16384)
write(8, "$4\r\ngood\r\n", 10)
read(8, "*3\r\n$4\r\nSADD\r\n$4\r\ngood\r\n$36\r\n535"..., 16384)
write(7, "*3\r\n$4\r\nSADD\r\n$4\r\ngood\r\n$36\r\n535"..., 67)
write(8, ":1\r\n", 4)
If you were paying attention, you will remember from the lsof analysis that the regular file on descriptor 7 is /data/appendonly.aof, while descriptor 8 is a TCP socket. Looking at the excerpt above, the TCP reads and writes on descriptor 8 follow the standard request-response pattern:

- After reading GET uuid:53522908-... from the socket, it responds with good.
- Then, after reading SADD good 535... from the socket, it responds with 1.
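The *2\r\n$3\r\nGET... byte strings in the strace output are Redis's RESP wire format: an array header (*N) followed by N length-prefixed bulk strings ($len). A small sketch of decoding such a frame (parse_resp_command is a hypothetical helper, not part of the case code, and handles only this simple array-of-bulk-strings case):

```python
def parse_resp_command(raw: bytes):
    """Decode one RESP array frame like those seen in the strace output,
    e.g. b'*2\r\n$3\r\nGET\r\n$4\r\nkey1\r\n' -> ['GET', 'key1']."""
    lines = raw.split(b"\r\n")
    assert lines[0].startswith(b"*"), "expected an array header"
    n_args = int(lines[0][1:])
    args, i = [], 1
    while len(args) < n_args:
        assert lines[i].startswith(b"$"), "expected a bulk-string length"
        args.append(lines[i + 1].decode())  # the payload follows its $len line
        i += 2
    return args

cmd = parse_resp_command(b"*2\r\n$3\r\nGET\r\n$4\r\nkey1\r\n")  # ['GET', 'key1']
```

Reading the frames this way makes it obvious that the traffic on descriptor 8 is plain GET and SADD commands.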
For Redis, SADD is a write operation, so Redis also appends it to the appendonly.aof file used for persistence.
Looking at more of the strace output, you will find that every GET that returns good is followed by a SADD operation. This is what causes the large volume of disk writes, even though the interface is nominally a query.
At this point, we have identified the reason why Redis is writing to disk. However, before reaching our final conclusion, we still need to confirm whether the Redis client corresponding to TCP socket 8 is indeed our case application.
We can use the lsof command with the -i option to find out the TCP connection information corresponding to the TCP socket. However, since Redis and the Python application are running inside containers, we need to enter the network namespace of the container to see the complete TCP connection.
Note: The following command uses the nsenter tool, which lets you enter a container's namespaces. If it is not installed on your system, install it by running: docker run --rm -v /usr/local/bin:/target jpetazzo/nsenter
In terminal 1, run the following command:
# Since these two containers share the same network namespace, we only need to enter the network namespace of the app container
$ PID=$(docker inspect --format {{.State.Pid}} app)
# -i displays network socket information
$ nsenter --target $PID --net -- lsof -i
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
redis-ser 9085 systemd-network 6u IPv4 15447972 0t0 TCP localhost:6379 (LISTEN)
redis-ser 9085 systemd-network 8u IPv4 15448709 0t0 TCP localhost:6379->localhost:32996 (ESTABLISHED)
python 9181 root 3u IPv4 15448677 0t0 TCP *:http (LISTEN)
python 9181 root 5u IPv4 15449632 0t0 TCP localhost:32996->localhost:6379 (ESTABLISHED)
This time, we can see that file descriptor 8 of redis-server corresponds to the TCP connection localhost:6379->localhost:32996. Among them, localhost:6379 is the listening port of redis-server itself, and therefore localhost:32996 is the Redis client. By observing the last line, localhost:32996 corresponds to our Python application (process ID 9181).
After many twists and turns, we finally found the potential cause of Redis response latency. In summary, we have found two problems.
The first problem is that Redis is configured with appendfsync set to “always”, which causes Redis to trigger the fdatasync system call for every write operation. In today’s case, such high-frequency synchronous writes are not necessary, and the default interval of 1s is sufficient.
The second problem is that the Python application calls the Redis SADD command in the query interface, which is likely an improper use of caching.
For the first configuration problem, we can execute the following command to change appendfsync to “everysec”:
$ docker exec -it redis redis-cli config set appendfsync everysec
OK
After making the change, switch to terminal 2 and you will find that the request time has now been reduced to 0.9s:
{..., "elapsed_seconds":0.9368953704833984,"type":"good"}
As for the second problem, we need to look at the application source code. You can view the source code of the case application on GitHub:
def get_cache(type_name):
    '''handler for /get_cache'''
    for key in redis_client.scan_iter("uuid:*"):
        value = redis_client.get(key)
        if value == type_name:
            redis_client.sadd(type_name, key[5:])
    data = list(redis_client.smembers(type_name))
    redis_client.delete(type_name)
    return jsonify({"type": type_name, 'count': len(data), 'data': data})
Indeed, the Python application uses Redis as temporary storage for the data it finds during the query. But this data can simply be kept in application memory; there is no need to push it into Redis over the network.
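The fix can be sketched as follows: accumulate the matching keys in a local Python list instead of round-tripping SADD, SMEMBERS, and DELETE through Redis. FakeRedis here is a hypothetical in-memory stand-in so the sketch runs without a server; the real application would keep using redis_client.

```python
class FakeRedis:
    """Minimal in-memory stand-in implementing only the calls the handler needs."""
    def __init__(self, data):
        self.data = data

    def scan_iter(self, pattern):
        prefix = pattern.rstrip("*")
        return (k for k in self.data if k.startswith(prefix))

    def get(self, key):
        return self.data[key]

def get_cache_data(redis_client, type_name):
    # Collect matches in a local list: no SADD/SMEMBERS/DELETE round trips,
    # so the query path no longer triggers AOF writes on the server.
    keys = [key[5:]  # strip the 'uuid:' prefix, as the original key[5:] does
            for key in redis_client.scan_iter("uuid:*")
            if redis_client.get(key) == type_name]
    return {"type": type_name, "count": len(keys), "data": keys}

client = FakeRedis({"uuid:aaa": "good", "uuid:bbb": "bad", "uuid:ccc": "good"})
result = get_cache_data(client, "good")  # count == 2
```

Besides removing the disk writes, this also cuts the per-key network round trips, which is where the remaining latency gain comes from.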
Based on this idea, I have pushed the modified code to the same source file, and you can access it through the /get_cache_data interface of http://192.168.0.10:10000.
Switch to terminal two, press Ctrl+C to stop the previous curl command, and then run the following curl loop to call the new /get_cache_data interface:
$ while true; do curl http://192.168.0.10:10000/get_cache_data; done
{..., "elapsed_seconds":0.16034674644470215,"type":"good"}
You will notice that after solving the second problem, the performance of the new interface has been further improved, from the previous 0.9s to less than 0.2s.
Of course, at the end of the case, don’t forget to clean up the case application. You can switch to terminal 1 and run the following command to clean up:
$ docker rm -f app redis
Summary #
Today I took you through an analysis of a Redis cache case.
We started by using top and iostat to analyze the CPU, memory, and disk usage of the system. However, we found that there were no bottlenecks in system resources. At this point, if we want to further analyze, where should we start?
From today’s case, you will find that in order to further analyze, you need to have a certain understanding of the working principles of the system and the application.
For example, in today’s case, although disk I/O did not show any bottlenecks, according to the principles of Redis, there shouldn’t be a large number of disk I/O write operations when querying the cache.
Following this line of thinking, we continued to use tools such as pidstat, strace, lsof, and nsenter to identify two potential problems: improper configuration of Redis and the overuse of Redis by the Python application. Once the bottlenecks were identified, the corresponding optimization work became easier.
Reflection #
Finally, I’d like to leave you with a question to think about. Have you noticed any patterns in the analysis of I/O performance issues from the case studies of MySQL and Redis? If you have any thoughts or insights, feel free to write them down.
Of course, these two case studies don’t cover all possible I/O performance issues. Have you encountered any other I/O performance issues in your actual work? How did you analyze them?
Feel free to discuss with me in the comments section and share this article with your colleagues and friends. Let’s practice and improve together in practical scenarios and through communication.