29 Case Study: Solving the Issue of Severe Delay in Redis Response #

Hello, I’m Ni Pengfei.

In the previous section, we analyzed a case of product search based on MySQL. Let’s review it.

When accessing the product search interface, we found that the response was particularly slow. Through analysis of system resources such as CPU, memory, and disk I/O, we found that there was a disk I/O bottleneck at that time, and it was caused by the case application.

Next, using pidstat, we found that the culprit was the mysqld process. We then used strace and lsof to find out the files that mysqld was reading. Based on the file names and paths, we found the database and table that mysqld was operating on. Based on this information, we guessed that it was a slow query problem caused by the lack of an index.

To verify this guess, we went to the MySQL command line terminal and used a database analysis tool to find that the fields accessed by the case application indeed lacked an index. Since the guess was correct, the problem naturally resolved after adding the index.

From this case, you can see that MySQL's MyISAM engine relies mainly on the system cache to accelerate disk I/O. However, if other applications are running on the system at the same time, it is difficult for the MyISAM engine to make full use of the system cache: the cache may be occupied, or even dropped, by other applications.

Therefore, I generally do not recommend relying solely on the system cache to optimize application performance. It is better to allocate memory within the application and build a cache you fully control, or to use a dedicated caching service such as Memcached or Redis.
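
To make the distinction concrete, here is a minimal Python sketch of the two approaches. It is not part of today's case: the redis-py client, the localhost Redis instance, and the expensive_query helper are all illustrative assumptions.

# Minimal sketch of the two caching approaches mentioned above (illustrative only).
# Assumes the redis-py package and a Redis instance on localhost; expensive_query is a stand-in.
import functools
import redis

def expensive_query(product_id):
    # stand-in for a slow database or service call
    return f"details for {product_id}"

# Approach 1: a fully application-controlled, in-process cache
@functools.lru_cache(maxsize=10000)
def lookup_in_process(product_id):
    return expensive_query(product_id)

# Approach 2: a shared, out-of-process cache in Redis
r = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)

def lookup_with_redis(product_id):
    value = r.get(f"product:{product_id}")
    if value is None:
        value = expensive_query(product_id)
        r.set(f"product:{product_id}", value, ex=300)  # expire after 5 minutes
    return value

The in-process cache avoids network round trips entirely, while Redis lets multiple application instances share the cached data; that trade-off is the backdrop of today's case.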

Redis is one of the most widely used key-value stores, often serving as a database, a cache, or a message broker. Redis keeps its data in memory; however, to ensure that data is not lost after a server failure, we often need to configure persistence for it, and that can introduce disk I/O performance problems.

Today, I will analyze a case of using Redis as a cache with you. This is also an application based on Python Flask, which provides an interface for querying the cache. However, the response time of the interface is relatively long and cannot meet the requirements of the online system.

I would like to thank Dong Guoxing, a senior backend engineer at Ctrip Systems Development Department, for providing today’s case.

Case Preparation #

This case is still based on Ubuntu 18.04, which is also applicable to other Linux systems. The environment I used for this case is as follows:

  • Machine configuration: 2 CPUs, 8GB memory

  • Pre-installed tools: docker, sysstat, git, make, etc. You can install them by running apt install docker.io sysstat.

Today’s case consists of two parts: a Python application and Redis. The Python application is based on Flask and uses Redis to manage application caching. It provides three HTTP interfaces:

  • /: Returns “hello redis”.

  • /init/: Inserts the specified number of cache entries; if no number is given, the default is 5000.

  • Cache keys are UUID strings.

  • Cache values are good, bad, or normal.

  • /get_cache/: Queries the cached data for a given value and returns the processing time. The type_name parameter only supports good, bad, and normal (that is, the interface returns the list of keys whose value equals type_name). A rough sketch of the application's routes follows this list.
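
To make these interfaces easier to picture, here is a rough sketch of how such a Flask application might be structured. This is not the actual case source (that lives in the Docker image and on GitHub): the Redis host and port, the key naming, and the use of random values are assumptions, and the /get_cache handler itself is examined later in this article.

# Rough sketch of the case application's shape; not the actual source.
# Assumes Flask and redis-py; host, port, and key details are illustrative.
import random
import time
import uuid

import redis
from flask import Flask, jsonify

app = Flask(__name__)
redis_client = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)

@app.route("/")
def index():
    return "hello redis\n"

@app.route("/init/", defaults={"num": 5000})
@app.route("/init/<int:num>")
def init(num):
    start = time.time()
    for _ in range(num):
        # keys look like "uuid:<uuid>"; values are one of good, bad, or normal
        redis_client.set("uuid:" + str(uuid.uuid1()), random.choice(["good", "bad", "normal"]))
    return jsonify({"keys_initialized": num, "elapsed_seconds": time.time() - start})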

Since the case involves multiple applications, I have packaged them into two Docker images and pushed them to GitHub. This way, you only need to run a few commands to start everything.

For this case, you will need two virtual machines. One of them will be used as the target machine for the case analysis, running the Flask application with the IP address 192.168.0.10. The other one will act as the client, making requests to the cache query interface. I have created a diagram to illustrate their relationship.

Next, open two terminals and SSH into these two virtual machines respectively. Install the tools mentioned earlier on the first virtual machine.

As usual, all commands in the case are assumed to be run as the root user. If you are logged into the system as a regular user, run the command sudo su root to switch to the root user.

With that, the preparation work is complete. Next, we will proceed to the operation.

Case Study #

First, in the first terminal, execute the following command to run the target application for this case study. Normally, you should see the following output:

# Note that the random string below is the container ID, which will be different each time you run it, and you don't need to pay attention to it
$ docker run --name=redis -itd -p 10000:80 feisky/redis-server
ec41cb9e4dd5cb7079e1d9f72b7cee7de67278dbd3bd0956b4c0846bff211803
$ docker run --name=app --network=container:redis -itd feisky/redis-app
2c54eb252d0552448320d9155a2618b799a1e71d7289ec7277a61e72a9de5fd0

Then, run the docker ps command to verify that both containers are running (Up status):

$ docker ps
CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS              PORTS                             NAMES
2c54eb252d05        feisky/redis-app      "python /app.py"         48 seconds ago      Up 47 seconds                                         app
ec41cb9e4dd5        feisky/redis-server   "docker-entrypoint.s…"   49 seconds ago      Up 48 seconds       6379/tcp, 0.0.0.0:10000->80/tcp   redis

Today’s application is listening on port 10000, so you can access the three mentioned interfaces through http://192.168.0.10:10000.

For example, switch to the second terminal and use the curl tool to access the application homepage. If you see the output hello redis, it means the application has started normally:

$ curl http://192.168.0.10:10000/
hello redis

Next, continue in the second terminal and execute the curl command below to call the application's /init interface, which initializes the Redis cache and inserts 5000 cache entries. This step can take a while, depending on your disk; wait patiently for it to finish, and you will see output like the following:

# Insert 5000 entries in the case. It can be adjusted appropriately according to the disk type during practice, such as increasing it when using SSD and decreasing it when using HDD.
$ curl http://192.168.0.10:10000/init/5000
{"elapsed_seconds":30.26814079284668,"keys_initialized":5000}

Continue with the next command to access the cache query interface of the application. If everything is normal, you will see the following output:

$ curl http://192.168.0.10:10000/get_cache
{"count":1677,"data":["d97662fa-06ac-11e9-92c7-0242ac110002",...],"elapsed_seconds":10.545469760894775,"type":"good"}

We can see that this interface call takes 10 seconds! Such a long response time obviously does not meet the requirements of the actual application.

What could be the problem? We still need to use the performance tools and principles we learned earlier to find this bottleneck.

But do not rush into analysis just yet. To keep requests flowing while we analyze, put the curl command into a loop first; continue executing the command below in the second terminal:

$ while true; do curl http://192.168.0.10:10000/get_cache; done

Next, go back to the first terminal and look for the cause of the slow response.

In the last few cases, the symptom has been a slow response, so our first instinct is to suspect a bottleneck in system resources. Starting with the usage of CPU, memory, and disk I/O is certainly not a bad idea.

First, in the first terminal, execute the top command to analyze the CPU usage of the system:

$ top
top - 12:46:18 up 11 days,  8:49,  1 user,  load average: 1.36, 1.36, 1.04
Tasks: 137 total,   1 running,  79 sleeping,   0 stopped,   0 zombie
%Cpu0  :  6.0 us,  2.7 sy,  0.0 ni,  5.7 id, 84.7 wa,  0.0 hi,  1.0 si,  0.0 st
%Cpu1  :  1.0 us,  3.0 sy,  0.0 ni, 94.7 id,  0.0 wa,  0.0 hi,  1.3 si,  0.0 st
KiB Mem :  8169300 total,  7342244 free,   432912 used,   394144 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  7478748 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 9181 root      20   0  193004  27304   8716 S   8.6  0.3   0:07.15 python
 9085 systemd+  20   0   28352   9760   1860 D   5.0  0.1   0:04.34 redis-server
  368 root      20   0       0      0      0 D   1.0  0.0   0:33.88 jbd2/sda1-8
  149 root       0 -20       0      0      0 I   0.3  0.0   0:10.63 kworker/0:1H
 1549 root      20   0  236716  24576   9864 S   0.3  0.3  91:37.30 python3

From the output of top, you can see that the iowait of CPU0 is quite high, reaching 84.7%, while no individual process uses much CPU: the highest are python and redis-server, at only 8.6% and 5% respectively. As for memory, the total is 8GB and more than 7GB is still free, so memory does not look like a problem either.

Taking into account the information from top, the most suspicious culprit is iowait. Therefore, the next step is to continue analyzing whether it is an I/O problem.

In the first terminal, press Ctrl+C to stop the top command; then, execute the following iostat command to check for any I/O performance issues:

$ iostat -d -x 1
Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
...
sda              0.00  492.00      0.00   2672.00     0.00   176.00   0.00  26.35    0.00    1.76   0.00     0.00     5.43   0.00   0.00

Observing the output of iostat, we find that disk sda is writing about 2.6MB per second (a wkB/s of 2672), while the I/O utilization (%util) is 0. It seems that although there are some I/O operations, they have not created an I/O bottleneck on the disk.

So far, CPU and memory usage look fine, and there is no I/O bottleneck either. Does that mean we have run out of directions to analyze?

In situations like this, it is always helpful to step back and ask whether any useful clues have been overlooked. Take a moment to think it through yourself, focusing on three directions: the case application we are analyzing, the system principles involved, and the performance tools. Recall their characteristics, look for anything abnormal, and keep moving forward.

Recall today's case: the problem is that querying data from the Redis cache is slow. For a query, the corresponding I/O should be disk reads, yet what iostat just showed us were writes. The I/O itself is not a bottleneck, but these disk writes are still strange: why would a query generate disk writes at all?

To find out which process is writing to the disk, we turn to our old friend pidstat. Execute the following pidstat command in the first terminal to observe each process's I/O:

$ pidstat -d 1
12:49:35      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
12:49:36        0       368      0.00     16.00      0.00      86  jbd2/sda1-8
12:49:36      100      9085      0.00    636.00      0.00       1  redis-server

From the output of pidstat, we can see that the process doing the most I/O is redis-server, with PID 9085, and that it is indeed writing to the disk.

Of course, finding the process that is writing to the disk is not enough. We still need to combine strace and lsof to see what redis-server is actually writing.

Next, still in terminal one, execute the strace command and specify the process number 9085 of redis-server:

# -f follows child processes and threads, -T shows the duration of each system call, and -tt prints the time of each call
$ strace -f -T -tt -p 9085
[pid 9085] 14:20:16.826131 epoll_pwait(5, [{EPOLLIN, {u32=8, u64=8}}], 10128, 65, NULL, 8) = 1 <0.000055>
[pid 9085] 14:20:16.826301 read(8, "*2\r\n$3\r\nGET\r\n$41\r\nuuid:5b2e76cc-"..., 16384) = 61 <0.000071>
[pid 9085] 14:20:16.826477 read(3, 0x7fff366a5747, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000063>
[pid 9085] 14:20:16.826645 write(8, "$3\r\nbad\r\n", 9) = 9 <0.000173>
[pid 9085] 14:20:16.826907 epoll_pwait(5, [{EPOLLIN, {u32=8, u64=8}}], 10128, 65, NULL, 8) = 1 <0.000032>
[pid 9085] 14:20:16.827030 read(8, "*2\r\n$3\r\nGET\r\n$41\r\nuuid:55862ada-"..., 16384) = 61 <0.000044>
[pid 9085] 14:20:16.827149 read(3, 0x7fff366a5747, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000043>
[pid 9085] 14:20:16.827285 write(8, "$3\r\nbad\r\n", 9) = 9 <0.000141>
[pid 9085] 14:20:16.827514 epoll_pwait(5, [{EPOLLIN, {u32=8, u64=8}}], 10128, 64, NULL, 8) = 1 <0.000049>
[pid 9085] 14:20:16.827641 read(8, "*2\r\n$3\r\nGET\r\n$41\r\nuuid:53522908-"..., 16384) = 61 <0.000043>
[pid 9085] 14:20:16.827784 read(3, 0x7fff366a5747, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000034>
[pid 9085] 14:20:16.827945 write(8, "$4\r\ngood\r\n", 10) = 10 <0.000288>
[pid 9085] 14:20:16.828339 epoll_pwait(5, [{EPOLLIN, {u32=8, u64=8}}], 10128, 63, NULL, 8) = 1 <0.000057>
[pid 9085] 14:20:16.828486 read(8, "*3\r\n$4\r\nSADD\r\n$4\r\ngood\r\n$36\r\n535"..., 16384) = 67 <0.000040>
[pid 9085] 14:20:16.828623 read(3, 0x7fff366a5747, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000052>
[pid 9085] 14:20:16.828760 write(7, "*3\r\n$4\r\nSADD\r\n$4\r\ngood\r\n$36\r\n535"..., 67) = 67 <0.000060>
[pid 9085] 14:20:16.828970 fdatasync(7) = 0 <0.005415>
[pid 9085] 14:20:16.834493 write(8, ":1\r\n", 4) = 4 <0.000250>

After observing for a while, did you notice anything interesting?

In fact, from this trace you can see that epoll_pwait, read, write, and fdatasync are all called quite frequently. So the disk writes we observed earlier should come from write or fdatasync.

Next, let’s run the lsof command to find out the targets of these system calls:

$ lsof -p 9085
redis-ser 9085 systemd-network    3r     FIFO   0,12      0t0 15447970 pipe
redis-ser 9085 systemd-network    4w     FIFO   0,12      0t0 15447970 pipe
redis-ser 9085 systemd-network    5u  a_inode   0,13        0    10179 [eventpoll]
redis-ser 9085 systemd-network    6u     sock    0,9      0t0 15447972 protocol: TCP
redis-ser 9085 systemd-network    7w      REG    8,1  8830146  2838532 /data/appendonly.aof
redis-ser 9085 systemd-network    8u     sock    0,9      0t0 15448709 protocol: TCP

Now you can see that descriptor number 3 is a pipe, number 5 is an eventpoll, number 7 is a regular file, and number 8 is a TCP socket.

Combining this with the phenomenon of disk writes, we know that only the regular file with number 7 will generate disk writes, and the file path it operates on is /data/appendonly.aof. The corresponding system calls include write and fdatasync.

If you are familiar with Redis persistence, then seeing this file path together with the fdatasync system call should remind you of the appendonly and appendfsync options in Redis's persistence configuration. The excessive disk writes are most likely caused by an improper setting of these options.

Next, let’s verify this guess. We can use Redis’ command-line tool to query the configuration of these two options.

Continuing in terminal 1, run the following command to query the configuration of appendonly and appendfsync:

$ docker exec -it redis redis-cli config get 'append*'
1) "appendfsync"
2) "always"
3) "appendonly"
4) "yes"

From this result, you can see that appendfsync is configured as always, and appendonly is configured as yes. You can find detailed meanings of these two options from the Redis Persistence documentation, but let me give a brief introduction here.

Redis provides two methods of data persistence: snapshots and append-only files.

Snapshots generate periodic snapshots of the data and save them to disk files according to the specified time interval. To avoid blocking the main process, Redis forks a child process to handle the snapshot saving. This method provides good performance and is much better for backup and recovery compared to append-only files.

However, it also has its drawbacks. With a large data set, forking the child process requires a considerable amount of memory, and saving the data is time-consuming. You therefore need to set a fairly long interval, say at least 5 minutes, to cope with this; and if a failure occurs, what you lose is the last few minutes of data.

Append-only files persistently store Redis write operations by appending records to the end of the file, making it a safer persistence method.

In addition, it provides an appendfsync option to set the fsync policy for ensuring that the written data is flushed to disk. The specific options include always, everysec, and no.

  • always means that fsync is called on every operation, which is the most secure option.
  • everysec means that fsync is called once every second, ensuring that even in the worst case scenario, only 1 second of data is lost.
  • no means that it is up to the operating system to handle this.

Recalling the configuration we just saw, appendfsync is set to always, meaning that fsync is called on every write operation, causing significant disk I/O pressure.
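
To see what this means at the system-call level, here is a small standalone illustration (not Redis's actual code) of an append-only log that syncs after every record, which is essentially what appendfsync always demands, and which matches the write(7, ...) followed by fdatasync(7) pairs in the strace output above. The file path is just a placeholder.

# Illustration only (not Redis source): append records to an AOF-style log,
# forcing each one to disk before acknowledging, i.e. the "appendfsync always" pattern.
import os

fd = os.open("/tmp/appendonly-demo.aof", os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

def append_record(record: bytes):
    os.write(fd, record)   # append the serialized write command to the log
    os.fdatasync(fd)       # block until the data reaches the disk (the expensive part)

for _ in range(1000):
    append_record(b"*3\r\n$4\r\nSADD\r\n$4\r\ngood\r\n$4\r\ndemo\r\n")

os.close(fd)

With everysec, the flush instead happens roughly once per second in the background, so individual writes no longer pay the fdatasync cost one by one.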

Of course, you can also use strace to observe the execution of this system call. For example, by specifying fdatasync with the -e option, you will get the following result:

$ strace -f -p 9085 -T -tt -e fdatasync
strace: Process 9085 attached with 4 threads
[pid  9085] 14:22:52.013547 fdatasync(7) = 0 <0.007112>
[pid  9085] 14:22:52.022467 fdatasync(7) = 0 <0.008572>
[pid  9085] 14:22:52.032223 fdatasync(7) = 0 <0.006769>
...
[pid  9085] 14:22:52.139629 fdatasync(7) = 0 <0.008183>

From this, you can see that an fdatasync call occurs roughly every 10ms, and each call itself takes 7-8ms. In other words, redis-server spends most of its time just waiting for these synchronous flushes to complete.

Either way, our guess is confirmed: the configuration is indeed unreasonable. We have now identified both the file Redis keeps writing to and the reason for the heavy I/O.

However, going back to the initial question, why do disk writes occur during queries? Shouldn’t it only involve data reading? For this, we need to re-examine the result of strace -f -T -tt -p 9085.

read(8, "*2\r\n$3\r\nGET\r\n$41\r\nuuid:53522908-"..., 16384)
write(8, "$4\r\ngood\r\n", 10)
read(8, "*3\r\n$4\r\nSADD\r\n$4\r\ngood\r\n$36\r\n535"..., 16384)
write(7, "*3\r\n$4\r\nSADD\r\n$4\r\ngood\r\n$36\r\n535"..., 67)
write(8, ":1\r\n", 4)

If you were paying attention, you will remember from the lsof analysis that descriptor 7 is the regular file /data/appendonly.aof, while descriptor 8 is a TCP socket. Looking at the excerpt above, the reads and writes on descriptor 8 follow Redis's standard request-response wire format (a minimal encoding sketch follows the list below):

  • After reading GET uuid:53522908-... from the socket, it responds with good.
  • Then, after reading SADD good 535... from the socket, it responds with 1.
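
Those buffers are Redis's RESP wire format: a request is an array marked "*<n>", followed by length-prefixed bulk strings marked "$<len>". A minimal encoder (purely illustrative, with a placeholder UUID) reproduces the shapes seen in the strace output:

# Minimal illustration of RESP request encoding, the format in the read(8, ...) buffers above.
def resp_encode(*args: str) -> bytes:
    out = f"*{len(args)}\r\n"                # array header: number of arguments
    for arg in args:
        out += f"${len(arg)}\r\n{arg}\r\n"   # each argument as a length-prefixed bulk string
    return out.encode()

# Placeholder UUID, just to show the shape of the encoded commands
print(resp_encode("GET", "uuid:5b2e76cc-0000-0000-0000-000000000000"))
print(resp_encode("SADD", "good", "5b2e76cc-0000-0000-0000-000000000000"))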

For Redis, SADD is a write operation, so Redis also saves it to the appendonly.aof file used for persistence. Observing more strace results, you will find that every time GET returns “good”, there will always be a subsequent SADD operation, which leads to a large amount of disk writes in Redis, even though it is a query interface.

At this point, we have identified the reason why Redis is writing to disk. However, before reaching our final conclusion, we still need to confirm whether the Redis client corresponding to TCP socket 8 is indeed our case application.

We can use the lsof command with the -i option to find out the TCP connection information corresponding to the TCP socket. However, since Redis and the Python application are running inside containers, we need to enter the network namespace of the container to see the complete TCP connection.

Note: The following command uses the nsenter tool, which allows you to enter container namespaces. If this tool is not installed on your system, please run the following command to install nsenter: docker run --rm -v /usr/local/bin:/target jpetazzo/nsenter

In terminal 1, run the following command:

# Since these two containers share the same network namespace, we only need to enter the network namespace of the app container
$ PID=$(docker inspect --format {{.State.Pid}} app)
# -i displays network socket information
$ nsenter --target $PID --net -- lsof -i
COMMAND    PID            USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
redis-ser 9085 systemd-network    6u  IPv4 15447972      0t0  TCP localhost:6379 (LISTEN)
redis-ser 9085 systemd-network    8u  IPv4 15448709      0t0  TCP localhost:6379->localhost:32996 (ESTABLISHED)
python    9181            root    3u  IPv4 15448677      0t0  TCP *:http (LISTEN)
python    9181            root    5u  IPv4 15449632      0t0  TCP localhost:32996->localhost:6379 (ESTABLISHED)

This time, we can see that file descriptor 8 of redis-server corresponds to the TCP connection localhost:6379->localhost:32996. Here, localhost:6379 is the port redis-server itself listens on, so localhost:32996 must be the Redis client. And the last line shows that localhost:32996 belongs to our Python application (PID 9181).

After many twists and turns, we have finally found the potential causes of the Redis response latency. To summarize, we found two problems.

The first problem is that Redis is configured with appendfsync set to “always”, which causes Redis to trigger the fdatasync system call for every write operation. In today’s case, such high-frequency synchronous writes are not necessary, and the default interval of 1s is sufficient.

The second problem is that the Python application calls the Redis SADD command in the query interface, which is likely an improper use of caching.

For the first configuration problem, we can execute the following command to change appendfsync to “everysec”:

$ docker exec -it redis redis-cli config set appendfsync everysec
OK

After making the change, switch to terminal 2 and you will find that the request time has now been reduced to 0.9s:

{..., "elapsed_seconds":0.9368953704833984,"type":"good"}

As for the second problem, we need to look at the application's source code. You can view the case application's source code on GitHub:

def get_cache(type_name):
    '''handler for /get_cache'''
    # Scan all cached keys and fetch each value with an individual GET
    for key in redis_client.scan_iter("uuid:*"):
        value = redis_client.get(key)
        if value == type_name:
            # Record every match in a temporary Redis set: one SADD (a write) per hit
            redis_client.sadd(type_name, key[5:])
    # Read the accumulated matches back, then delete the temporary set
    data = list(redis_client.smembers(type_name))
    redis_client.delete(type_name)
    return jsonify({"type": type_name, 'count': len(data), 'data': data})

Indeed, the Python application uses Redis as temporary storage for the data it finds during the query. However, this intermediate data can simply be kept in the application's own memory; there is no need to push it into Redis over the network.

Based on this idea, I have pushed the modified code to the same source file, exposed as a new /get_cache_data interface that you can access at http://192.168.0.10:10000/get_cache_data.
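
The exact implementation is in the repository; as a rough sketch (reusing the redis_client and jsonify seen in the snippet above, with the route details omitted), the new handler simply keeps the intermediate results in a local Python list:

def get_cache_data(type_name):
    '''rough sketch of a fixed handler: keep intermediate results in application memory'''
    data = []
    for key in redis_client.scan_iter("uuid:*"):
        value = redis_client.get(key)
        if value == type_name:
            data.append(key[5:])   # collect matches locally; no SADD/SMEMBERS/DELETE round trips
    return jsonify({"type": type_name, 'count': len(data), 'data': data})

A further improvement, also only a sketch of an idea, would be to batch the per-key GETs with MGET or a pipeline, since one network round trip per key accounts for much of the remaining latency.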

Switch to terminal 2, press Ctrl+C to stop the previous curl command, and then run the curl loop below to call the new /get_cache_data interface:

$ while true; do curl http://192.168.0.10:10000/get_cache_data; done
{..., "elapsed_seconds":0.16034674644470215,"type":"good"}

You will notice that after solving the second problem, the performance of the new interface has been further improved, from the previous 0.9s to less than 0.2s.

Of course, at the end of the case, don’t forget to clean up the case application. You can switch to terminal 1 and run the following command to clean up:

$ docker rm -f app redis

Summary #

Today I took you through an analysis of a Redis cache case.

We started by using top and iostat to analyze the CPU, memory, and disk usage of the system. However, we found that there were no bottlenecks in system resources. At this point, if we want to further analyze, where should we start?

From today’s case, you will find that in order to further analyze, you need to have a certain understanding of the working principles of the system and the application.

For example, in today’s case, although disk I/O did not show any bottlenecks, according to the principles of Redis, there shouldn’t be a large number of disk I/O write operations when querying the cache.

Following this line of thinking, we continued to use tools such as pidstat, strace, lsof, and nsenter to identify two potential problems: improper configuration of Redis and the overuse of Redis by the Python application. Once the bottlenecks were identified, the corresponding optimization work became easier.

Reflection #

Finally, I’d like to leave you with a question to think about. Have you noticed any patterns in the analysis of I/O performance issues from the case studies of MySQL and Redis? If you have any thoughts or insights, feel free to write them down.

Of course, these two case studies don’t cover all possible I/O performance issues. Have you encountered any other I/O performance issues in your actual work? How did you analyze them?

Feel free to discuss with me in the comments section and share this article with your colleagues and friends. Let’s practice and improve together in practical scenarios and through communication.