05 Memory Snapshots: How Can Redis Recover Quickly After a Crash?

In the previous lesson, we learned about the AOF method used by Redis to avoid data loss. The advantage of this method is that it only needs to record operation commands, and the amount of data that needs to be persisted is relatively small. Generally speaking, as long as you don’t use the “always” persistence strategy, it won’t have a significant impact on performance.

However, because it records operation commands instead of actual data, when using the AOF method for fault recovery, it is necessary to execute each operation log one by one. If there are a large number of operation logs, Redis will recover slowly, which affects normal usage. This is certainly not an ideal result. So, are there any other methods that can ensure reliability and achieve quick recovery after a crash?

Yes, there is. This is the other persistence method we are going to learn today: the memory snapshot. A memory snapshot is a record of the state of the data in memory at a particular moment. It is like a photograph: when you take a photo of a friend, that single photo captures exactly how the friend looked at that instant.

For Redis, the way it achieves a similar photographic recording effect is to write the state of data at a certain moment to disk in the form of a file, which is a snapshot. In this way, even if there is a crash, the snapshot file will not be lost, ensuring the reliability of the data. This snapshot file is called the RDB file, where RDB stands for Redis Database.

Compared with AOF, RDB records the data at a certain moment, not the operations. Therefore, when doing data recovery, we can directly load the RDB file into memory to quickly complete the recovery process. It sounds good, but memory snapshot is not necessarily the best option. Why is that?

We still need to consider two key issues:

Which data to snapshot? This relates to the efficiency of snapshot execution.
Can the data be modified when a snapshot is being taken? This relates to whether Redis is blocked and whether it can handle requests normally at the same time.

You might not fully understand this yet, so let me give an example of taking photos. When taking photos, we usually need to consider two things:

How should the scene be framed? In other words, which people or objects do we want to capture in the photo?
Before pressing the shutter, we need to remind our friends not to move around, otherwise, the photo will be blurry.

You see, aren’t these two issues very important? So, let’s talk about them specifically next. Let’s start with the “framing” issue, which refers to which data we want to snapshot.

Which data in memory is snapshotted? #

Redis stores all its data in memory. In order to provide reliability for all data, Redis performs a full snapshot, which means that all data in memory is recorded to disk. This is similar to taking a group photo of 100 people, ensuring that every person is captured in the picture. The advantage of this approach is that all data is recorded at once, without missing anything.

When taking a photo of an individual, you only need to coordinate with that person. However, taking a group photo of 100 people requires coordination of the positions and states of all 100 people, which can be time-consuming and labor-intensive. Similarly, taking a snapshot of all memory data and writing it to disk also takes a considerable amount of time. Furthermore, the larger the amount of data, the larger the RDB file size, and the greater the time overhead of writing data to disk.

For Redis, its single-threaded model means that we need to avoid any operations that would block the main thread as much as possible. Hence, for any operation, we always ask ourselves the critical question: “Will it block the main thread?” Whether generating an RDB file will block the main thread determines whether it will impact the performance of Redis.

Redis provides two commands for generating RDB files: SAVE and BGSAVE.

SAVE: runs in the main thread, so it blocks the main thread until the RDB file has been written.
BGSAVE: creates a child process dedicated to writing the RDB file, so the main thread is not blocked while the file is written. This is the default way Redis generates RDB files.

Now, we can use the BGSAVE command to perform a full snapshot, which ensures data reliability without blocking the main thread while the RDB file is being written.

Next, we need to address whether the data can still be modified while the snapshot is in progress. This issue is very important: if the data can still be modified, Redis can handle write operations as usual. Otherwise, all write operations have to wait until the snapshot is complete, and performance drops sharply.

Can data be modified during a snapshot? #

When taking a photo of someone, if the person moves, the photo will be blurry and we will need to retake it. Therefore, it is preferable for the person to remain still. Similarly, when it comes to memory snapshots, we also prefer the data to remain “still”.

For example, let’s say we take a snapshot of the memory at time t. Assuming the memory size is 4GB and the disk write bandwidth is 0.2GB/s, it would take at least 20 seconds to complete the snapshot (4/0.2 = 20). If at time t+5s, some memory data A that has not been written to the disk yet is modified to become A’, it will compromise the integrity of the snapshot because A’ is not the state of the memory at time t. Therefore, similar to taking a photo, we do not want the data to be “in motion” during a snapshot, meaning it cannot be modified.
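The arithmetic in this example can be written down as a small sketch. The function names and the GB/s unit are illustrative assumptions, and the result is only a lower bound, since it ignores everything except raw disk bandwidth.

```python
def min_snapshot_seconds(memory_gb: float, disk_write_gb_per_s: float) -> float:
    """Lower bound on snapshot time: data size divided by disk write bandwidth."""
    if disk_write_gb_per_s <= 0:
        raise ValueError("disk write bandwidth must be positive")
    return memory_gb / disk_write_gb_per_s


def write_lands_during_snapshot(write_offset_s: float, duration_s: float) -> bool:
    """True if a write issued write_offset_s seconds after time t falls inside the snapshot window."""
    return 0 <= write_offset_s < duration_s


# The numbers from the example: 4 GB of memory, 0.2 GB/s of disk bandwidth.
duration = min_snapshot_seconds(4, 0.2)
print(round(duration))                           # about 20 seconds
print(write_lands_during_snapshot(5, duration))  # the write at t+5s is inside the window
```

Any write that lands inside this window touches data the snapshot may not have written yet, which is exactly the case of A becoming A’.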

However, there can be potential issues if data cannot be modified during the execution of a snapshot. In the previous example, during the 20-second duration of taking the snapshot, if these 4GB of data cannot be modified, Redis would not be able to process write operations on these data, which would undoubtedly have a significant impact on business services.

You might think that using bgsave avoids this blocking. Here I have to address a common misconception: not blocking the main thread and processing write operations normally are not the same thing. With bgsave, the main thread is indeed not blocked and can receive requests normally. But if the data had to stay frozen to keep the snapshot intact, the main thread could only serve reads, because it could not modify the data being snapshotted.

It is unacceptable to pause write operations for the sake of taking a snapshot. Therefore, Redis leverages the copy-on-write (COW) technique provided by the operating system to perform write operations normally while executing the snapshot.

In simple terms, the bgsave child process is forked by the main thread and can share all the memory data of the main thread. After the bgsave child process starts running, it reads the memory data of the main thread and writes them to the RDB file.

At this time, if the main thread only reads the data (such as key-value pair A in the figure), the main thread and the bgsave child process do not affect each other. However, if the main thread modifies a piece of data (such as key-value pair C in the figure), the operating system first copies the memory page that holds this data, creating a replica (key-value pair C’). The main thread then modifies the replica, while the bgsave child process continues to write the original data (key-value pair C) to the RDB file.

The copy-on-write mechanism ensures data can be modified during snapshots.

This ensures the integrity of the snapshot while allowing the main thread to simultaneously modify the data, avoiding any impact on normal operations.
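The effect of fork plus copy-on-write can be observed with a minimal sketch. This is not how Redis is implemented: it assumes a Unix-like system where `os.fork` is available, uses a Python dict as the "memory data" and a JSON file as a stand-in for the RDB file, and `snapshot_with_fork` is a hypothetical helper name.

```python
import json
import os
import tempfile


def snapshot_with_fork(data, mutate):
    """Fork a child that dumps `data` to a file while the parent modifies `data`.

    Returns (snapshot_seen_by_child, live_data_in_parent).
    """
    fd, path = tempfile.mkstemp(suffix=".snapshot")
    os.close(fd)
    ready_r, ready_w = os.pipe()  # parent tells the child "I have modified the data"

    pid = os.fork()
    if pid == 0:
        # Child process: plays the role of the bgsave child.
        try:
            os.close(ready_w)
            os.read(ready_r, 1)       # wait until the parent has already written
            with open(path, "w") as f:
                json.dump(data, f)    # still sees the data as it was at fork time
        finally:
            os._exit(0)               # leave without running the parent's cleanup code

    # Parent process: plays the role of the main thread and keeps accepting writes.
    os.close(ready_r)
    mutate(data)
    os.write(ready_w, b"x")
    os.close(ready_w)
    os.waitpid(pid, 0)

    with open(path) as f:
        snapshot = json.load(f)
    os.remove(path)
    return snapshot, data


snap, live = snapshot_with_fork({"A": 1, "C": 3}, lambda d: d.update(C=30))
print("snapshot:", snap)
print("live:", live)
```

The child deliberately writes its file after the parent has changed C, yet the file still contains the value of C from the moment of the fork. The parent's write went to its own copy of the affected memory.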

So far, we have addressed the two major questions regarding “which data to snapshot” and “whether data can be modified during the snapshot”: Redis will use bgsave to take a snapshot of all the current data in memory, and this operation is performed by a child process in the background, which allows the main thread to modify the data simultaneously.

Now, let’s consider another question: how often should we take snapshots? When taking photos, there is a technique called “burst shooting”, which captures a person or object in a rapid series of shots. So, are snapshots suitable for “burst shooting” as well?

Can snapshots be taken every second? #

For snapshots, “burst shooting” refers to continuously taking snapshots. In this way, the interval between snapshots becomes very short. Even if a crash occurs at a certain moment, because the snapshot was just executed at the previous moment, the amount of lost data is not too much. However, the interval between snapshots is crucial.

As shown in the figure below, we took a snapshot at T0 and another snapshot at T0+t. During this period, data blocks 5 and 9 were modified. If a crash occurs during this period, we can only restore based on the snapshot at T0. At this time, the modified values of data blocks 5 and 9 cannot be recovered because there are no snapshot records for them.

Data loss in snapshot mechanism
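The loss window described above can be modelled with a toy sketch. It assumes each snapshot is instantaneous and captures every write made up to that moment, and `lost_blocks` is a hypothetical helper, not a Redis API.

```python
def lost_blocks(snapshot_times, writes, crash_time):
    """Return the data blocks whose modifications are lost if Redis crashes at crash_time.

    snapshot_times: times at which a snapshot completed
    writes: list of (time, block_id) modifications
    """
    taken = [t for t in snapshot_times if t <= crash_time]
    last_snapshot = max(taken) if taken else None
    return sorted({
        block
        for t, block in writes
        if t <= crash_time and (last_snapshot is None or t > last_snapshot)
    })


# Snapshots at T0 = 0 and T0 + t = 10; blocks 5 and 9 are modified in between.
writes = [(3, 5), (7, 9)]
print(lost_blocks([0, 10], writes, crash_time=8))   # crash before the second snapshot
print(lost_blocks([0, 10], writes, crash_time=12))  # crash after the second snapshot
```

In the first call the crash happens before the second snapshot, so the changes to blocks 5 and 9 are reported as lost. In the second call both changes are covered by the snapshot at time 10.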

To recover as much data as possible, the value of t needs to be as small as possible. The smaller the t value, the more it is like “burst shooting”. So, how small can the t value be? Is it possible to take a snapshot every second, for example? After all, each snapshot is executed by the bgsave child process in the background and does not block the main thread.

This idea is actually wrong. Although bgsave does not block the main thread during execution, frequent execution of full snapshots will also bring two types of overhead.

On the one hand, frequently writing the full dataset to disk puts a lot of pressure on the disk. Multiple snapshots compete for limited disk bandwidth: if the previous snapshot has not finished when the next one starts, each slows the other down, which can lead to a vicious cycle.

On the other hand, the bgsave child process has to be created from the main thread through a fork operation. Although the child process does not block the main thread once it has been created, the fork operation itself blocks the main thread, and the more memory the main thread uses, the longer it is blocked. If bgsave child processes are forked frequently, the main thread will be blocked frequently (so, in Redis, if a bgsave is running, another bgsave child process will not be started). So, are there any other good methods?
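Before moving on, the fork cost can be put into rough numbers. During fork the operating system copies the parent's page tables rather than the data itself, so the work grows with the amount of mapped memory. The sketch below assumes 4 KiB pages and 8-byte page table entries, which is typical for x86-64 Linux, and it ignores upper-level tables and huge pages, so it is an order-of-magnitude estimate only. `page_table_bytes` is a hypothetical helper name.

```python
def page_table_bytes(memory_bytes: int, page_size: int = 4096, entry_size: int = 8) -> int:
    """Approximate size of the last-level page table entries that fork must copy."""
    pages = -(-memory_bytes // page_size)  # ceiling division
    return pages * entry_size


GIB = 1024 ** 3
MIB = 1024 ** 2

for size_gib in (4, 16, 64):
    table = page_table_bytes(size_gib * GIB)
    print(f"{size_gib} GiB of data -> about {table // MIB} MiB of page table entries")
```

Under these assumptions the amount copied is 1/512 of the data size, and it scales linearly, which matches the observation that a larger instance is blocked for longer by each fork.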

One idea is to use incremental snapshots. An incremental snapshot means that after a full snapshot, subsequent snapshots only record the modified data, which avoids the overhead of a full snapshot every time.

After the first full snapshot is taken, if we take another snapshot at T1 and T2, we only need to write the modified data to the snapshot file. However, the prerequisite for doing this is that we need to remember which data has been modified. Do not underestimate this “remember” function. It requires us to use additional metadata information to record which data has been modified, which will bring additional space overhead. As shown in the figure below:

Incremental snapshot illustration

If we record every modification of each key-value pair, then if there are 10,000 modified key-value pairs, we need 10,000 additional records. Moreover, sometimes, the key-value pairs are very small, for example, only 32 bytes, but recording the metadata information that it was modified may require 8 bytes. In this case, introducing additional space overhead to “remember” the modifications is not worth it for Redis, where memory resources are precious.
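The space cost in this example is easy to work out. The sketch below uses the numbers from the paragraph above (10,000 modified pairs, 32-byte pairs, 8 bytes of metadata per modification); the function name is an illustrative assumption.

```python
def tracking_overhead(modified_pairs: int, pair_bytes: int, metadata_bytes: int):
    """Return (extra bytes needed for tracking, overhead relative to the tracked data)."""
    extra = modified_pairs * metadata_bytes
    ratio = metadata_bytes / pair_bytes
    return extra, ratio


extra, ratio = tracking_overhead(10_000, pair_bytes=32, metadata_bytes=8)
print(f"extra memory: {extra} bytes, overhead: {ratio:.0%}")
```

With these numbers, tracking costs 80,000 extra bytes, a quarter of the size of the data being tracked.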

So far, you can see that although snapshots recover faster than AOF, the snapshot frequency is hard to get right. If the frequency is too low and a crash occurs between two snapshots, a significant amount of data may be lost. If the frequency is too high, it generates additional overhead. So, is there a method that keeps the fast recovery of RDB files while minimizing both the overhead and the data loss?

Redis 4.0 proposed a method of mixing AOF logs and in-memory snapshots. In simple terms, the in-memory snapshot is executed at a certain frequency, and between two snapshots, all command operations during this period are recorded using the AOF log.

In this way, snapshots do not need to be executed frequently, which avoids the impact of frequent forks on the main thread. Moreover, the AOF log only needs to record the operations between two snapshots rather than every operation. The log file therefore stays small, and the overhead of rewriting it is avoided.

As shown in the figure below, the modifications at T1 and T2 are recorded using the AOF log. When the second full snapshot is taken, the AOF log can be cleared because the modifications are already recorded in the snapshot and are no longer needed for recovery.

Mixing in-memory snapshot and AOF
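The idea can be sketched with a toy in-memory model. It does not reflect the on-disk format or the internals of Redis: `HybridStore` and its methods are hypothetical names, the "snapshot" is a dict copy, and the "AOF" is a list of commands.

```python
class HybridStore:
    """Toy model: a periodic full snapshot plus a command log for writes made since then."""

    def __init__(self):
        self.data = {}      # live in-memory data
        self.snapshot = {}  # state captured by the last full snapshot
        self.aof = []       # commands recorded since the last snapshot

    def set(self, key, value):
        self.data[key] = value
        self.aof.append(("SET", key, value))

    def take_snapshot(self):
        self.snapshot = dict(self.data)
        self.aof.clear()    # earlier commands are now covered by the snapshot

    def recover(self):
        """Rebuild the state after a crash: load the snapshot, then replay the log."""
        restored = dict(self.snapshot)
        for op, key, value in self.aof:
            if op == "SET":
                restored[key] = value
        return restored


store = HybridStore()
store.set("a", 1)
store.take_snapshot()   # first full snapshot
store.set("b", 2)       # modification at T1, recorded only in the log
store.set("a", 3)       # modification at T2, recorded only in the log
print(store.recover())
print(len(store.aof))
```

Recovery loads the snapshot in one step and then replays only the two commands recorded after it, instead of every command since the beginning.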

This method allows us to enjoy the benefits of fast recovery from RDB files and the simplicity of AOF only recording command operations. It feels like “having the best of both worlds” and is recommended for use in practice.

Summary #

In this lesson, we learned about Redis’ memory snapshot method for avoiding data loss. The advantage of this method is that the database can be recovered quickly by reading the RDB file directly into memory, avoiding the slow recovery that AOF incurs by re-executing operation commands one by one.

However, memory snapshots also have their limitations. They take a “big picture” of the memory, which inevitably takes time and effort. Although Redis uses bgsave and copy-on-write to minimize the impact of memory snapshots on normal read and write operations, frequent snapshots are still not acceptable. Using a combination of RDB and AOF draws on the strengths of both while avoiding their weaknesses, providing data reliability with little performance overhead.

Finally, regarding the choice between AOF and RDB, I would like to give you three recommendations:

When data loss is not allowed, using a combination of memory snapshots and AOF is a good choice.
If data loss at the minute level is acceptable, you can use only RDB.
If AOF is used alone, prioritize using the everysec configuration option as it strikes a balance between reliability and performance.
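These three recommendations map onto a handful of redis.conf directives. The sketch below is one possible mapping, not an official one: the requirement labels are hypothetical, and the `save 60 10000` threshold (snapshot if at least 10,000 keys changed within 60 seconds) is only an example value. The directive names themselves (`appendonly`, `appendfsync`, `aof-use-rdb-preamble`, `save`) are standard Redis configuration options.

```python
def persistence_settings(requirement: str) -> dict:
    """Map a (hypothetical) requirement label to redis.conf directives."""
    presets = {
        # Snapshots combined with AOF (the mixed mode introduced in Redis 4.0).
        "no-data-loss": {"appendonly": "yes", "aof-use-rdb-preamble": "yes"},
        # RDB only; the save threshold here is an example, tune it to your workload.
        "minute-level-loss-ok": {"appendonly": "no", "save": "60 10000"},
        # AOF alone, flushed to disk once per second.
        "aof-only": {
            "appendonly": "yes",
            "appendfsync": "everysec",
            "aof-use-rdb-preamble": "no",
        },
    }
    if requirement not in presets:
        raise ValueError(f"unknown requirement: {requirement!r}")
    return presets[requirement]


def render_conf(settings: dict) -> str:
    """Render the settings as redis.conf lines."""
    return "\n".join(f"{name} {value}" for name, value in settings.items())


print(render_conf(persistence_settings("aof-only")))
```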

One question per lesson #

I have encountered a scenario where we run Redis on a cloud server with a 2-core CPU, 4GB of memory, and a 500GB disk. The size of the Redis database is about 2GB, and we use RDB for persistence. At that time, the workload of Redis was mainly focused on write operations, with a write-to-read ratio of approximately 8:2, meaning that out of 100 requests, 80 of them were write operations. Do you think there are any risks in using RDB for persistence in this scenario? Can you help analyze it together?

With this, we have completed the discussion on persistence. This section is a fundamental aspect of mastering Redis, and I recommend that you study these two lessons thoroughly. If you find it helpful, I hope you can share it with more people to help them solve persistence-related issues.