04 AOF Log Persistence: How Does Redis Avoid Data Loss? #

If someone asks you, “In what business scenario would you use Redis?” I believe you would probably say, “I would use it as a cache, because it keeps data from the backend database in memory and serves reads directly from memory, which makes responses very fast.” Yes, this is indeed a common use case for Redis, but there is an issue that absolutely cannot be ignored: once the server goes down, all the data in memory is lost.

One solution that comes to mind is to recover this data from the backend database. However, this approach has two problems: first, it requires frequent database access, which puts a huge load on the database; second, the data is retrieved from a slow database, so the performance is definitely not as good as reading from Redis, which leads to slower response times for applications using this data. Therefore, achieving data persistence and avoiding the need to recover from the backend database is crucial for Redis.

Currently, Redis has two main mechanisms for persistence, namely the AOF (Append Only File) log and RDB snapshots. In the next two lessons, let’s learn about each of them separately. In this lesson, let’s focus on learning about the AOF log first.

How is the AOF log implemented? #

When it comes to logs, we are more familiar with the write-ahead log (WAL) of databases: before actually modifying the data, the modification is first recorded in a log file, for recovery in case of failure. The AOF log, however, is the opposite. It is a write-after log, meaning that Redis first executes a command and writes the data to memory, and only then records the log. The process is illustrated in the following diagram:

Redis AOF operation process

So why does AOF execute commands before logging them? To answer this question, we need to know what content is recorded in the AOF log.

In traditional database logs, such as redo logs, what is recorded is the modified data. In contrast, the AOF log records every command received by Redis, and these commands are saved in text format.

Let’s take the AOF log recorded after Redis receives the “set testkey testvalue” command as an example to see the content of the AOF log. The “*3” indicates that the current command has three parts, each starting with “$+number”, followed by the specific command, key, or value. Here, “number” represents the number of bytes in this part of the command, key, or value. For example, “$3 set” means that this part has 3 bytes, which is the “set” command.
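This serialization format can be sketched in a few lines of Python. This is a minimal illustration of the text encoding described above, not Redis source code; the helper name `encode_resp_command` is my own.

```python
def encode_resp_command(*parts: str) -> str:
    """Serialize a command into the text format used in the AOF log.

    A command becomes "*<number of parts>", then for each part
    "$<byte length>" followed by the part itself, all CRLF-terminated.
    """
    lines = [f"*{len(parts)}"]
    for part in parts:
        data = part.encode("utf-8")
        lines.append(f"${len(data)}")  # "$3" for "set", "$7" for "testkey", ...
        lines.append(part)
    return "\r\n".join(lines) + "\r\n"

# "set testkey testvalue" is recorded as:
# *3\r\n$3\r\nset\r\n$7\r\ntestkey\r\n$9\r\ntestvalue\r\n
encoded = encode_resp_command("set", "testkey", "testvalue")
```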

Redis AOF log content

However, in order to avoid additional checking overhead, Redis does not perform syntax checking on these commands when recording them in the AOF log. Therefore, if the log is recorded before executing the command, it is possible that incorrect commands will be recorded in the log, leading to errors when Redis uses the log to recover data.

With the write-after approach, the system executes the command first; only if the command executes successfully is it recorded in the log, otherwise the system reports the error directly to the client. So one benefit of Redis’s write-after logging is that it avoids recording erroneous commands.

In addition, AOF has another advantage: it records the log after the command is executed, so it does not block the current write operation.

However, AOF also has two potential risks.

First, if a command has just been executed and the system crashes before the log is written, there is a risk of losing that command and the corresponding data. If Redis is used as a cache, it is still possible to recover by re-reading data from the backend database. However, if Redis is directly used as a database, it will not be possible to recover using the log because the command has not been logged.

Second, although AOF avoids blocking the current command, it may introduce a blocking risk for the next operation. This is because the AOF log is also written by the main thread: if the disk is under heavy write pressure when the log file is flushed, the disk write will be slow, which in turn stalls subsequent operations.

Upon careful analysis, you will find that both of these risks are related to the timing of AOF log being written back to the disk. This means that if we can control the timing of writing the AOF log back to the disk after a write command is executed, both of these risks can be mitigated.

Three AOF Write-Back Policies #

In fact, the AOF mechanism gives us three choices for this problem: the three possible values of the appendfsync configuration option.

  • Always: Synchronously write the log to disk immediately after each write command is executed.
  • Everysec: Write the log to the memory buffer of the AOF file after each write command is executed, and write the content of the buffer to disk every second.
  • No: Write the log to the memory buffer of the AOF file after each write command is executed, and let the operating system decide when to write the buffer content to disk.
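For reference, here is a minimal redis.conf fragment selecting one of these policies (the directive names are the standard Redis ones; the choice of everysec is just an example):

```
# redis.conf -- enable AOF persistence and pick a write-back policy
appendonly yes          # turn on AOF logging
appendfsync everysec    # one of: always | everysec | no
```

The same setting can also be changed at runtime without restarting the server.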

None of these three write-back policies can both avoid blocking the main thread and minimize data loss at the same time. Let’s analyze why.

  • Always minimizes data loss, but every write command is followed by a slow synchronous disk write, which inevitably hurts main-thread performance.
  • Although the operating system controls the write-back in the No policy, allowing subsequent commands to be executed after writing to the buffer, the timing of the disk write is no longer in the hands of Redis. As long as the AOF records are not written back to the disk, the corresponding data will be lost in the event of a crash.
  • Everysec writes back at a frequency of once per second, avoiding the performance overhead of Always, which reduces the impact on system performance. However, if a crash occurs, the command operations not written to the disk within the previous second will still be lost. Therefore, this can only be considered a compromise between avoiding the impact on main thread performance and avoiding data loss.

I have summarized the write-back timing and the pros and cons of these three policies in the table below for your reference.

| Policy | Write-back timing | Advantage | Disadvantage |
| --- | --- | --- | --- |
| Always | Synchronously, after every write command | Highest reliability, almost no data loss | A slow disk write on every command; large performance cost |
| Everysec | Once per second | Moderate performance cost | Up to one second of commands can be lost on a crash |
| No | Decided by the operating system | Best performance | Write-back timing is out of Redis’s control; a crash can lose more data |

At this point, we can choose which write policy to use based on the system’s requirements for high performance and high reliability. In summary: if you want high performance, choose the No policy; if you want high reliability, choose the Always policy; if a slight data loss is acceptable and you don’t want performance to be greatly affected, then choose the Everysec policy.

However, selecting a write policy based on the system’s performance requirements is not the end of it. After all, the AOF logs all received write commands in the form of a file. As more and more write commands are received, the AOF file will become larger. This means that we must be careful about the performance issues caused by a large AOF file.

The “performance issues” here mainly involve three aspects: First, the file system itself has a limit on file size and cannot save excessively large files. Second, if the file is too large, appending command records to it in the future will become less efficient. Third, if a crash occurs, the commands recorded in the AOF need to be executed one by one for fault recovery. If the log file is too large, the entire recovery process will be very slow, which will affect the normal use of Redis.

Therefore, we need to take certain control measures, and this is where the AOF rewrite mechanism comes into play.

What to do with a large log file? #

In simple terms, the AOF rewriting mechanism in Redis creates a new AOF (Append-Only File) file based on the current state of the database. It reads all key-value pairs in the database and records a command for each pair. For example, after reading the key-value pair “testkey”: “testvalue”, the rewriting mechanism will record the command “set testkey testvalue”. This way, when recovery is needed, the command can be executed again to restore the key-value pair “testkey”: “testvalue”.

Why does the rewriting mechanism make the log file smaller? Essentially, the rewriting mechanism has a “many-to-one” function. “Many-to-one” means that multiple commands in the old log file are combined into one command in the new log file.

As we know, the AOF file appends the received write commands one by one. When a key-value pair is repeatedly modified by multiple write commands, the corresponding commands will be recorded in the AOF file. However, during the rewriting process, the current state of each key-value pair is used to generate a single write command. As a result, each key-value pair only needs one command in the rewritten log file. When recovering the log, only this command needs to be executed to directly achieve the write operation of the key-value pair.

The following image is an example:

AOF rewriting reduces the log file size

In the example, we perform six modification operations on a list, whose final state is [“D”, “C”, “N”]. In this case, executing the single command LPUSH u:list "N" "C" "D" is enough to restore the data, saving five commands. For key-value pairs that have been modified hundreds or thousands of times, rewriting saves even more space.
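We can verify why this single command works with a tiny simulation of LPUSH semantics (my own sketch, not Redis internals): LPUSH pushes its arguments onto the head one at a time, so the last argument ends up first.

```python
def lpush(lst, *values):
    """Simulate Redis LPUSH: each value is pushed onto the head of the
    list, left to right, so the last argument ends up at position 0."""
    for v in values:
        lst.insert(0, v)
    return lst

# The one rewritten command restores the final state ["D", "C", "N"]:
restored = lpush([], "N", "C", "D")
assert restored == ["D", "C", "N"]
```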

However, although the AOF rewriting reduces the size of the log file, it is still a time-consuming process to write the operation logs of the entire database back to the disk. At this point, we need to consider another issue: will the rewriting block the main thread?

Does AOF rewriting block? #

Unlike AOF log writing, which is done by the main thread, the rewrite process is carried out by a background child process, bgrewriteaof, precisely to avoid blocking the main thread and degrading database performance.

I summarize the rewriting process as “one copy, two logs.”

“One copy” means that every time a rewrite is executed, the main thread forks a background bgrewriteaof child process. The fork gives the child a copy of the main thread’s memory, which includes the latest data in the database. Then, without affecting the main thread, the bgrewriteaof child process can convert the copied data into write commands, one by one, and record them in the rewrite log.

What about the “two logs”?

Because the main thread is not blocked and can still process new operations, any write that arrives during the rewrite is appended to the buffer of the first log, the AOF log currently in use. This way, even if Redis crashes mid-rewrite, the existing AOF log is still complete and can be used for recovery.

The second log is the new AOF rewrite log. Each such write is also appended to the buffer of the rewrite log, so the latest operations are not lost. Once the child process has finished turning the copied data into command records, the buffered operations in the rewrite log are also written into the new AOF file, ensuring that it records the database’s latest state. At that point, the new AOF file can replace the old one.

Non-blocking process of AOF rewriting

In summary, on each AOF rewrite, Redis first makes a memory copy for the rewrite, then uses the two logs to ensure that data written during the rewrite is not lost. And because the rewrite is performed by a separate child process, it does not block the main thread.
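The “one copy, two logs” flow can be sketched as a simplified, single-process Python simulation. The function name, buffer names, and sequential structure here are my own; real Redis does this concurrently with fork and a child process.

```python
import copy

def aof_rewrite(db, incoming_writes):
    """Simplified simulation of the "one copy, two logs" rewrite flow."""
    # One copy: the fork gives the child a snapshot of current memory.
    snapshot = copy.deepcopy(db)

    aof_buf = []          # buffer of the AOF log currently in use
    rewrite_buf = []      # buffer of the new rewrite log

    # While the child rewrites, the main thread keeps serving writes,
    # appending each one to BOTH log buffers.
    for key, value in incoming_writes:
        db[key] = value
        cmd = ("set", key, value)
        aof_buf.append(cmd)       # keeps the old AOF complete
        rewrite_buf.append(cmd)   # keeps the new AOF up to date

    # Child process: one command per key-value pair in the snapshot...
    new_aof = [("set", k, v) for k, v in snapshot.items()]
    # ...then append the writes that arrived during the rewrite.
    new_aof.extend(rewrite_buf)
    return new_aof  # ready to replace the old AOF file

log = aof_rewrite({"testkey": "testvalue"}, [("k2", "v2")])
```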

Summary #

In this lesson, I introduced Redis’ AOF (Append Only File) method for preventing data loss. It ensures data reliability by recording received write commands one by one, and replaying them one by one during recovery.

This method seems “simple,” but it also takes into account the impact on Redis performance. In summary, it provides three write-back strategies for AOF logs: Always, Everysec, and No. These three strategies are ranked from high to low in terms of reliability and from low to high in terms of performance.

In addition, to keep the log file from growing too large, Redis provides the AOF rewrite mechanism, which generates, from the latest state of the data in the database, one insertion command per key-value pair as the new log. This work is done by a background child process, avoiding blocking of the main thread.

Among them, the three write-back strategies reflect an important principle in system design, which is trade-off, or “making choices,” referring to making trade-offs between performance and reliability guarantees. I believe this is a key philosophy in system design and development, and I hope you can fully understand this principle and apply it in your daily development.

However, you may have also noticed that both the flush timing and the rewrite mechanism act during the “logging” phase. For example, choosing the flush timing avoids blocking the main thread while logging, and rewriting keeps the log file from growing too large. But when “using the log,” that is, when using AOF for failure recovery, we still have to run all the recorded operations again. Moreover, because of Redis’s single-threaded design, these commands can only be executed sequentially, one by one, which makes the “replay” process slow.

So, are there any methods that can both avoid data loss and recover faster? Of course, that would be RDB snapshots. In the next lesson, we will learn about them together. Stay tuned!

One Question per Lesson #

In this lesson, I have two small questions for you:

  1. When rewriting the AOF log, it is the bgrewriteaof subprocess that performs the task, without involving the main thread. The non-blocking feature we discussed today refers to the execution of the subprocess without blocking the main thread. However, do you think there are any other potential blocking risks in this rewriting process? If so, where would they occur?
  2. AOF rewriting also has its own rewrite log. Why doesn’t it share and use the AOF log itself?

I hope you can think about these two questions and feel free to share your answers in the comments section. Additionally, feel free to forward the content of this lesson and engage in discussions with more people.