10 Lecture 19 Post-lecture Thinking Problems, Answers, and Common Questions #

We have already completed 9 lectures in our course, and during this time, I have received a lot of messages. Many students have diligently answered the exercise questions, and some of the answers can even be considered standard answers. In addition, many students have asked very good questions about the basic principles and key mechanisms of Redis, which are worth discussing in depth.

Today, I will talk to you about the answers to the exercise questions and explain some typical questions in order to address your confusion.

Answers to After-class Exercise Questions #

Lecture 1 #

Question: What is SimpleKV still lacking compared to Redis?

The answers given by @曾轼麟 and @Kaito are both excellent. They provided detailed comparisons between SimpleKV and Redis, covering aspects such as data structures, functional extensions, memory efficiency, transactionality, and high availability and scalability. They also analyzed from the perspective of operation and maintenance. Let me share their answers first.

@曾轼麟:

  1. Data structures: SimpleKV lacks support for widely used data structures such as SkipList for range queries and Stream.
  2. High availability: It lacks a high availability design with features like sentinel or master-slave mode.
  3. Memory safety: It lacks support for key eviction algorithms in cases of memory overload.
  4. Memory utilization: It does not optimize data structures to improve memory utilization, such as using compressed data structures.

@Kaito:

SimpleKV lacks: rich data types, data compression support, expiration mechanisms, data eviction strategies, master-slave replication, clustering, high-availability clusters, and could also add auxiliary functions such as statistics, notification modules, debugging modules, metadata queries, etc.

Here’s my summary of the answers. Do you remember the “two dimensions” and “three main lines” I mentioned in the “Preface”? We can also use this framework for analysis, as shown in the table below. In addition, at the end of the table, I also compared SimpleKV and Redis in terms of key-value database development and operational tools.

Lecture 2 #

Question: What are the advantages of using arrays of integers and compressed lists as the underlying data structures?

The design of integer arrays and compressed lists fully demonstrates the "memory-saving" side of Redis's "fast and memory-saving" characteristics. Both structures allocate a contiguous block of memory in which the collection's elements are stored one after another, which is very compact. Because the elements are placed in sequence, no additional pointers are needed to connect them, avoiding the space overhead those pointers would incur.

Let me draw a diagram to show the memory layout of these two structures. The entries in arrays of integers and compressed lists are actual collection elements, which are stored one after another, resulting in efficient memory utilization.

The reason Redis uses different data structures is to strike a balance between performance and memory utilization.

Lecture 3 #

Question: What other potential performance bottlenecks are there in the basic I/O model of Redis?

This question is intended to deepen understanding of how blocking operations affect the single-threaded performance of Redis. In the basic I/O model of Redis, the main thread is responsible for executing operations, so any time-consuming operation, such as working with bigkeys or returning a large amount of data in one reply, can become a potential performance bottleneck.

Lecture 4 #

Question 1: Are there any other potential blocking risks during AOF rewriting?

There are two risks here.

Risk 1: When the Redis main thread forks and creates the bgrewriteaof child process, the kernel needs to create process control blocks (PCBs) for managing the child process. The kernel copies the content of the PCB of the main thread to the child process. This creation and copying process is performed by the kernel and blocks the main thread. Moreover, during the copying process, the child process needs to copy the page table of the parent process. The duration of this process depends on the size of the Redis instance’s memory. If the Redis instance has a large memory, the page table will be large, and the fork execution time will be long, which poses a blocking risk to the main thread.

Risk 2: The bgrewriteaof child process shares memory with the main thread. When the main thread receives new write or modification operations, it allocates new memory space to store the new data. If the operation is a bigkey, i.e., a collection with a large amount of data, the main thread faces a blocking risk when allocating a large space. This is because the operating system incurs lookup and lock overhead when allocating memory space, leading to blocking.

Question 2: Why doesn’t AOF rewriting share the AOF log itself?

If both the main thread and the bgrewriteaof child process use the AOF log, they will compete for file system locks, which will affect the performance of the Redis main thread.

Lecture 5 #

Question: Suppose a cloud host is running Redis with a 2-core CPU, 4GB of memory, and a 500GB disk. The Redis database is approximately 2GB in size, and the workload is write-heavy, with a write-read ratio of about 8:2; in other words, out of 100 requests, 80 are modification operations. In this scenario, what are the risks of using RDB for persistence?

@Kaito’s answer analyzed the risks from the perspectives of memory resources and CPU resources, which is excellent. I have made some improvements and simplifications. You can refer to the following:

Risk of insufficient memory: When Redis forks a child process to perform RDB writing, if the main thread receives a write operation, copy-on-write will be used. Copy-on-write requires allocating new memory space for the data of the write operation. In this scenario, the write ratio is 80%. Therefore, during the persistence process, in order to save the data involved in 80% of the write operations, the copy-on-write mechanism will allocate new memory space for these data in the instance’s memory. The allocated memory is approximately 80% of the size of the entire instance’s data, which is about 1.6GB. As a result, the usage of the entire system memory is close to saturation. At this point, if there are still a large number of new key writes or key modifications in the instance, the memory of the cloud host will be quickly exhausted. If the cloud host has Swap enabled, part of the data will be swapped to the disk. When accessing this part of the data on the disk, the performance will drop sharply. If Swap is not enabled on the cloud host, it will directly trigger an OOM (out-of-memory) issue, and the entire Redis instance will face the risk of being killed by the system.

Risk of competition for CPU resources between the main thread and child processes: The child process that generates RDB requires CPU cores to run. The main thread itself also requires CPU cores to run. Additionally, if Redis has background threads enabled, the main thread, child process, and background threads will all compete for CPU resources. Since the cloud host only has 2 CPU cores, it will affect the speed at which the main thread handles requests.

Lecture 6 #

Question: Why doesn’t the replication between master and slave use AOF?

Answer: There are two reasons.

  1. RDB files are binary files, and both writing RDB to disk and transferring RDB over the network have higher IO efficiency than recording and transferring AOF.
  2. When performing recovery on the slave side, the recovery efficiency with RDB is higher than with AOF.

Lecture 7 #

Question 1: During the process of master-slave switch, can the client continue to perform request operations normally?

In a master-slave cluster, read and write operations are generally carried out in a read-write separation mode. When the master node fails, the client can still send read requests to a slave node. Write requests, however, cannot be executed.

Question 2: What else does the sentinel or client need to do to make the application unaware of service interruption?

On one hand, the client needs to be able to cache write requests sent by the application. As long as it is not a synchronous write operation (Redis applications generally do not have synchronous writes), write requests usually do not occur on the critical path of the application. Therefore, after the client caches the write requests, it can simply return a confirmation to the application.

On the other hand, after the master-slave switch is completed, the client needs to be able to reconnect with the new master database, and the sentinel needs to provide a subscription channel for the client to subscribe to the information of the new master database. At the same time, the client also needs to be able to actively communicate with the sentinel, inquiring about the information of the new master database.

Lecture 8 #

Question 1: Suppose there is a cluster of 5 sentinel instances with the quorum value set to 2. If 3 sentinel instances fail during operation, can the remaining sentinels still correctly determine the "objective offline" status of the master database? If so, can the automatic master-slave switch still be performed?

The basis for determining the “objective offline” status of the master database is that the number of sentinels that consider the master database to be “subjectively offline” should be greater than or equal to the quorum value. Now there are still 2 sentinel instances left, which is exactly equal to the quorum value, so the objective offline status of the master database can still be correctly determined. In order for a sentinel to perform a master-slave switch, it needs to obtain the approval of more than half of the sentinel votes, which means at least 3 sentinel votes are required. But now there are only 2 sentinels left, so the master-slave switch cannot be performed.

Question 2: Is it better to have more sentinel instances? Will increasing the down-after-milliseconds value also help reduce misjudgment?

Having more sentinel instances lowers the misjudgment rate. However, when determining the offline status of the master database and electing a leader, more approval votes have to be gathered, which may lengthen the wait for all sentinels to vote and prolong the master-slave switch. During that time the client may accumulate a large number of pending requests, which can overflow its buffers and cause request loss. And if the business layer has response-time requirements for Redis operations, timeout alarms may fire because a new master database has not yet been selected.

Increasing the down-after-milliseconds value may cause the following situation: Although the master database has actually failed, the sentinel takes a long time to detect it, which will affect the availability of Redis for the business.

Lecture 9 #

Question: Why doesn’t Redis directly use a table to record the correspondence between key-value pairs and instances?

If a table is used to record the correspondence between key-value pairs and instances, once the correspondence between key-value pairs and instances changes (such as instance addition or deletion or data redistribution), the table needs to be modified. If the table is operated in a single thread, all operations need to be executed serially, resulting in slow performance. If the table is operated in multiple threads, it involves the overhead of locking. In addition, if the data volume is very large, using a table to record the correspondence between key-value pairs and instances will increase the additional storage space required.

With the calculation based on hash slots, although the correspondence between hash slots and instances also needs to be recorded, the number of hash slots is much smaller than the number of key-value pairs. Whether modifying the correspondence between hash slots and instances or using additional space to store the correspondence between hash slots and instances, the overhead is much smaller than directly recording the correspondence between key-value pairs and instances.

Alright, have you answered all these questions? If you have any other thoughts, feel free to leave a comment and discuss with me and other students.

Explanation of typical problems #

Next, let me explain some representative problems, including the timing and execution mechanism of Redis rehash, the relationship and difference between the main thread, child processes, and background threads, the underlying implementation principles of copy-on-write, and the difference between replication buffer and repl_backlog_buffer.

Question 1: Trigger timing and progressive execution mechanism of rehash #

I have found that many students are interested in the hash table data structure of Redis, especially the rehash operation of the hash table, so I will focus on answering two questions.

1. When does Redis perform rehash?

Redis uses the load factor to determine if rehash is needed. The load factor is calculated by dividing the number of entries in the hash table by the number of hash buckets in the hash table. Redis triggers the rehash operation based on two conditions of the load factor:

  • Load factor ≥ 1 and the hash table is allowed to rehash;
  • Load factor ≥ 5.

In the first case, if the load factor equals 1 and we assume the key-value pairs are evenly distributed among the buckets, then each bucket stores exactly one key-value pair and the hash table can avoid chaining entirely.

However, if new data is written at this point, the hash table will have to chain entries, which hurts query performance. Rehash would help, but it is prohibited while the instance is generating an RDB or rewriting the AOF, so as not to disturb those operations. If the instance is doing neither, rehash can be performed; otherwise, newly written data will push the hash table into slower chained lookups.

In the second case, when the load factor is greater than or equal to 5, it means that the current amount of data stored is much larger than the number of hash buckets, and there will be a large number of chained hashes in the hash buckets, which will severely impact performance. At this point, rehash is immediately performed.

The above describes the triggering conditions for rehash. If the load factor is less than 1, or the load factor is greater than 1 but less than 5, and the hash table is temporarily not allowed to perform rehash (for example, the instance is generating RDB or rewriting AOF), the hash table will not perform rehash.

2. When using progressive rehash, if the instance temporarily receives no new requests, does Redis stop rehashing?

Actually, no. Redis runs scheduled tasks, and rehash is one of them. A scheduled task is a task executed at a fixed frequency (for example, once every 100ms).

After rehash is triggered, even if no new requests arrive, Redis still performs rehash steps at regular intervals, and each execution takes no more than 1ms so as not to affect other tasks.

Question 2: Relationship and difference between main threads, child processes, and background threads #

In the course, I mentioned terms such as main thread, main process, child process, child thread, and background thread. Some students may be confused, so let me summarize their differences for you.

First, let me explain the difference between processes and threads.

From the operating system's perspective, a process is generally the unit of resource allocation; for example, a process has its own heap, stack, virtual memory space (page table), file descriptors, and so on. A thread, by contrast, is generally the entity that the CPU schedules and executes.

After understanding the difference between processes and threads, let’s look at what main process and main thread mean.

If a process does not create any additional threads after starting, it is generally referred to as both the main process and the main thread.

For example, here is a C program fragment I wrote: the main function directly calls a worker function, which executes a for-loop calculation. When run, the program is a main process on its own, and at the same time it is also the main thread.

#include <stddef.h>

int counter = 0;

void *worker(void) {
    for (int i = 0; i < 10; i++) {
        counter++;
    }
    return NULL;
}

int main(int argc, char *argv[]) {
    worker();
    return 0;
}

Similar to this code snippet, Redis itself is a process after startup, which will receive requests from clients and process read and write operations. Receiving and processing requests is the main job of Redis, and Redis does not rely on other threads. Therefore, I generally refer to the Redis process that completes this main job as the main process or main thread.

In the main thread, we can also use fork to create child processes or use pthread_create to create threads in Redis. Let me first introduce the child processes created using fork in Redis.

  • the child process created to generate the RDB file in the background, also used to produce the RDB sent to the slave during replication;
  • the child process that transfers the RDB over the network in diskless replication;
  • the bgrewriteaof child process that rewrites the AOF.

Next, let’s take a look at the threads used by Redis. Starting from version 4.0, Redis also uses pthread_create to create threads. After these threads are created, they usually perform tasks on their own, such as executing asynchronous deletion tasks. Relatively speaking, compared to the main thread that completes the main work, we generally call these threads background threads. I will specifically introduce the execution mechanism of Redis background threads in Lesson 16.

To help you better understand, I have drawn a diagram below to show the differences.

Image

Question 3: Underlying implementation mechanism of copy-on-write #

When Redis uses the RDB method for persistence, it uses the copy-on-write mechanism. In Lesson 5, I emphasized the effect of copy-on-write: the bgsave child process keeps working from the original data while the main thread can still modify it.

Today, I will further explain the underlying implementation mechanism of copy-on-write.

For Redis, after the main thread forks the bgsave child process, the child actually copies the main thread's page table. This page table holds the mappings to the physical pages where the instance's data resided at the moment bgsave was executed. With it, the bgsave child process can locate the data and write it into the RDB file. If, during this time, the main thread receives a new write or modify operation, the copy-on-write mechanism kicks in: only when a write occurs does the main thread copy the affected data to a newly allocated physical page, write the modification there, and update its own page-table mapping.

Let me illustrate the underlying mechanism of copy-on-write using an example in the following diagram.

After the bgsave child process copies the page table of the main thread, if the main thread needs to modify the data in virtual page 7, the main thread needs to allocate a new physical page (let’s assume it is physical page 53), and write the modified data in virtual page 7 to physical page 53, while the original data in virtual page 7 is still stored in physical page 33. At this time, the mapping relationship from virtual page 7 to physical page 33 is still retained in the bgsave child process. Therefore, the bgsave child process can correctly write the original data of virtual page 7 to the RDB file.

Image

Question 4: Difference between replication buffer and repl_backlog_buffer #

When performing master-slave replication, Redis uses a replication buffer and a repl_backlog_buffer. Some students may not be clear about the difference between them, so let me explain.

In general, the replication buffer is the output buffer of the client connection that the master creates for each slave during full replication, while the repl_backlog_buffer is a dedicated buffer on the master that continuously saves write operations in order to support incremental replication to the slaves.

During master-slave replication, when the master wants to send the write commands that arrived during full replication to a slave, it first creates a client connection to that slave and sends the commands through it. In memory, this client on the master has an associated output buffer, which is what we call the replication buffer. Its size can be controlled with the client-output-buffer-limit configuration item. Each slave has its own corresponding client on the master, so the replication buffer is not shared: every slave gets its own.

repl_backlog_buffer is a dedicated circular buffer that continuously records write commands after the Redis server starts. This buffer is shared by all slaves. The master and each slave record their own replication progress, so different slaves can synchronize independently by sending their own progress (slave_repl_offset) to the master.

Image

Alright, that’s all for this lesson. Thank you very much for your careful thinking and questioning. Every question is excellent, and I have also benefited a lot from reading the comments myself. Also, I hope we can form a Redis learning group. Please continue to express your thoughts in the comment section in the upcoming lessons. Let’s progress together, and I hope that everyone can become a Redis expert!