07 When the Master Database Crashes, How Do You Ensure Uninterrupted Service? #

In the previous lesson, we learned about the master-slave cluster mode. In this mode, if a slave node fails, the client can continue to send requests to the master node or other slave nodes to perform relevant operations. However, if the master node fails, it directly affects the synchronization of the slave nodes because there is no corresponding master node available for data replication.

Furthermore, if clients only send read requests, the slave nodes can still serve them, which is acceptable in read-only scenarios. However, as soon as a write request arrives, the read-write separation required by the master-slave mode means it must be handled by the master node. At that point, no instance is available to serve the client's write requests, as shown in the following figure:

Inability of the slave nodes to serve write operations after the master node fails

Both the interruption of write services and the inability of the slave nodes to synchronize data are unacceptable. So if the master node goes down, we need to bring up a new master node, for example by promoting one of the slave nodes to be the new master. This raises three questions:

  1. Is the master node really down?
  2. Which slave node should be selected as the new master node?
  3. How do we notify the slave nodes and clients of the new master node's connection information?

This brings us to the Sentinel mechanism. In a Redis master-slave cluster, the Sentinel mechanism is what makes automatic failover of the master and slave nodes possible, and it effectively solves the three problems above for the master-slave replication mode.

Next, let’s learn about the Sentinel mechanism together.

Basic Process of Sentinel Mechanism #

Sentinel is actually a Redis process running in a special mode; it runs alongside the master and slave instances. Sentinel is mainly responsible for three tasks: monitoring, master selection (choosing a new master), and notification.

Let’s start with monitoring. Monitoring means that the Sentinel process periodically sends PING commands to all master and slave instances to check if they are still running online. If a slave does not respond to the Sentinel’s PING command within a specified time, the Sentinel marks it as “offline”. Similarly, if the master does not respond to the Sentinel’s PING command within a specified time, the Sentinel determines that the master is offline, and then initiates the process of automatically switching the master.

The first step in this process is the Sentinel's second task: master selection. After the master fails, the Sentinel needs to select one slave instance from the many slaves according to certain rules and make it the new master. Once this step is completed, the cluster has a new master.

Then, the Sentinel will perform the last task: notification. When performing the notification task, the Sentinel sends the connection information of the new master to the other slaves, instructing them to execute the replicaof command and establish a connection with the new master for data replication. At the same time, the Sentinel notifies the clients of the connection information of the new master, instructing them to send their requests to the new master.
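To make the notification step more concrete, here is a minimal client-side sketch in Python using the redis-py library. It is only an illustration under assumed values: the Sentinel address 127.0.0.1:26379 and the monitored master name mymaster are placeholders. After a failover, Sentinel publishes a +switch-master event that clients can subscribe to in order to learn the new master's address.

```python
# Minimal sketch: learn about a master switch by subscribing to the
# "+switch-master" event published by a Sentinel instance.
# Assumptions: a Sentinel listening on 127.0.0.1:26379, master named "mymaster".
import redis

sentinel_conn = redis.Redis(host='127.0.0.1', port=26379)

pubsub = sentinel_conn.pubsub()
pubsub.subscribe('+switch-master')        # one message is published per completed failover

for message in pubsub.listen():
    if message['type'] != 'message':
        continue                          # skip the subscribe confirmation
    # Payload format: "<master-name> <old-ip> <old-port> <new-ip> <new-port>"
    name, old_ip, old_port, new_ip, new_port = message['data'].decode().split()
    print(f'master {name} moved from {old_ip}:{old_port} to {new_ip}:{new_port}')
    # The client can now redirect its write traffic to new_ip:new_port.
```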

I have created a diagram showing these three tasks and their respective goals.

Three tasks and their respective goals of the Sentinel mechanism

Among these three tasks, the notification task is relatively simple: the Sentinel only needs to send the new master's information to the slaves and clients and instruct them to connect to it, with no decision-making logic involved. In the monitoring and master selection tasks, however, the Sentinel needs to make two decisions:

  • In the monitoring task, the Sentinel needs to determine if the master is in the offline state.
  • In the master selection task, the Sentinel needs to decide which slave instance to promote to master.

Next, let’s talk about how to determine the offline state of the master.

Firstly, you need to know that the Sentinel has two types of offline judgment for the master: “subjective offline” and “objective offline”. So why are there two types of judgments? What are their differences and connections?

Subjective Offline and Objective Offline #

Let me first explain what “subjective offline” means.

A Sentinel process uses the PING command to check the network connections between itself and the master and slave instances, and uses the result to judge each instance's status. If the Sentinel finds that a master or slave has not responded to its PING command within the configured time, it marks that instance as "subjectively offline".
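The sketch below illustrates this idea in Python with redis-py: probe an instance with PING under a timeout and treat a missing reply as "subjectively offline" from this observer's point of view. The address and timeout are placeholders; a real Sentinel uses its own down-after-milliseconds setting rather than application code like this.

```python
# Toy illustration of "subjectively offline": if an instance does not answer
# PING within the timeout, this observer considers it down.
# The host, port, and timeout below are placeholder values.
import redis

def subjectively_offline(host, port, timeout_seconds=0.5):
    try:
        conn = redis.Redis(host=host, port=port, socket_timeout=timeout_seconds)
        return not conn.ping()            # a healthy instance replies to PING
    except (redis.exceptions.ConnectionError, redis.exceptions.TimeoutError):
        return True                       # no reply in time: subjectively offline

print(subjectively_offline('127.0.0.1', 6379))
```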

If the instance being checked is a slave, the Sentinel simply marks it as "subjectively offline", because a slave going offline generally has little impact and the cluster can still serve requests.

However, if the instance being checked is the master, the Sentinel cannot simply mark it as "subjectively offline" and initiate a master-slave switch. It is quite possible that the Sentinel has misjudged and the master is not actually faulty, yet once the switch is initiated, the subsequent master selection and notification operations introduce additional computational and communication overhead.

To avoid these unnecessary overheads, we need to pay special attention to the possibility of misjudgment.

Firstly, we need to understand what misjudgment means. It simply means that the master is not actually offline, but the sentinel mistakenly believes that it is offline. Misjudgment often occurs when there is high network pressure or congestion, or when the master itself is under high load.

Once the Sentinel judges that the master is offline, it starts selecting a new master and has the slaves synchronize data with it. This process itself incurs overhead: the Sentinel needs time to select the new master, and the slaves need time to synchronize with it. In the case of a misjudgment, the master never needed to be switched, so all of this overhead is wasted. That is why we need to detect misjudgment and reduce its likelihood.

So how do we reduce misjudgment? In daily life, when we need to make judgments about important matters, we often consult with family or friends before making a decision.

The sentinel mechanism is similar. It usually adopts a cluster mode composed of multiple instances, which is also called a sentinel cluster. Introducing multiple sentinel instances to make judgments can avoid situations where a single sentinel mistakenly judges the master as offline due to its own network conditions. At the same time, the probability of multiple sentinels having unstable networks at the same time is small, and by making decisions together, the misjudgment rate can be reduced.

In this lesson, all you need to understand is the role of the sentinel cluster in reducing misjudgment. The specific operating mechanism will be focused on in the next lesson.

When judging whether the master is offline, the decision cannot rest on a single Sentinel. Only when a majority of Sentinel instances judge that the master is "subjectively offline" is the master marked as "objectively offline", meaning that its being offline has become an objective fact. The principle here is that the minority obeys the majority. Once the master is marked "objectively offline", the Sentinel triggers the master-slave switching process.

To help you understand, let me draw another diagram to show the logic here.

As shown in the diagram below, the Redis master-slave cluster has one master, three slaves, and three sentinel instances. On the left side of the image, Sentinel 2 determines that the master is “subjectively offline”, but Sentinel 1 and 3 determine that the master is online. At this time, the master continues to be judged as online. On the right side of the image, both Sentinel 1 and 2 determine that the master is “subjectively offline”, even if Sentinel 3 still determines that the master is online, the master will be marked as “objectively offline”.

offline judgment

Criteria for Objective Offline

Simply put, the criterion for "objective offline" is this: when there are N Sentinel instances, it is best to require N/2 + 1 of them to judge the master as "subjectively offline" before finally marking the master as "objectively offline". This reduces the probability of misjudgment and avoids the unnecessary master-slave switches that misjudgment would cause. (Of course, how many instances must make the "subjectively offline" judgment before the switch can proceed is configurable by the Redis administrator.)
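As a toy sketch of this decision rule, the plain Python function below counts "subjectively offline" votes against a quorum that defaults to N/2 + 1; the vote counts are made-up inputs, not values read from a real Sentinel.

```python
# Toy sketch of the "objectively offline" decision: with N sentinels, the
# master is marked objectively offline once the number of "subjectively
# offline" votes reaches the quorum (by default a strict majority here,
# though the administrator can configure a different value).
def objectively_offline(sdown_votes, total_sentinels, quorum=None):
    if quorum is None:
        quorum = total_sentinels // 2 + 1   # default: strict majority
    return sdown_votes >= quorum

# Three sentinels: one vote is not enough, two votes trigger the failover.
print(objectively_offline(sdown_votes=1, total_sentinels=3))   # False
print(objectively_offline(sdown_votes=2, total_sentinels=3))   # True
```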

So far, you can see that with the joint judgment mechanism of multiple sentinel instances, we can more accurately determine whether the master is in an offline state. If the master is indeed offline, the sentinel will start the next decision-making process, which is to select a slave from the many slaves to become the new master.

How to select a new master? #

Generally speaking, I refer to the process of sentinel selecting a new master as “filtering + scoring”. Simply put, we first use certain filtering conditions to eliminate the ineligible slave nodes from multiple replicas. Then, we score the remaining slave nodes one by one according to certain rules and select the slave node with the highest score as the new master, as shown in the following figure:

Selection process of a new master

In the paragraph above, note the two uses of the word "certain": the "certain filtering conditions" and the "certain rules". Now, let's look at what they actually refer to.

Let’s start with the filtering conditions.

In general, we need to ensure that the selected slave nodes are still running online. However, when selecting a master, the fact that a slave node is online only indicates that the current state of the slave node is good and does not necessarily mean that it is the most suitable to be the master.

Imagine that when selecting a master, one of the slave nodes is running normally and we start using it as the new master. However, shortly after, the network of this slave node malfunctions. At this point, we have to select a new master again. Obviously, this is not the result we expected.

Therefore, when selecting a master, in addition to checking the current online status of the slave node, we also need to determine its previous network connection status. If a slave node is frequently disconnected from the master node and if the number of disconnections exceeds a certain threshold, we have reason to believe that the network condition of this slave node is not good, and we can exclude it.

How do we determine this specifically? You can use the configuration option down-after-milliseconds * 10. Here, down-after-milliseconds is the maximum timeout we allow for judging that the master and slave have lost contact: if the master and the slave have had no network contact within down-after-milliseconds milliseconds, we consider them disconnected. If the number of such disconnections exceeds 10, the slave's network condition is considered poor, and it is not suitable to become the new master.

Alright, this way we have filtered out the slave nodes that are not suitable to be the master, completing the filtering process.
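Here is a small plain-Python sketch of the filtering rule just described. The slave records are hypothetical dictionaries rather than any real Sentinel data structure, and the threshold of 10 disconnections mirrors the down-after-milliseconds * 10 rule above.

```python
# Sketch of the filtering step: drop slaves that are currently offline, and
# drop slaves with more than 10 recorded disconnections from the master.
def filter_candidates(slaves, max_disconnections=10):
    candidates = []
    for slave in slaves:
        if not slave['online']:
            continue                                  # currently unreachable: skip
        if slave['disconnections'] > max_disconnections:
            continue                                  # flaky network history: skip
        candidates.append(slave)
    return candidates

slaves = [
    {'name': 'slave1', 'online': True,  'disconnections': 2},
    {'name': 'slave2', 'online': False, 'disconnections': 0},
    {'name': 'slave3', 'online': True,  'disconnections': 15},
]
print([s['name'] for s in filter_candidates(slaves)])  # ['slave1']
```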

Next, we need to score the remaining slave nodes. We can conduct three rounds of scoring based on three rules: slave node priority, replication progress, and slave node ID. As soon as one slave node has the highest score in a round, it becomes the new master and the selection process ends. If no single slave node comes out on top in that round, we continue to the next round.

First round: The slave node with the highest priority scores the highest.

Users can set different priorities for different slave nodes through the slave-priority configuration option. For example, if you have two slave nodes with different memory sizes, you can manually set a higher priority for the instance with more memory. When selecting a master, the sentinel will give a high score to the slave node with a higher priority. If there is a slave node with the highest priority, it becomes the new master. If the slave nodes have the same priority, the sentinel proceeds to the second round of scoring.
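As a small illustration of setting priorities, the redis-py snippet below adjusts the priority of two hypothetical replicas at runtime (the addresses are placeholders). Note that in actual Redis configuration, a lower replica-priority value means a higher priority during failover, and newer Redis versions use replica-priority while older ones use slave-priority.

```python
# Give the larger-memory replica a better chance of being promoted.
# In Redis, a LOWER replica-priority value means a HIGHER failover priority.
# Addresses below are placeholders.
import redis

big_memory_replica   = redis.Redis(host='127.0.0.1', port=6380)
small_memory_replica = redis.Redis(host='127.0.0.1', port=6381)

big_memory_replica.config_set('replica-priority', 10)     # preferred as new master
small_memory_replica.config_set('replica-priority', 100)  # default priority
```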

Second round: The slave node with the closest synchronization to the old master scores the highest.

The basis for this rule is that if we select the slave node with the closest synchronization to the old master as the new master, the new master will have the most up-to-date data.

How do we judge the synchronization progress between the slave node and the old master?

In the previous lesson, I introduced to you the command propagation process during master-slave replication. In this process, the master node uses master_repl_offset to record the current position of the latest write operation in repl_backlog_buffer, while the slave node uses slave_repl_offset to record the current replication progress.

At this point, what we are looking for is a slave node whose slave_repl_offset is closest to master_repl_offset. If among all the slave nodes, there is a slave node whose slave_repl_offset is closest to master_repl_offset, it scores the highest and becomes the new master.
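If you want to look at these offsets on a running deployment, INFO replication exposes them; the redis-py snippet below reads master_repl_offset from the master and slave_repl_offset from a slave (addresses are placeholders).

```python
# Inspect the replication offsets that this scoring rule compares.
import redis

master = redis.Redis(host='127.0.0.1', port=6379)
slave  = redis.Redis(host='127.0.0.1', port=6380)

print(master.info('replication')['master_repl_offset'])  # latest write position on the master
print(slave.info('replication')['slave_repl_offset'])    # this slave's replication progress
```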

As shown in the following figure, the master_repl_offset of the old master is 1000, and the slave_repl_offsets of Slave 1, Slave 2, and Slave 3 are 950, 990, and 900 respectively. Therefore, Slave 2 should be selected as the new master.

Principle of selecting the new master based on replication progress

Of course, if two slave nodes have the same slave_repl_offset value (for example, if Slave 1 and Slave 2 both had a slave_repl_offset of 990), we need to proceed to the third round of scoring.

Third round: The slave node with the smallest ID scores the highest.

Each instance has an ID, which is similar to the numbering of slave nodes here. Currently, Redis has a default rule when selecting a master: in the case of the same priority and replication progress, the slave node with the smallest ID scores the highest and becomes the new master.

At this point, the new master has been selected, and the “master selection” process is complete.

Let’s review this process. First, the sentinel filters out some slave nodes that do not meet the requirements based on their online status and network status, and then scores the remaining slave nodes based on their priority, replication progress, and ID. As long as a slave node with the highest score appears, it becomes the new master.

Summary #

In this lesson, we learned about the sentinel mechanism, which is an important guarantee for continuous service in Redis. Specifically, data synchronization in a master-slave cluster is the foundation for reliable data, and automatic master-slave switching is the key support for uninterrupted service when the master fails.

The sentinel mechanism in Redis automatically completes the following three major functions, thus achieving automatic switching between master and slave libraries and reducing the operational overhead of the Redis cluster:

  • Monitoring the running status of the master and determining if it is objectively offline;
  • Selecting a new master after the master is objectively offline;
  • Notifying the slave libraries and clients after a new master is selected.

In order to reduce the misjudgment rate, the Sentinel mechanism is usually deployed with multiple instances in practice. Multiple Sentinel instances make the judgment together, on the principle that the minority obeys the majority, to decide whether the master is objectively offline. Generally, we can deploy three Sentinels, and if two of them consider the master "subjectively offline", the switch process can begin. Of course, if you want to further improve the accuracy of the judgment, you can also increase the number of Sentinels appropriately, for example to five.
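Below is a minimal client-side sketch, using redis-py, of working against such a three-Sentinel deployment; the Sentinel addresses and the master name mymaster are placeholders. Each Sentinel would be started with a monitor line such as the one shown in the comment, where the trailing 2 is the quorum: two "subjectively offline" votes are enough to mark the master objectively offline.

```python
# Minimal sketch of a client using a three-Sentinel deployment.
# Each sentinel would be configured with something like:
#     sentinel monitor mymaster 127.0.0.1 6379 2
# (the trailing 2 is the quorum). Addresses below are placeholders.
from redis.sentinel import Sentinel

sentinel = Sentinel(
    [('127.0.0.1', 26379), ('127.0.0.1', 26380), ('127.0.0.1', 26381)],
    socket_timeout=0.5,
)

print(sentinel.discover_master('mymaster'))   # current (ip, port) of the master

master = sentinel.master_for('mymaster', socket_timeout=0.5)
master.set('key', 'value')                    # writes always follow the current master
```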

However, using multiple sentinel instances to reduce the misjudgment rate actually forms a sentinel cluster, and we will face some new challenges, such as:

  • What should we do if an instance in the sentinel cluster goes down? Will it affect the determination of the master’s status and the selection of a new master?
  • When the majority of instances in the sentinel cluster reach a consensus and determine that the master is “objectively offline,” which instance will perform the master-slave switch?

To understand these issues, we have to discuss the mechanism and problems of the sentinel cluster in the next lesson.

One Question per Lesson #

As usual, I have a small question for you. In this lesson, I mentioned that automatic master-slave switching can be achieved through the Sentinel mechanism, and that it is a key support for uninterrupted service. I also mentioned that the switch itself takes a certain amount of time. So, please consider: can the client still perform normal request operations during the switch? And if you want the application to be completely unaware of the service interruption, is there anything more that the Sentinel or the client needs to do?

Feel free to communicate and discuss with me in the comments section. I also welcome you to share today’s content with more people and help them solve problems together. See you in the next lesson.