37 Practical Redis Sentinel Mode Part 1 #

In the previous article, we discussed the master-slave replication mode, which is the foundation of running Redis on multiple machines. However, this mode has a critical weakness: when the master node crashes, manual intervention is required to restore Redis to normal operation.

For example, suppose we have three servers set up for master-slave replication: one master server A and two slave servers B and C. When server A fails, we need to manually promote server B to be the new master and reconfigure server C as a slave that syncs data from it. If the failure happens at night, or if there are many slave nodes, it is hard for a human operator to recover the system promptly. Therefore, we need an automated tool, Redis Sentinel, that turns this manual process into an automatic one and gives Redis automatic failover capability.

The Redis Sentinel mode is illustrated as follows:

Sentinel mode.png

Tip: The minimum deployment unit for Redis Sentinel is one master and one slave.

Setting up Redis Sentinel #

Redis ships with Sentinel built in; its executable program is stored in the src directory, as shown below:

image.png

To start Sentinel, we run the command ./src/redis-sentinel sentinel.conf. As you can see, a sentinel.conf file must be supplied at startup, and this configuration file must contain the information of the master node to be monitored:

sentinel monitor master-name ip port quorum

For example:

sentinel monitor mymaster 127.0.0.1 6379 1

Where:

  • master-name is the name given to the monitored master node.
  • ip represents the IP address of the master node.
  • port represents the port of the master node.
  • quorum represents the number of Sentinels that must agree that the master node is offline before it is treated as down. If quorum is set to 1, a single Sentinel's judgment that the master is offline is enough to confirm the failure.

Note: If the master Redis server has a password, you must also add the password of the master node in the sentinel.conf file. Otherwise, Sentinel will not be able to automatically monitor the slave nodes under the master node.

So if Redis has a password, the sentinel.conf file must include the following content:

sentinel monitor mymaster 127.0.0.1 6379 1
sentinel auth-pass mymaster pwd654321
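
In practice sentinel.conf usually carries a few more settings than just the monitor line. The sketch below is illustrative only: the sentinel monitor and sentinel auth-pass lines come from the example above, while the port, daemonize, logfile, and dir values are assumptions you would adapt to your own environment.

port 26379                                  # port this Sentinel listens on (26379 is the default)
daemonize yes                               # run Sentinel as a background process
logfile "sentinel.log"                      # illustrative log file name
dir /tmp                                    # working directory
sentinel monitor mymaster 127.0.0.1 6379 1  # master to monitor, as configured above
sentinel auth-pass mymaster pwd654321       # password of the monitored master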

When we have configured sentinel.conf and executed the start command ./src/redis-sentinel sentinel.conf, Redis Sentinel will be started, as shown in the following screenshot:

image.png

From the above screenshot, you can see that Sentinel only needs the master node's information in its configuration; it discovers and monitors the corresponding slave nodes automatically.
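
If you want to double-check what Sentinel has discovered, you can query it directly with redis-cli. The commands below are a sketch that assumes the first Sentinel listens on the default port 26379:

redis-cli -p 26379 sentinel get-master-addr-by-name mymaster   # prints the current master's IP and port
redis-cli -p 26379 sentinel replicas mymaster                  # lists the replicas discovered for mymaster (on older versions use "sentinel slaves")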

Starting a Sentinel Cluster #

In the previous section, we started a single Sentinel. In a production environment, however, we never rely on a single Sentinel: if it crashes, no one is left to perform automatic failover, which does not meet the requirements of high availability. Therefore, we start multiple Sentinels on different physical machines to form a Sentinel cluster, ensuring the high availability of the Redis service.

Starting a Sentinel cluster is as simple as starting a single one. We just need to make multiple Sentinels monitor the same master server node. Then the multiple Sentinels will automatically discover each other and form a Sentinel cluster.
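
For example, the second Sentinel only needs its own port plus the same monitoring target. A minimal sketch of its sentinel.conf, assuming it listens on port 26377 as in the log below, could look like this:

port 26377                                  # each Sentinel instance needs its own port
sentinel monitor mymaster 127.0.0.1 6379 1  # same master, same name, same quorum as the first Sentinel
sentinel auth-pass mymaster pwd654321       # master password, same as the first Sentinel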

Let’s start the second Sentinel and see the result:

[@iZ2ze0nc5n41zomzyqtksmZ:redis2]$ ./src/redis-sentinel sentinel.conf
5547:X 19 Feb 2020 20:29:30.047 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
5547:X 19 Feb 2020 20:29:30.047 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=5547, just started
5547:X 19 Feb 2020 20:29:30.047 # Configuration loaded
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 5.0.5 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in sentinel mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 26377
 |    `-._   `._    /     _.-'    |     PID: 5547
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

5547:X 19 Feb 2020 20:29:30.049 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
5547:X 19 Feb 2020 20:29:30.049 # Sentinel ID is 6455f2f74614a71ce0a63398b2e48d6cd1cf0d06
5547:X 19 Feb 2020 20:29:30.049 # +monitor master mymaster 127.0.0.1 6379 quorum 1
5547:X 19 Feb 2020 20:29:30.049 * +slave slave 127.0.0.1:6377 127.0.0.1 6377 @ mymaster 127.0.0.1 6379
5547:X 19 Feb 2020 20:29:30.052 * +slave slave 127.0.0.1:6378 127.0.0.1 6378 @ mymaster 127.0.0.1 6379
5547:X 19 Feb 2020 20:29:30.345 * +sentinel sentinel 6455f2f74614a71ce0a63398b2e48d6cd1cf0d08 127.0.0.1 26379 @ mymaster 127.0.0.1 6379

As shown in the result above, the second Sentinel started successfully. Compared with the single-Sentinel startup, the log contains an additional +sentinel line showing that the other Sentinel server has been detected, which indicates that the two Sentinels have formed a cluster.
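
To confirm that the Sentinels really see each other, you can ask either of them for its peers. The command below is a sketch that assumes the second Sentinel is on port 26377:

redis-cli -p 26377 sentinel sentinels mymaster   # lists the other Sentinels monitoring mymaster, with their addresses and run IDs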

The Sentinel cluster diagram is as follows:

Sentinel mode with multiple Sentinels.png

In general, the number of Sentinel nodes in a cluster should be an odd number greater than 1, such as 3, 5, 7, or 9, and the quorum parameter should be adjusted to match. For example, with 3 Sentinel nodes the quorum is usually set to 2, and with 5 Sentinel nodes it is set to 3, meaning the master node is only declared offline once 3 of the 5 Sentinels agree that it is down.

There are two concepts related to the quorum parameter: subjective down and objective down.

When a Sentinel node in the cluster considers the master server to be offline, it marks the master server as subjectively down (SDOWN) and then asks the other Sentinel nodes in the cluster whether they also consider the server offline. Once the number of Sentinels that agree the master is offline reaches the value specified by quorum, the master server is marked as objectively down (ODOWN) and the failover process is initiated.
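
Redis also provides a command to check whether the current Sentinel deployment is able to reach the configured quorum and the majority needed to authorize a failover. A quick sketch, again assuming a Sentinel listening on port 26379:

redis-cli -p 26379 sentinel ckquorum mymaster   # reports whether the quorum and failover majority can currently be reached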

Automatic failover test #

Now that Redis Sentinel is set up, let's test the automatic failover feature. To simulate a failure, we manually kill the master node process. Execute the following commands:

[@iZ2ze0nc5n41zomzyqtksmZ:~]$ ps -ef|grep redis # Find the process ID of the master node
root      5186     1  0 16:54 ?        00:00:23 ./src/redis-server *:6377
root      5200     1  0 16:56 ?        00:00:22 ./src/redis-server *:6378
root      5304  5287  0 17:31 pts/2    00:00:00 redis-cli -a pwd654321
root      5395  5255  0 18:26 pts/1    00:00:19 ./src/redis-sentinel *:26379 [sentinel]
root      5547  5478  0 20:29 pts/4    00:00:02 ./src/redis-sentinel *:26377 [sentinel]
root      5551  5517  0 20:29 pts/5    00:00:00 redis-cli -h 127.0.0.1 -p 26377 -a pwd654321
root      5568  5371  0 20:48 pts/0    00:00:00 grep --color=auto redis
root     28517     1  0 Feb13 ?        00:15:33 ./src/redis-server *:6379
[@iZ2ze0nc5n41zomzyqtksmZ:~]$ kill -9 28517 # Shut down the master node service

At this point, let’s connect to another Redis server and check the current master-slave server information. Execute the following command:

[@iZ2ze0nc5n41zomzyqtksmZ:~]$ redis-cli -h 127.0.0.1 -p 6377 -a pwd654321 2>/dev/null
127.0.0.1:6377> role
1) "master"
2) (integer) 770389
3) 1) 1) "127.0.0.1"
      2) "6378"
      3) "770389"

We can see that the former slave server on port 6377 has been promoted to master and still has one slave, 6378, replicating from it, while the old master on port 6379 is the one we killed. This shows that Sentinel successfully completed the automatic failover.
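
You can also verify the failover from Sentinel's point of view instead of querying the data nodes. A sketch, assuming the first Sentinel still listens on port 26379:

redis-cli -p 26379 sentinel get-master-addr-by-name mymaster    # should now return 127.0.0.1 and 6377
redis-cli -h 127.0.0.1 -p 6377 -a pwd654321 info replication    # role:master, with 127.0.0.1:6378 listed as a connected slave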

Rules for Master Node Election #

Above, we simulated Redis Sentinel’s automatic failover. Now, let’s take a look at the rules for electing a new master server and the related configurations.

Setting the Priority for Electing a New Master Node #

We can set the priority for electing a new master node using the replica-priority option in the redis.conf file. Its default value is 100, and the smaller the value, the higher the election priority. For example, if replica A has a replica-priority of 100, replica B has a value of 50, and replica C has a value of 5, then replica C will be selected as the new master during the election.
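
As a sketch, the priority can be set statically in each replica's redis.conf or adjusted at runtime; the value 50 below is just an illustrative assumption:

# In the replica's redis.conf (Redis 5+; older versions use slave-priority):
replica-priority 50

# Or at runtime, without restarting the replica:
redis-cli -p 6378 -a pwd654321 config set replica-priority 50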

Rules for Electing a New Master Node #

The election of a new master node involves excluding ineligible replica-node candidates and then selecting from the remaining replica-nodes based on priority. First, the following conditions will exclude a replica-node candidate:

  1. Exclude all suspected offline replica-servers that have not responded to heartbeat checks for a long time.
  2. Exclude all replica-servers that have not communicated with the master server for a long time and have outdated data.
  3. Exclude all servers with a priority (replica-priority) of 0.

Order of election for eligible replica-nodes:

  1. The replica-node with the highest priority will be selected as the new master.
  2. If the priorities are equal, the replica-node with the highest replication offset will win.
  3. If the above two are also equal, the replica-node with the lexicographically smallest run ID (a random ID generated when the Redis process starts) becomes the new master server (see the example after this list).
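
The replication offset and run ID used as tie-breakers above can be read with the INFO command. The sketch below assumes the replica running on port 6378 and the password used earlier:

redis-cli -p 6378 -a pwd654321 info replication | grep slave_repl_offset   # this replica's replication offset
redis-cli -p 6378 -a pwd654321 info server | grep run_id                   # this replica's run ID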

Recovery of the Old Master Node #

If the previous master node recovers and comes back online, Sentinel reconfigures it so that it rejoins the deployment as a replica of the new master.
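
If you want to watch this happen, you can restart the old master and query its role. This is a sketch that assumes the old master's configuration file is redis.conf in the current Redis directory:

./src/redis-server redis.conf         # bring the old master (port 6379) back online
redis-cli -p 6379 -a pwd654321 role   # after Sentinel reconfigures it, the first element should be "slave"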

How Sentinel Works #

Sentinel works as follows: each Sentinel instance sends a PING command to the master servers, replica servers, and other Sentinel instances it knows about once per second.

If the time elapsed since the last valid reply to the PING command exceeds the value of down-after-milliseconds (30 seconds by default), the Sentinel marks that instance as subjectively down.

If a master server is marked as subjectively down, all the Sentinel nodes monitoring it confirm this status at a rate of once per second.

If enough Sentinels (at least the configured quorum) agree with this judgment within the specified time window, the master server is marked as objectively down. At that point the Sentinels carry out the failover, selecting a new master node according to the rules described above.

Note: A valid PING reply can be one of the following: “+PONG”, “-LOADING”, or “-MASTERDOWN”. If the reply is none of the above three types or if no reply to the PING command is received within the specified time, Sentinel considers the reply from the server as invalid (non-valid).
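
These timings and the failover behavior can be tuned per master in sentinel.conf. The values below are the Redis defaults, shown here as an illustrative sketch:

sentinel down-after-milliseconds mymaster 30000   # time without a valid reply before marking the node subjectively down
sentinel failover-timeout mymaster 180000         # overall timeout for a failover attempt
sentinel parallel-syncs mymaster 1                # how many replicas resync with the new master at the same time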

Summary #

In this article, we reviewed the master-slave mode and the shortcomings of manually switching over a failed server, which led us to the Sentinel mode, capable of monitoring and automatic failover. We can start Sentinel mode using the redis-sentinel program provided by Redis, and when multiple Sentinel instances monitor the same master node, they discover each other and form a highly available Sentinel network. We also explained that Sentinel checks the availability of nodes with the PING command and chooses the new master node based on replica priority, replication offset, and run ID. In the next section, we will discuss Sentinel management commands and provide code examples.