04 Data Replication - How to Ensure High Availability of Data in Distributed Scenarios #

In our previous lecture, we introduced sharding technology, which primarily aims to increase data capacity and performance. In this lecture, we will explore another important foundation of distributed databases: replication.

The main purpose of replication is to maintain copies of the same data on multiple database nodes, providing data redundancy. This redundant data can improve data query performance and, more importantly, ensure database availability.

This lecture mainly introduces two replication modes: single-master replication and multi-master replication, using the evolution of MySQL replication technology as an example.

Let’s start with single-master replication. This part covers not only the technology itself but also broader topics in the replication field, such as synchronization modes, replication delay, high availability, and replication modes.

Single-master Replication #

Single-master replication, also known as master-slave replication, involves replicating the data written to the master node to the slave nodes, which store copies of the database. When clients want to write to the database, they must send their requests to the master node, which converts the data into replication logs or modification data streams and sends them to all the slave nodes. From the perspective of the users, the slave nodes are read-only. The following diagram illustrates the classic master-slave replication architecture.

[Figure: the classic master-slave replication architecture]

This mode is the earliest developed replication mode and is widely used not only in traditional databases such as PostgreSQL, MySQL, Oracle, and SQL Server but also in some distributed databases such as MongoDB, RethinkDB, and Redis.

Next, we will delve into the concept from several aspects: replication synchronization modes, replication delay, replication and high availability, and replication modes.

Replication Synchronization Modes #

Replication is a time-consuming operation, and how long it takes to complete is unpredictable. Although many factors can affect replication, whether a replication operation happens synchronously or asynchronously is the crucial one. We can analyze it from the following three points.

  1. Synchronous replication: the master node waits for confirmation from every slave node before a write succeeds. If any slave node fails to respond due to a crash, network failure, or another reason, the write on the master node fails as well.
  2. Semi-synchronous replication: some slave nodes replicate synchronously while the rest replicate asynchronously. In other words, once at least one slave node confirms the replication, the master node can complete the write.
  3. Asynchronous replication: the master node completes writes regardless of the replication status of the slave nodes. However, if the master node fails, any data not yet replicated to the slave nodes is lost.

Each synchronization mode strikes a different balance between performance and consistency, and the three modes suit different scenarios. Users need to choose a synchronization mode based on their own business requirements.
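
As a concrete illustration, the sketch below shows how MySQL’s semi-synchronous replication plugin exposes this trade-off. This is a minimal sketch, not a production recipe: stock MySQL has no purely synchronous switch, and raising the required ACK count toward the total number of slaves only approximates fully synchronous behavior.

```sql
-- On the master: install and enable the semi-synchronous plugin
-- (stock MySQL replicates asynchronously by default; the .so file
-- name applies to Linux builds).
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;

-- Wait at most 1 second for a slave ACK; after the timeout the
-- master silently degrades to asynchronous replication.
SET GLOBAL rpl_semi_sync_master_timeout = 1000;

-- Require ACKs from N slaves before a commit returns: 1 is classic
-- semi-synchronous; raising N toward the number of slaves approaches
-- synchronous behavior (MySQL 5.7+).
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 1;

-- On each slave that should participate synchronously
-- (restart the slave I/O thread afterwards for it to take effect):
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
```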

Replication Delay #

If we want to improve the query capacity of the database, the simplest way is to add enough slave nodes to the database cluster. These slave nodes are read-only nodes, so the query requests can be well distributed among these nodes.

However, if synchronous replication is used, each write must be replicated to all the slave nodes before it completes, so a single slow node delays every write; it can even produce a window in which some slave nodes already hold the data while the master node has not yet committed it. Asynchronous replication, on the other hand, can leave slave nodes serving outdated data.

These issues are referred to as “replication delay.” In general materials, you may come across consistency guarantees such as “read-after-write” (reading your own writes) and “monotonic reads” that are proposed to address replication delay. However, these concepts actually fall under the category of data consistency models, which I will delve into in the next lecture.

Replication and High Availability #

High availability is an IT term that refers to the ability of a system to perform its functions without interruption. Any node in the system may experience unplanned downtime due to various unexpected failures, and we also need some planned downtime to maintain the system. Adopting the master-slave mode for databases can prevent a decrease in availability caused by the failure of a single node.

The degree of availability is generally expressed as a number of nines, as shown in the table below.

| Availability | Annual Downtime |
| --- | --- |
| 99.9999% | 32 seconds |
| 99.999% | 5 minutes 15 seconds |
| 99.99% | 52 minutes 34 seconds |
| 99.9% | 8 hours 46 minutes |
| 99% | 3 days 15 hours 36 minutes |
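
These figures follow from simple arithmetic: a 365-day year has 525,600 minutes, and the allowed downtime is (1 − availability) × total time. For four nines, for example, that is (1 − 0.9999) × 525,600 ≈ 52.56 minutes, or roughly 52 minutes 34 seconds.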

Typically, production systems guarantee at least two nines and strive for three nines. Achieving four nines is very challenging. In the master-slave mode, fault handling is necessary to support high availability. Here, I summarize two possible faults and their handling solutions.

  1. Slave node failure: since each slave node replicates the data change log received from the master node, it knows the last transaction it processed before the failure. It can use this information to recover its data from the master node or from other slave nodes.
  2. Master node failure: in this case, a new master node must be selected from among the slave nodes. This process is called failover and can be triggered manually or automatically. The typical steps are: (1) determine, based on a timeout, that the master node is offline; (2) select a new master node, which should generally be the one whose data is closest to the old master’s; (3) reconfigure the system so that clients and the remaining slave nodes point to the new master node. A minimal sketch of steps (2) and (3) follows this list.
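
The sketch below illustrates a manual failover using MySQL’s GTID-based replication, assuming gtid_mode is ON; the host name and user are hypothetical, and real failover tools automate these steps with many additional safety checks.

```sql
-- On each surviving slave, compare executed GTID sets and pick the
-- most up-to-date node as the new master (step 2).
SELECT @@GLOBAL.gtid_executed;

-- On the chosen slave: stop replicating and drop its slave config.
STOP SLAVE;
RESET SLAVE ALL;

-- On every other slave: repoint replication at the new master
-- (step 3). MASTER_AUTO_POSITION lets GTIDs locate the correct
-- starting point in the new master's binlog automatically.
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'new-master.example.com',  -- hypothetical host
  MASTER_USER = 'repl',                    -- hypothetical user
  MASTER_AUTO_POSITION = 1;
START SLAVE;
```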

Replication Modes #

To replicate data flexibly and efficiently, I will introduce several commonly used replication modes.

1. Statement-based Replication

The master node records every write request it executes (generally in the form of SQL statements), and each slave node parses and executes those statements as if they had come from a client. However, this replication mode has potential problems: for example, a statement that calls a nondeterministic function, such as one returning the current time or a random value, can produce different results on different nodes.

Additionally, unexpected issues could arise after replication for features like auto-increment columns, triggers, stored procedures, and functions. These issues can be mitigated through preprocessing. Distributed databases that use this replication mode include VoltDB and Calvin.
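
To make the hazard concrete, here is a minimal MySQL sketch (the table is hypothetical). MySQL compensates for some nondeterminism, for instance by replicating the statement’s timestamp so that NOW() evaluates identically on the slaves, but it flags functions such as UUID() as unsafe under statement-based logging:

```sql
SET SESSION binlog_format = 'STATEMENT';

-- The SQL text itself is written to the binlog and re-executed on
-- every slave, so a nondeterministic function may yield a different
-- value on each node. MySQL emits an "unsafe statement" warning here.
INSERT INTO audit_log (event_id, noted_at)
VALUES (UUID(), NOW());
```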

2. Write-Ahead Log (WAL) Synchronization

WAL is a sequence of bytes containing all write operations to the database. Its content consists of low-level operations, such as writing a block of binary data to a page of a disk. The master node sends such data to the slave node over the network.

This method avoids the side effects that certain operations inside statements cause during replication. However, it requires the master and slave nodes to run exactly the same database engine, ideally the same version. If the slave node’s engine needs to be upgraded, downtime is usually required. This method is used in PostgreSQL and Oracle.

3. Row-based Replication

Row-based replication consists of a series of records describing write operations on database tables at the row level. It is decoupled from the specific storage engine, and third-party applications can easily parse its data format.
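
In MySQL, for instance, row-based logging is just a binlog format switch; a minimal sketch:

```sql
-- Log before/after row images instead of SQL text; every slave
-- applies the same row values, so nondeterministic functions are safe.
SET GLOBAL binlog_format = 'ROW';

-- Optionally log only the changed columns to keep the binlog small.
SET GLOBAL binlog_row_image = 'MINIMAL';
```

Tools such as mysqlbinlog with the --verbose option can decode these row events into readable pseudo-SQL, which is also how change-data-capture pipelines typically consume them.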

4. Extract, Transform, Load (ETL) Tools

ETL tools generally provide the most flexible replication method. Users can design the scope and mechanism of replication around their own business requirements, and filtering, transformation, and compression can be performed during replication. However, these tools generally offer lower performance and are suitable for scenarios that replicate only a subset of the data.

That’s all for single-master replication. Now let’s talk about multi-master replication.

Multi-Master Replication #

Multi-master replication, also known as master-master replication, means that a database cluster has multiple equal master nodes that can accept writes concurrently, and each master node also acts as a slave node to the other master nodes.

The multi-master pattern originated in distributed databases (such as DistributedSQL systems) that span multiple data centers across different regions. In such clusters, where nodes are physically far apart and spread over several data centers, each data center has one master node, while ordinary single-master replication is used within each data center.

The design aim of such systems is to achieve the following:

  1. Improved write performance: Data can be written locally.
  2. Data center-level high availability: Each data center can continue to operate independently of others.
  3. Improved data access performance: Users can access the data center closest to them.

However, the biggest drawback of this method is that two different master nodes may modify the same data at the same time. This is risky and should be avoided as much as possible; doing so requires a consistency model, along with the conflict resolution mechanisms we will discuss in the next lecture.

Offline clients raise a similar consistency problem. To improve performance, database clients often cache a batch of write operations and send them to the server together, much like collaborative office document tools. In this situation, each client can be viewed as a local database with master-node properties, and an asynchronous multi-master replication process takes place among the clients. This requires the database to coordinate the write operations and handle any resulting data conflicts.

Typical multi-master replication products include Tungsten Replicator for MySQL, BDR for PostgreSQL, and GoldenGate for Oracle.

Currently, most NewSQL and DistributedSQL distributed databases support multi-master replication, but many of them use consensus protocols such as Paxos or Raft to build replication groups that guarantee linearizable or sequentially consistent writes. Traditional databases are moving the same way: MySQL’s MGR (MySQL Group Replication) solution employs a similar method. This approach is clearly the direction in which multi-master replication is developing. Consistency protocols will be covered in detail in upcoming lessons.

Historically, replication has evolved from single-master to multi-master. So far we have summarized the development patterns of replication and the key technical points in the abstract; next, I will walk through the development path of MySQL high-availability technology to show concretely how database replication technology has evolved.

Development of MySQL replication technology #

Constrained by the performance limits of a single node, MySQL developed data replication technology early on to improve performance, and this technology has also driven significant progress in MySQL’s availability.

To date, this technology has gone through four generations. The first generation is traditional replication, using the MHA (Master High Availability) architecture. The second generation is GTID-based replication, namely the GTID + binlog server model. The third generation is enhanced semi-synchronous replication, namely GTID + enhanced semi-sync. The fourth generation is MySQL’s native high availability: MySQL InnoDB Cluster.

Database replication technology must weigh two factors: RPO (Recovery Point Objective), which reflects data consistency, and RTO (Recovery Time Objective), which reflects business continuity. As emphasized earlier, replication and consistency are inseparable concepts; this lecture focuses on replication but also touches on consistency.

Now let’s start with the first generation of replication technology.

MHA replication control #

The following diagram illustrates the MHA architecture.

[Figure: the MHA architecture]

MHA, as the first generation replication architecture, is suitable for the following scenarios:

  1. MySQL version ≤ 5.5, which shows how dated it is.
  2. Only used for asynchronous replication in a master-slave environment.
  3. High availability based on traditional replication.

MHA does its best to compensate for missing data, but success is not guaranteed. It strives toward a good RPO and offers some RTO support, yet it is merely an auxiliary tool: its own architecture and mechanisms guarantee neither RPO nor RTO.

Therefore, it has several issues:

  1. Its GTID model relies heavily on a binlog server, which cannot parse the binlog format of MySQL 5.7 or later and provides poor support for parallel replication.
  2. Switching the service IP depends on self-written scripts or integration with DNS, so the effectiveness of a failover depends on the experience of the operations personnel.
  3. SSH trust, switch judgment, and other manual operations are required for maintenance. Overall, it is in a state of low automation and high maintenance difficulty.
  4. It is now basically no longer maintained.

From the above issues, we can see that MHA, as the first generation replication architecture, has relatively primitive functionality, but it has paved the way for the development of replication technology, especially in terms of GTID and binlog applications. However, unless maintaining relatively old MySQL clusters, it is not recommended to use it at present.

Semi-synchronous replication #

This is the second generation replication technology, and its differences from the first generation technology are reflected in the following points.

  1. The binlog is shipped via semi-synchronous replication, whereas the first generation used asynchronous replication. Each transaction is usually persisted on at least two nodes, which protects the data and safeguards the RPO.
  2. Asynchronous replication is retained for the remaining slave nodes to preserve replication performance, and replication delay is monitored to keep the RTO under control.
  3. A configuration center, such as Consul, is introduced to expose healthy MySQL services to the outside.
  4. This generation begins to support cross-IDC replication. It requires a monitoring tool such as Monitor combined with the Consul registry: Monitors in multiple IDCs form a distributed monitoring system that registers healthy MySQL instances in Consul and also reports each slave’s replication delay to Consul.

The following diagram shows the semi-synchronous replication architecture with Consul registration center and monitoring module.

[Figure: semi-synchronous replication architecture with the Consul registry and monitoring module]

The second-generation replication technology also has its own shortcomings.

  1. The master and slave can become inconsistent. If the master crashes after a transaction has been shipped to a slave but before the ACK is received, the slave holds a transaction that the master does not.
  2. In MySQL 5.6, the semi-synchronous ACK is handled inside the binlog dump thread, which creates an I/O bottleneck.

Based on this, the third generation replication technology was born.

Enhanced Semi-Synchronous Replication #

This generation requires MySQL 5.7 or later. Several typical frameworks support this technology, such as MySQL Replication Manager, GitHub’s orchestrator, and Xenon from QingCloud.

This generation of replication technology uses enhanced semi-synchronous replication. First, the master handles replication to the slaves in independent threads, so semi-synchronous ACKs no longer block the binlog dump path. Second, the master uses binlog group commit to improve write performance, while the slave uses transaction-based parallel replication with a tunable number of worker threads. In this way, both the master and the slaves can run in parallel.
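
A minimal sketch of the MySQL 5.7 settings behind these ideas (the values are illustrative, and slave_parallel_type can only be changed while the slave’s SQL thread is stopped):

```sql
-- Master: "lossless" enhanced semi-sync. Wait for the slave ACK
-- after flushing the binlog but BEFORE committing, so every
-- committed transaction already exists on at least two nodes.
SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_SYNC';

-- Master: group commit -- wait up to 100 microseconds to batch
-- several transactions into a single binlog flush.
SET GLOBAL binlog_group_commit_sync_delay = 100;

-- Slave: transaction-based parallel replication. Transactions that
-- group-committed together on the master can be applied in parallel.
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 8;
```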

This generation of technology relies heavily on enhanced semi-synchronous replication to ensure RPO, while RTO depends on replication delay.

Let’s take Xenon as an example to illustrate, as shown in the figure below (image from the official website).

[Figure: the Xenon architecture, image from the official website]

From the figure, we can see that each node has an independent agent; the agents use Raft to build a consensus cluster and use GTIDs for master election. The master node then provides write services, while the slave nodes provide read services.

When the master node fails, the agent will detect the failure through a ping. Due to the support of GTID and enhanced semi-synchronous replication, the data on the slave nodes is consistent with that on the master node. Therefore, it is easy to promote a slave node to become the new master node.

The third generation also has its drawbacks, such as ghost transactions under enhanced semi-synchronous replication. This happens when the master loses power after a transaction has been written to the binlog: because crash recovery replays the binlog, the transaction survives on the recovered master, yet it was never shipped to the slaves. The slaves therefore cannot be safely promoted to master, and the only option is to try to recover the original crashed master.

MySQL Group Replication #

Group replication is an important component of the new generation high availability technology provided by MySQL. When combined with MySQL Router or Proxy, it can achieve native high availability.

Starting from this generation, MySQL supports multi-master replication while retaining the functionality of single-master replication. The principle of single-master high availability is similar to that of the third-generation technology, which we will not further analyze here.

Now let’s talk about its multi-master mode. The principle is to use MySQL Router as the data routing layer to control read-write separation, while within the group a Paxos-based algorithm is used to make writes consistent.

The algorithm’s role here differs from that of the consensus algorithm in the third-generation technology: there, the algorithm was used only for leader election and played no part in data writes, whereas the multi-master mode of group replication involves Paxos deeply, deciding every write and resolving write conflicts.
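
As a minimal sketch of what switching on multi-primary mode looks like (the group UUID and host addresses are hypothetical, and each node sets its own local_address):

```sql
-- Install the plugin and describe the group (run on every node).
INSTALL PLUGIN group_replication SONAME 'group_replication.so';
SET GLOBAL group_replication_group_name    = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee';
SET GLOBAL group_replication_local_address = 'node1.example.com:33061';
SET GLOBAL group_replication_group_seeds   =
  'node1.example.com:33061,node2.example.com:33061,node3.example.com:33061';

-- OFF = multi-primary: every node accepts writes, and the
-- Paxos-based group communication layer certifies each transaction,
-- rolling back conflicting concurrent writes to the same rows.
SET GLOBAL group_replication_single_primary_mode = OFF;
SET GLOBAL group_replication_enforce_update_everywhere_checks = ON;

-- Bootstrap the group on the first node only; the remaining nodes
-- simply run START GROUP_REPLICATION to join.
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;
```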

Group replication has the following advantages:

  • High availability sharding: dynamic addition and removal of database nodes. Sharding achieves write scaling, with each shard being a replication group. This is similar to what was introduced in the previous lesson about TiDB.
  • Automated failure detection and fault tolerance: if a node is unresponsive, the majority of members within the group consider the node unhealthy and isolate it automatically.
  • Complete solution: the previously introduced solutions require MySQL to be combined with a series of third-party solutions, while group replication is a native and complete solution that doesn’t require third-party components.

Of course, group replication also has limitations: it depends on newer MySQL versions and features, some functionality is not supported in multi-primary mode, and operations teams often lack experience with it, among other issues.

With the development of MySQL, it is believed that more and more systems will be migrated to group replication, and the multi-master mode will gradually replace the single-master mode.

Summary #

This concludes this lesson. We have delved into the role of replication technology in distributed databases, discussed the two replication technologies of single-master and multi-master, and introduced the application scenarios of replication technology through the development path of MySQL replication technology.

As I mentioned above, replication is often discussed together with consistency. This lesson focused on replication, and in the next lesson, we will explore consistency issues in detail, including CAP theory, consistency models, and their combination with replication.

Learning from Each Other #

Here is a thought-provoking question for you: we often hear about a technology called “masterless replication”. How does it differ from the two replication technologies we introduced in this lesson?