26 Why a Backup Database Might Lag by Hours #

In the previous article, I introduced several possible causes of backup delay. In those scenarios, whether the trigger is an occasional spike in query pressure or a backup job, the impact on the delay is usually on the order of minutes, and the backup catches up once things return to normal.

However, if the backup applies relay logs consistently slower than the main database generates binlogs, the delay can grow to hours. And if the main database is under sustained high pressure, the backup may never catch up.

This brings us to the topic I want to discuss today: parallel replication capability of the backup.

To help you understand, let’s take a look at the main-standby process diagram in the 24th article, “How Does MySQL Ensure Consistency Between the Main Database and Backup Database?”

Figure 1 Main-Standby Process Diagram

When it comes to the parallel replication capability of the backup, we need to pay attention to the two black arrows in the diagram. One arrow represents the client writing to the main database, and the other arrow represents the relay log being executed by the sql_thread on the backup. If we use the thickness of the arrows to represent the degree of parallelism, the actual situation is shown in Figure 1, where the first arrow is significantly thicker than the second arrow.

On the main database, what limits concurrency is the various locks. Because InnoDB supports row-level locks, except for the extreme scenario where all concurrent transactions update the same row (a hot row), it supports concurrent business traffic well. That is why, in performance tests, 32 concurrent stress-testing threads achieve much higher overall throughput than a single thread.

Applying logs on the backup is the job of the sql_thread in the diagram, which updates the data. If this is done by a single thread, the backup cannot apply logs fast enough, causing main-standby delay.

Before the official release of version 5.6, MySQL only supported single-threaded replication, which resulted in serious main-standby delay issues when the main database had high concurrency and a high TPS.

The evolution from single-threaded replication to the latest version of multi-threaded replication went through several versions. Next, I will tell you about the evolution process of multi-threaded replication in MySQL.

In fact, all multi-threaded replication mechanisms are designed to split the sql_thread in Figure 1, which originally had only one thread, into multiple threads, which conform to the following model:

Figure 2 Multi-Threaded Model

In Figure 2, the coordinator is the original sql_thread, but it no longer updates data itself; it only reads the relay log and distributes transactions. The threads that actually apply the logs are the worker threads, whose number is set by the slave_parallel_workers parameter. In my experience, a value between 8 and 16 works best on a 32-core physical machine, since the backup may also need to serve read queries and should not consume all CPU resources.

Next, you need to consider a question: can transactions be distributed to workers in a round-robin manner, for example, the first transaction is assigned to worker_1, the second transaction is assigned to worker_2?

In fact, this is not possible. After transactions are distributed, different workers execute independently, and due to CPU scheduling the second transaction may well finish before the first. If those two transactions update the same row, their effective order on the backup is reversed relative to the main database, causing main-standby inconsistency.

Next, please consider another question: can multiple update statements within the same transaction be executed by different workers?

The answer is also no. Suppose a transaction updates one row each in tables t1 and t2, and the two update statements are assigned to different workers. The final result is still consistent between the main and backup databases, but if a query runs on the backup at the moment the t1 update has completed while the t2 update has not, it sees a transaction that has been “applied halfway”, violating transaction isolation.

Therefore, when the coordinator distributes transactions, it needs to meet the following two basic requirements:

  1. Cannot cause updates to be overwritten. This requires that two transactions updating the same row must be distributed to the same worker.
  2. The same transaction cannot be split up and must be placed in the same worker.

Each version of multi-threaded replication follows these two basic principles. Next, let’s take a look at the parallel replication strategies of each version.

Parallel Replication Strategy in MySQL 5.5 #

The official MySQL 5.5 version does not support parallel replication. However, in 2012, there was a serious delay between the master and slave databases in the business I was serving. The reason was that the slave database only had single-threaded replication. Therefore, I wrote two versions of parallel strategies.

Here, let me introduce these two versions of parallel strategies, namely table-based distribution strategy and row-based distribution strategy, to help you understand the iterative process of the official MySQL version’s parallel replication strategy.

Table-based Distribution Strategy #

The basic idea of distributing transactions based on tables is that if two transactions update different tables, they can run in parallel. Since data is stored in tables, distributing based on tables can ensure that two workers do not update the same row.

Of course, if there are cross-table transactions, the two tables still need to be considered together. As shown in Figure 3, this is the rule of table-based distribution.

Figure 3. Table-based parallel replication model

As can be seen, each worker thread corresponds to a hash table, which is used to store the tables involved in the “execution queue” of transactions currently being processed by this worker. The key of the hash table is “database name. table name”, and the value is a number indicating how many transactions in the queue modify this table.

When a transaction is assigned to a worker, the tables involved in the transaction are added to the corresponding hash table. After the worker completes execution, the table is removed from the hash table.

In Figure 3, hash_table_1 indicates that there are 4 transactions involving the db1.t1 table in the “pending transaction queue” of worker_1, and 1 transaction involving the db2.t2 table. hash_table_2 indicates that there is one transaction in worker_2 that will update the data in the t3 table.

Now suppose, as shown in the figure, the coordinator reads a new transaction T from the relay log, and the rows modified by this transaction involve tables t1 and t3.

Now let’s look at the allocation rules using the allocation flow of transaction T.

  1. Since transaction T involves modifying table t1, and there is a transaction in the worker_1 queue modifying table t1, transaction T conflicts with a transaction in the queue of worker_1.

  2. Following this logic, by sequentially checking the conflict relationship between transaction T and each worker queue, we find that transaction T also conflicts with worker_2.

  3. Since transaction T conflicts with more than one worker, the coordinator thread enters a waiting state.

  4. Each worker continues execution and modifies the hash_table at the same time. Assuming that the transaction involving the modification of table t3 in hash_table_2 completes execution first, the item db1.t3 will be removed from hash_table_2.

  5. In this way, the coordinator will find that the only worker conflicting with transaction T is worker_1, so it assigns it to worker_1.

  6. The coordinator continues to read the next relay log and allocate the next transaction.

That is to say, when distributing a transaction, its conflict relationship with the workers falls into one of the following three cases:

  1. If it conflicts with no worker, the coordinator assigns the transaction to the most idle worker.

  2. If it conflicts with more than one worker, the coordinator enters a waiting state until only one conflicting worker remains.

  3. If it conflicts with exactly one worker, the coordinator assigns the transaction to that worker.

This table-based distribution scheme works well in scenarios where the workload of multiple tables is balanced. However, if a hot spot table is encountered, for example, when all update transactions involve a certain table, all transactions will be assigned to the same worker, resulting in single-threaded replication.
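The distribution rules above can be sketched in a few dozen lines. This is an illustrative model only; the class and method names are my own, not MySQL internals:

```python
from collections import Counter

class Coordinator:
    """Illustrative model of the table-based distribution rules.

    Each worker keeps a hash table ("db.table" -> pending-transaction
    count); the coordinator checks it for conflicts before dispatching.
    """

    def __init__(self, n_workers):
        self.tables = [Counter() for _ in range(n_workers)]  # per-worker hash table
        self.queues = [[] for _ in range(n_workers)]         # per-worker pending queue

    def conflicting_workers(self, trx_tables):
        # a worker conflicts with the transaction if any of the
        # transaction's tables appears in that worker's hash table
        return [i for i, tbl_counts in enumerate(self.tables)
                if any(t in tbl_counts for t in trx_tables)]

    def dispatch(self, trx_tables):
        conflicts = self.conflicting_workers(trx_tables)
        if len(conflicts) > 1:
            return None                       # rule 2: coordinator must wait
        if len(conflicts) == 1:
            w = conflicts[0]                  # rule 3: the only conflicting worker
        else:
            # rule 1: no conflict, pick the most idle worker
            w = min(range(len(self.queues)), key=lambda i: len(self.queues[i]))
        self.queues[w].append(trx_tables)
        for t in trx_tables:
            self.tables[w][t] += 1
        return w

    def complete(self, worker, trx_tables):
        # called when a worker finishes a transaction: decrement counts,
        # dropping table entries that reach zero
        self.queues[worker].remove(trx_tables)
        for t in trx_tables:
            self.tables[worker][t] -= 1
            if self.tables[worker][t] == 0:
                del self.tables[worker][t]
```

Here `dispatch` returning `None` stands in for the coordinator blocking until a worker finishes a conflicting transaction.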

Row-based Distribution Strategy #

To solve the problem of parallel replication for hot spot tables, a row-based parallel replication scheme is needed. The core idea of row-based replication is that if two transactions do not update the same rows, they can be executed in parallel on the replica. Obviously, this pattern requires the binlog format to be row-based.

In this case, we determine the conflict between a transaction T and a worker not based on “modifying the same table”, but on “modifying the same row”.

The data structure for row-based replication is similar to table-based replication, where each worker is assigned a hash table. However, to implement row-based distribution, the key must be “database name + table name + the value of the unique key”.

However, having only the primary key ID as the “unique key” is not enough. We also need to consider the following scenario, where table t1 has a unique index a:

CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `a` (`a`)
) ENGINE=InnoDB;
 
insert into t1 values(1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5);

Assuming that we are going to execute the following two transactions on the master:

Figure 4 - Example of unique key conflict

As can be seen, the two transactions update rows with different primary key values. However, if they are assigned to different workers, it is possible for Session B’s statement to be executed first. At this point, the value of a for the row with id=1 will still be 1, resulting in a unique key conflict.

Therefore, under the row-based strategy, the transaction hash table needs to consider the unique key, and the key should be “database name + table name + the name of index a + the value of a”.

For example, in the above example, if I execute the statement “update t1 set a=1 where id=2” on table t1, the binlog will record the values of each field of the row before and after the modification.

Therefore, when the coordinator parses the binlog of this statement, the hash table for this transaction will have three entries:

  1. key=hash_func(db1+t1+“PRIMARY”+2), value=2; Here, value=2 because the value of the row id remains unchanged before and after the modification, so it appears twice.
  2. key=hash_func(db1+t1+“a”+2), value=1, indicates that the row with a=2 in this table will be affected.
  3. key=hash_func(db1+t1+“a”+1), value=1, indicates that the row with a=1 in this table will be affected.
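These three entries can be reproduced with a short sketch. The representation of table metadata and row images below is hypothetical; in the real server they are parsed from the row-format binlog event:

```python
from collections import Counter

def row_event_keys(db, table, indexes, before, after):
    """Build the hash-table entries for one updated row.

    `indexes` maps index name -> column name for the primary key and
    each unique index; `before` and `after` are the row images that a
    row-format binlog records."""
    keys = Counter()
    for idx_name, col in indexes.items():
        keys[(db, table, idx_name, before[col])] += 1  # image before the change
        keys[(db, table, idx_name, after[col])] += 1   # image after the change
    return keys
```

For the statement `update t1 set a=1 where id=2` this yields exactly the three entries listed above: the PRIMARY key value 2 counted twice (unchanged, so it appears in both images), plus one entry each for a=2 and a=1.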

As you can see, compared to the table-based parallel distribution strategy, the row-based parallel strategy requires more compute resources when determining thread distribution. You may also have noticed that both of these solutions have some constraints:

  1. The table name, primary key value, and value of the unique index must be able to be parsed from the binlog. This means that the binlog format of the primary database must be row-based.
  2. The table must have a primary key.
  3. There must be no foreign keys. If there are foreign keys on the table, the rows updated by a cascade update will not be recorded in the binlog, making conflict detection inaccurate.

Fortunately, these three constraints were already part of the online usage specifications that DBAs required business developers to adhere to, so there were no major issues when applying these two parallel replication strategies.

Comparing the table-based and row-based distribution strategies, the row-based strategy has higher parallelism. However, the row-based strategy has two issues when dealing with large transactions that operate on many rows:

  1. Memory consumption. For example, if a statement needs to delete 1 million rows of data, the hash table would need to record 1 million entries.
  2. CPU consumption. Parsing the binlog, calculating the hash value, and performing these operations on a large transaction can be costly.

Therefore, when implementing this strategy, I would set a threshold. If a single transaction exceeds the set number of rows (for example, if the number of rows updated in a single transaction exceeds 100,000), it would temporarily degrade to single-threaded mode. The logic for the degrade process would be as follows:

  1. The coordinator temporarily holds this transaction.
  2. Wait for all workers to finish executing, so that every worker queue is empty.
  3. The coordinator directly executes this transaction.
  4. Restore parallel mode.

Reading up to this point, you may be wondering why I am introducing these two strategies in such detail when they have not been merged into the official release. In fact, the purpose of introducing these two strategies is to provide a basis for understanding the community version strategy that will be discussed later.

Parallel Replication Strategy in MySQL 5.6 #

In the official MySQL 5.6 version, parallel replication is supported, but the granularity of parallelism is per database. Having understood the table-based and row-based distribution strategies above, you can see that this is the same hash-based scheme, with the database name as the key.

The effectiveness of this strategy depends on the stress model. If there are multiple databases on the primary server and the workload of each database is balanced, this strategy will work well. Compared to distributing by table or by row, this strategy has two advantages:

  1. It is fast to construct the hash value, only requiring the database name. Additionally, there are not many databases on a single instance, so there is no need to construct 1 million items.
  2. It does not require a specific binlog format. With the statement format of the binlog, it is easy to obtain the database name.

However, if all tables on the master database are placed in the same database, this strategy is not effective. It is also ineffective when the load is skewed across databases, for example when one database holds busy business tables while another holds rarely-updated system configuration.

In theory, you can create different databases and evenly distribute tables with the same hotspots to these different databases, forcefully using this strategy. However, from what I know, this strategy is not widely used due to the need to specifically move data.

MariaDB’s parallel replication strategy #

In the [23rd article], I introduced the optimization of redo log group commit. MariaDB’s parallel replication strategy utilizes this feature:

  1. Transactions that can be committed in the same group will not modify the same row.
  2. Transactions that can be executed in parallel on the master can also be executed in parallel on the slave.

In the implementation, MariaDB does the following:

  1. Transactions that are committed together in a group have the same commit_id. The next group will be commit_id + 1.
  2. The commit_id is directly written to the binlog.
  3. When applied to the slave, transactions with the same commit_id are distributed to multiple workers for execution.
  4. After all transactions in this group are executed, the coordinator will fetch the next batch.
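The grouping step can be sketched as follows (an illustrative model, not MariaDB code). Transactions arrive in commit order, and consecutive entries sharing a commit_id form one batch that may run in parallel; the next batch starts only after the current one is fully executed:

```python
from itertools import groupby

def replay_in_groups(binlog):
    """binlog: list of (commit_id, transaction) pairs in commit order.

    Returns the batches of transactions that are handed to the workers
    together; transactions within a batch share a commit_id and so were
    group-committed on the master."""
    return [[trx for _, trx in group]
            for _, group in groupby(binlog, key=lambda entry: entry[0])]
```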

At that time, this strategy was quite impressive. Previously, the industry’s approach was focused on “analyzing the binlog and splitting it into workers”. However, MariaDB’s strategy aimed to “simulate the master’s parallel mode”.

However, this strategy has one issue: it does not truly “simulate the parallelism of the master database”. On the master database, while one group of transactions is committing, the next group is already in the “executing” state.

As shown in Figure 5, assuming three groups of transactions are executing on the master, you can see that when trx1, trx2, and trx3 are committed, trx4, trx5, and trx6 are already executing. Thus, when the first group of transactions is fully committed, the next group of transactions quickly enters the commit state.

Figure 5 Master database parallel transactions

According to MariaDB’s parallel replication strategy, the execution on the slave is shown in Figure 6.

Figure 6 MariaDB parallel replication, parallel effect on the slave

As you can see, when executing on the slave, the second group of transactions can only start executing after the first group of transactions is fully executed. This reduces the throughput of the system.

In addition, this solution is easily hindered by large transactions. For example, if trx2 is a very large transaction, then when applied to the slave, the next group of transactions can only start executing after trx1 and trx3 are fully executed. During this time, only one worker thread is working, which is a waste of resources.

However, even so, this strategy is still a very innovative one. It requires minimal transformation of the original system and the implementation is elegant.

MySQL 5.7’s parallel replication strategy #

After the implementation of MariaDB’s parallel replication, the official MySQL 5.7 version also provides similar functionality, controlled by the parameter slave-parallel-type:

  1. Configured as DATABASE, it means using the MySQL 5.6 version’s strategy of parallelism by database.
  2. Configured as LOGICAL_CLOCK, it means a strategy similar to MariaDB. However, MySQL 5.7’s strategy optimizes for parallelism. The optimization approach is also interesting.

First, you can consider this question: Can all transactions in the “executing” state be executed in parallel?

The answer is no.

This is because there may be transactions in the “waiting for lock” state due to lock conflicts. If these transactions are assigned to different workers on the slave, it will result in inconsistency between the slave and the master.

On the other hand, the core of MariaDB’s strategy mentioned earlier is that “transactions in the commit state can be executed in parallel”. A transaction that has reached the commit state has necessarily passed the lock conflict checks.

At this point, you can recall the two-phase commit mentioned in the previous [23rd article], and I will repost the detailed process diagram here.

Figure 7 Detailed process diagram of two-phase commit

In fact, as long as the transaction reaches the prepare phase, there is no need to wait for the commit phase. The prepare phase indicates that the transaction has already passed the lock conflict check.

Therefore, the idea behind MySQL 5.7’s parallel replication strategy is:

  1. Transactions in the prepare state can be executed in parallel on the slave.
  2. Transactions in the prepare state and transactions in the commit state can also be executed in parallel on the slave.
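In the actual implementation, MySQL 5.7 records two logical timestamps per transaction in the binlog (both visible in mysqlbinlog output): sequence_number, the transaction’s position in the binlog, and last_committed, roughly the sequence_number of the latest transaction already committed when this one entered the prepare phase. The rule above then reduces to an interval-overlap check; here is a minimal sketch under that simplification:

```python
def can_run_in_parallel(a, b):
    """Sketch of the LOGICAL_CLOCK rule. Each transaction is a pair
    (last_committed, sequence_number). Two transactions may run in
    parallel if their (last_committed, sequence_number] intervals
    overlap, i.e. each was still uncommitted while the other was in
    the prepare phase."""
    last_a, seq_a = a
    last_b, seq_b = b
    return last_a < seq_b and last_b < seq_a
```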

In the [23rd article], when discussing the group commit of the binlog, I mentioned two parameters:

  1. The binlog_group_commit_sync_delay parameter represents the number of microseconds to wait before calling fsync.
  2. The binlog_group_commit_sync_no_delay_count parameter represents the number of transactions to accumulate before calling fsync.

These two parameters are used to intentionally increase the time it takes from binlog write to fsync in order to reduce the number of disk writes for binlog. In the parallel replication strategy of MySQL 5.7, they can be used to create more “transactions in the prepare stage at the same time”. This increases the parallelism of replication in the slave.

In other words, these two parameters can intentionally slow down the commit in the master and make the execution in the slave faster. When dealing with slave latency in MySQL 5.7, you can consider adjusting the values of these two parameters to improve the concurrency of slave replication.

Parallel Replication Strategy in MySQL 5.7.22 #

In the MySQL 5.7.22 version released in April 2018, MySQL introduced a new parallel replication strategy based on WRITESET.

This introduces a new parameter, binlog-transaction-dependency-tracking, which controls whether to enable this new strategy. The parameter has three possible values:

  1. COMMIT_ORDER, which represents the strategy described earlier, where parallelism is determined based on transactions entering the prepare and commit phases at the same time.
  2. WRITESET, which represents the strategy where a hash value is calculated for each row involved in a transaction’s update, forming a writeset. If two transactions do not operate on the same rows, meaning their writesets have no intersection, they can execute in parallel.
  3. WRITESET_SESSION, which adds one constraint on top of WRITESET: two transactions executed by the same thread on the master must keep their execution order on the slave.

Of course, for uniqueness, the hash value is calculated from “database name + table name + index name + value”. If the table has unique indexes in addition to the primary key, then for each unique index the writeset of an insert statement must include one more hash value.
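A minimal sketch of the writeset idea (illustrative only; the server uses its own hash function and stores the writeset in the binlog):

```python
def writeset(rows):
    """A transaction's writeset: one hash per touched index entry.

    `rows` is an iterable of (db, table, index_name, value) tuples; the
    representation here is hypothetical."""
    return {hash(r) for r in rows}

def can_parallelize(ws_a, ws_b):
    # two transactions may run in parallel iff their writesets are disjoint
    return ws_a.isdisjoint(ws_b)
```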

As you may have noticed, this is similar to the row-based distribution strategy we described earlier in MySQL version 5.5. However, the official implementation of MySQL has significant advantages:

  1. The writeset is written directly into the binlog after it is generated on the master, so there is no need to parse the contents of the binlog (the row data in the events) when executing on the slave, saving a lot of computational overhead.
  2. It does not require scanning the entire binlog of a transaction to determine which worker it should be distributed to, saving memory.
  3. Since the distribution strategy on the slave does not depend on the binlog contents, the binlog can be in statement format.

Therefore, the parallel replication strategy in MySQL 5.7.22 provides guarantees in terms of universality.

Of course, for scenarios where there is “no primary key on the table” or “foreign key constraints”, the WRITESET strategy cannot be parallelized and will temporarily degrade to a single-threaded model.

Summary #

In today’s article, I introduced various multi-threaded replication strategies in MySQL.

Why do we need multi-threaded replication? This is because the capabilities of single-threaded replication are generally inferior to multi-threaded replication. For a master with heavy update pressure, the slave may not be able to catch up with the changes from the master. This can be observed as an increasing value of seconds_behind_master on the slave.

After introducing each parallel replication strategy, I also shared the pros and cons of different strategies:

  • If you are a DBA, you need to choose different strategies according to different business scenarios.
  • If you are a business developer, I hope you can gain inspiration from these strategies and apply them to your regular development work.

From this analysis, you will also find that large transactions not only affect the master but are also one of the main reasons for replication delay on the slave. Therefore, in your regular development work, I recommend reducing large transaction operations and splitting them into smaller transactions.

The parallel replication strategies introduced in the official MySQL 5.7 version modify the content of the binlog, which means the binlog format is not “upward compatible”: an older version cannot apply these logs. This factor should also be taken into account during master-slave switching or version upgrades.

Finally, I leave you with a question to ponder.

Assume a master with MySQL 5.7.22 version has been inserting data in a single thread. After 3 hours, we want to set up a slave with the same version of MySQL as the master.

At this point, in order to make the slave catch up with the master faster, you want to enable parallel replication. Among the three possible values of the binlog-transaction-dependency-tracking parameter - COMMIT_ORDER, WRITESET, and WRITESET_SESSION - which one would you choose?

What is your reason for choosing this value? If you set the other two parameters, what phenomenon do you think will occur?

You can write your answers and analysis in the comments section, and I will discuss this question with you in the next article. Thank you for listening, and feel free to share this article with more friends to read.

Previous Question Discussion #

The previous question was: Under what circumstances does the slave’s replication delay appear as a 45-degree line? Many comments in the comment section focused on the key point: the synchronization of the slave is completely blocked during this period.

There are mainly two typical scenarios that cause this phenomenon:

  • One is large transactions (including large table DDL or a transaction that modifies many rows).

  • Another hidden situation is when the slave starts a long transaction, such as:

    begin; select * from t limit 1;

And then do nothing.

In this case, even a small DDL from the main database, such as adding a field to table t, will be blocked when it is applied on the slave, and the cause of the delay is hard to spot.

There were also comments mentioning whether it is caused by multiple threads on the master and a single thread on the slave, and the slave not keeping up with the update pace of the master. Today’s article just discussed parallel replication. So, as you now know, this situation will result in a replication delay, but it will not appear as the standard 45-degree line.