07 Row Locks: How to Reduce the Impact of Row Locks on Performance #

In the previous article, I introduced global locks and table-level locks in MySQL. Today, let’s talk about row locks in MySQL.

Row locks in MySQL are implemented by each individual engine at the engine level. However, not all engines support row locks; the MyISAM engine, for example, does not. Without row locks, concurrency control can only rely on table locks, so for tables using such an engine only one update can run on the table at any given time, which limits the concurrency of the business. InnoDB, on the other hand, does support row locks, and this is one of the important reasons InnoDB replaced MyISAM.
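For reference, the storage engine is chosen per table when the table is created; here is a minimal sketch (the table names and columns are made up for illustration):

```sql
-- Hypothetical tables, only to show where the engine choice is made.
-- MyISAM has no row locks, so concurrent updates serialize on the table lock.
CREATE TABLE t_myisam (id INT PRIMARY KEY, c INT) ENGINE=MyISAM;

-- InnoDB supports row locks, so updates to different rows can run in parallel.
CREATE TABLE t_innodb (id INT PRIMARY KEY, c INT) ENGINE=InnoDB;
```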

Today, we will mainly discuss InnoDB’s row locks and how to improve business concurrency by reducing lock conflicts.

As the name suggests, row locks are locks on row records in a data table. This is easy to understand. For example, if transaction A updates a row and at the same time transaction B wants to update the same row, it must wait for transaction A’s operation to complete before it can perform the update.

Of course, a database also has some concepts and designs that are less obvious. If you don't understand them, they can easily cause your program to behave in unexpected ways. One such design is two-phase locking.

Starting with Two-Phase Locking #

Let me give you an example. In the operation sequence below, what happens when transaction B's update statement is executed? Assume that the field id is the primary key of table t. [Figure: the operation sequence of transactions A and B]

The answer depends on which locks transaction A holds after executing its two update statements, and when it releases them. You can verify it yourself: in fact, transaction B's update statement is blocked until transaction A commits, and only then can transaction B continue.
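If you want to try this yourself, here is a minimal sketch of the kind of sequence shown in the figure; the table definition and the exact statements are assumptions made for the experiment:

```sql
-- Hypothetical setup for the experiment
CREATE TABLE t (id INT PRIMARY KEY, k INT) ENGINE=InnoDB;
INSERT INTO t VALUES (1, 1), (2, 2);

-- Session 1 (transaction A)
BEGIN;
UPDATE t SET k = k + 1 WHERE id = 1;
UPDATE t SET k = k + 1 WHERE id = 2;
-- ...do not commit yet...

-- Session 2 (transaction B): this statement blocks until session 1 commits
UPDATE t SET k = k + 2 WHERE id = 1;

-- Session 1
COMMIT;  -- only now does transaction B's update go through
```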

Knowing the answer, you also know that the row locks transaction A holds on both records are released only when it commits.

In other words, in an InnoDB transaction, a row lock is acquired at the moment it is first needed, but it is not released as soon as it is no longer needed: all row locks are released only when the transaction ends. This is the two-phase locking protocol.

Knowing this, how does it help us use transactions? The takeaway is: if your transaction needs to lock multiple rows, acquire the locks that are most likely to cause conflicts and hurt concurrency as late as possible. Let me give you an example.

Suppose you are responsible for implementing an online movie ticket transaction service. Customer A wants to buy a movie ticket from Theater B. Let’s simplify it a bit, and this business needs to involve the following operations:

  1. Deduct the ticket price from Customer A’s account balance.
  2. Increase the account balance of Theater B by the ticket price.
  3. Record a transaction log.

In other words, to complete this transaction, we need to update two records and insert one record. Of course, to ensure the atomicity of the transaction, we need to put these three operations in one transaction. So, how would you arrange the order of these three statements in the transaction?

Imagine that another customer, Customer C, wants to buy a ticket at Theater B at the same time. The part where these two transactions conflict is statement 2, because both need to update the balance of the same theater account, i.e. modify the same row of data. According to the two-phase locking protocol, no matter how you order the statements, all the row locks acquired are released only when the transaction commits. Therefore, if you place statement 2 last, for example in the order 3, 1, 2, the lock on the theater's balance row is held for the shortest possible time. This minimizes lock waits between transactions and improves concurrency.
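In SQL, the reordered transaction might look like the sketch below. The table and column names (trade_log, account) and the amounts are invented for illustration; the only point is that the update on the theater's balance row comes last:

```sql
BEGIN;
-- 3. Write the trade log first (an insert, so it rarely conflicts with anyone)
INSERT INTO trade_log (customer_id, theater_id, amount) VALUES (1, 2, 50);

-- 1. Deduct the price from Customer A's balance (conflicts only with A's own activity)
UPDATE account SET balance = balance - 50 WHERE id = 1;

-- 2. Credit Theater B's balance last, so its hot row lock is held as briefly as possible
UPDATE account SET balance = balance + 50 WHERE id = 2;
COMMIT;
```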

Good. With this arrangement, the row lock on the theater's balance row is held only briefly within the transaction. But this does not solve all your problems.

Suppose the theater runs a promotion: for one day only, it pre-sells discounted tickets for all of next year's movies. Shortly after the promotion starts, your MySQL server grinds to a halt. Logging in to the server, you find that CPU utilization is close to 100%, yet the whole database is executing fewer than 100 transactions per second. What could be going on?
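(As an aside, if you run into a situation like this, these standard MySQL statements are a reasonable starting point for seeing what the server is actually busy with:)

```sql
-- What every connection is currently doing
SHOW PROCESSLIST;

-- Transactions InnoDB knows about, including how long each has been waiting for a lock
SELECT trx_id, trx_state, trx_started, trx_wait_started
  FROM information_schema.INNODB_TRX;

-- Detailed InnoDB status, including the most recently detected deadlock
SHOW ENGINE INNODB STATUS\G
```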

Here, I will talk about deadlock and deadlock detection.

Deadlock and Deadlock Detection #

A deadlock occurs when different threads in a concurrent system have circular dependencies on resources: each one is waiting for another to release a resource, so all of them wait forever. Here is an example using database row locks. [Figure: transactions A and B updating rows id=1 and id=2 in opposite order]
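A minimal sketch of the interleaving the figure depicts, reusing the hypothetical table t from the earlier example; the exact statements are assumptions for illustration:

```sql
-- Session 1 (transaction A)
BEGIN;
UPDATE t SET k = k + 1 WHERE id = 1;   -- A locks the row id=1

-- Session 2 (transaction B)
BEGIN;
UPDATE t SET k = k + 1 WHERE id = 2;   -- B locks the row id=2

-- Session 1
UPDATE t SET k = k + 1 WHERE id = 2;   -- A now waits for B's lock on id=2

-- Session 2
UPDATE t SET k = k + 1 WHERE id = 1;   -- B waits for A's lock on id=1: deadlock
-- With deadlock detection on, one of the two sessions immediately fails with:
-- ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
```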

At this point, transaction A is waiting for transaction B to release the row lock on id=2, while transaction B is waiting for transaction A to release the row lock on id=1. Transaction A and transaction B are waiting for each other’s resources to be released, resulting in a deadlock. When a deadlock occurs, there are two strategies:

  • One strategy is to enter a wait state until a timeout occurs. This timeout period can be set using the innodb_lock_wait_timeout parameter.
  • Another strategy is to initiate deadlock detection. After detecting a deadlock, one of the transactions in the deadlock chain is actively rolled back, allowing other transactions to continue executing. Setting the innodb_deadlock_detect parameter to “on” enables this logic.

In InnoDB, the default value of innodb_lock_wait_timeout is 50 seconds. This means that with the first strategy, when a deadlock occurs, the first blocked thread has to wait a full 50 seconds before it times out, and only then do the other threads get a chance to run. For an online service, such a wait is usually unacceptable.

However, we cannot simply set this timeout to a very small value such as 1 second either. If a lock wait is not a deadlock but just an ordinary lock wait, a 1-second timeout would kill perfectly healthy transactions by mistake. In short, setting the timeout too short causes a lot of collateral damage.

Therefore, under normal circumstances we should use the second strategy: active deadlock detection, and indeed the default value of innodb_deadlock_detect is on. Active deadlock detection can discover and handle a deadlock quickly when one occurs, but it has a cost of its own.
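Before we look at that cost, note that both knobs are ordinary system variables, so you can inspect and adjust them at runtime (innodb_deadlock_detect exists only in relatively recent MySQL versions; the values below are examples, not recommendations):

```sql
-- Inspect the current settings
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';
SHOW VARIABLES LIKE 'innodb_deadlock_detect';

-- Change them at runtime (illustrative values only)
SET GLOBAL innodb_lock_wait_timeout = 50;
SET GLOBAL innodb_deadlock_detect = ON;
```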

Think about what the detection has to do: every time a transaction is blocked by a lock, it must check whether the threads it depends on are themselves blocked by others, and follow that chain until it can decide whether a circular wait, i.e. a deadlock, has formed.

Now what about the hot-row scenario above, where every transaction needs to update the same row?

Each newly blocked thread has to check whether adding itself creates a deadlock, and that check is an O(n) operation over the threads already waiting. Suppose 1000 concurrent threads all need to update the same row at the same time: the deadlock detection work is then on the order of 1000 × 1000 = 1,000,000 operations. Even though the verdict each time is "no deadlock", a large amount of CPU is burned along the way. That is why you see very high CPU utilization but only a handful of transactions completing per second.

Based on the analysis above, let’s discuss how we can solve the performance problem caused by hot row updates. The crux of the problem lies in the fact that deadlock detection consumes a lot of CPU resources.

One way to alleviate the problem is to temporarily disable deadlock detection, if you can guarantee that this business scenario will never actually deadlock. But this carries real risk: business code is usually not designed to treat a deadlock as a serious error, because a deadlocked transaction is simply rolled back and retried, which is lossless to the business. Turning deadlock detection off, however, can lead to a large number of lock-wait timeouts, and those do hurt the business.

Another approach is to control the degree of concurrency. From the analysis above, if only, say, 10 threads are updating the same row at any moment, the cost of deadlock detection is low and the problem does not arise. An obvious idea is to do this concurrency control on the client side, but you will quickly find it infeasible, because there are usually many clients. I have seen an application with 600 clients; even if each client limits itself to 5 concurrent threads, the aggregate concurrency arriving at the database server can still peak at 3000.

Therefore, the concurrency control has to happen on the database side. If you have a database middleware layer, you can implement it there; if your team has people who can modify the MySQL source code, it can also be done inside MySQL itself. The basic idea is to queue updates to the same row before they enter the engine, so that InnoDB never has to do a large amount of deadlock detection.

You may ask: if the team has no database kernel expert for now and cannot implement such a solution, can the problem be mitigated at the design level instead?

You can consider reducing lock conflicts by turning one row into multiple logically equivalent rows. Taking the theater's account as an example, you can split it into multiple records, say 10: the theater's total balance is the sum of the values of those 10 records. Each time you need to credit the theater's account, you randomly pick one of the 10 records to update. This cuts the probability of conflict to 1/10 of what it was, reducing both the number of lock waits and the CPU spent on deadlock detection.
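Here is a minimal sketch of what that might look like, with a hypothetical account_detail table; the schema and the way the slot is chosen are assumptions for illustration:

```sql
-- Hypothetical layout: the theater's balance is spread over 10 slot rows.
CREATE TABLE account_detail (
  theater_id INT NOT NULL,
  slot       INT NOT NULL,                      -- 0..9
  balance    DECIMAL(10,2) NOT NULL DEFAULT 0,
  PRIMARY KEY (theater_id, slot)
) ENGINE=InnoDB;

-- Credit: pick one slot at random (choose it once, e.g. in the application or in a
-- variable, rather than inside the WHERE clause) so concurrent credits rarely
-- collide on the same row.
SET @slot = FLOOR(RAND() * 10);
UPDATE account_detail
   SET balance = balance + 50
 WHERE theater_id = 2 AND slot = @slot;

-- The total balance is simply the sum over all slots.
SELECT SUM(balance) FROM account_detail WHERE theater_id = 2;

-- Debits (e.g. refunds) need extra care: a randomly chosen slot may already be 0,
-- which is exactly the special handling discussed below.
```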

This scheme looks lossless, but in practice it needs careful design against the business logic. If the balance can also decrease, for example when tickets are refunded, then you need special handling for the case where the chosen record has already dropped to 0.

Summary #

Today, I introduced row locks in MySQL, covering two main topics: the two-phase locking protocol, and deadlock together with deadlock detection.

Starting from the two-phase locking protocol, I discussed how to order the statements inside a transaction correctly during development. The advice is: if your transaction needs to lock multiple rows, postpone acquiring the locks that are most likely to cause conflicts and hurt concurrency until as late as possible.

However, adjusting the statement order cannot avoid deadlocks entirely. So I introduced the concepts of deadlock and deadlock detection, and gave three approaches for reducing their impact on the database. The main direction is to control the concurrency of transactions accessing the same resource.

Finally, let me leave you with a question. If you want to delete the first 10,000 rows of a table, there are three methods to achieve this:

  • The first method is to directly execute delete from T limit 10000;.
  • The second method is to execute delete from T limit 500; in a loop 20 times in one connection.
  • The third method is to execute delete from T limit 500; simultaneously in 20 connections.

Which method would you choose? Why?

You can write your thoughts and opinions in the comments section, and I will discuss this question with you at the end of the next article. Thank you for listening, and feel free to share this article with more friends to read together.

Previous Question #

The question I left you in the previous article was: What happens when a DDL statement comes from the binlog of the master database when using --single-transaction for logical backup on the standby database?

Assume that the DDL is for table t1, and here are the key statements during the backup process:

```sql
Q1: SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Q2: START TRANSACTION WITH CONSISTENT SNAPSHOT;
/* other tables */
Q3: SAVEPOINT sp;
/* Situation 1 */
Q4: SHOW CREATE TABLE `t1`;
/* Situation 2 */
Q5: SELECT * FROM `t1`;
/* Situation 3 */
Q6: ROLLBACK TO SAVEPOINT sp;
/* Situation 4 */
/* other tables */
```

At the start of the backup, the isolation level is explicitly set to repeatable read (RR) to make sure the dump runs under RR (Q1);

Start a transaction, here using WITH CONSISTENT SNAPSHOT to ensure that after this statement is executed, a consistent view can be obtained (Q2);

Set a savepoint, which is very important (Q3);

SHOW CREATE retrieves the table structure (Q4), then the actual dump of the data begins (Q5), and finally the transaction rolls back to savepoint sp, whose role is to release the MDL read lock on t1 (Q6). This last point goes beyond what the main text covered.

Depending on when the DDL replicated from the master arrives, I marked four moments, labeled "Situation 1" through "Situation 4" in the statements above. Assume that once the DDL starts executing, it finishes quickly.

The answers are as follows:

  1. If it arrives before Q4 is executed: no effect; the backup gets the post-DDL table structure.
  2. If it arrives at "Situation 2": the table structure has already been changed, so Q5 fails with "Table definition has changed, please retry transaction" and mysqldump terminates.
  3. If it arrives between "Situation 2" and "Situation 3": mysqldump is holding the MDL read lock on t1, so applying the binlog is blocked; the effect is replication lag that lasts until Q6 completes.
  4. From "Situation 4" onward, mysqldump has already released the MDL read lock: no effect; the backup gets the pre-DDL table structure.