21 Why Does a Sentence Changing Just One Line Lock So Much #

In the previous article, I introduced the concepts of gap lock and next-key lock, but did not explain the locking rules. It can be difficult to understand the concept of gap lock, especially when combined with row lock, it is easy to make mistakes in determining whether there will be lock waits.

So today, let’s start with these locking rules.

First of all, let me explain that I haven’t seen similar summaries of these locking rules elsewhere. In the past, when I made judgments, I was always thinking about the implementation in the code. This time, in order to summarize the rules in a way that students who don’t look at the code can understand, I have summarized them again after reviewing the code temporarily. Therefore, these rules have the following two premises:

The locking strategy may change in future versions of MySQL, so these rules only apply to the latest versions so far, specifically 5.x series <= 5.7.24, 8.0 series <= 8.0.13.
If anyone finds a bad case during verification, please point it out and I will add it to this article so that all students learning this column together can benefit.

Because gap lock is only effective under the REPEATABLE READ isolation level, for the descriptions in the following of this article, unless otherwise specified, the default is REPEATABLE READ isolation level.

The locking rules I have summarized include two principles, two optimizations, and one bug.

Principle 1: The basic unit of locking is the next-key lock. I hope you remember that the next-key lock is a half-open interval.
Principle 2: Only objects accessed during the lookup process are locked.
Optimization 1: For equality queries on indexes, when locking a unique index, the next-key lock degrades to a row lock.
Optimization 2: For equality queries on indexes, when traversing to the right and the last value does not satisfy the equality condition, the next-key lock degrades to a gap lock.
A bug: Range queries on unique indexes will access until the first value that does not satisfy the condition.

Let me take the table “t” from the previous article as an example and explain these rules to you. The table “t” is created and initialized with the following statements.

CREATE TABLE `t` (
  `id` int(11) NOT NULL,
  `c` int(11) DEFAULT NULL,
  `d` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `c` (`c`)
) ENGINE=InnoDB;

insert into t values(0,0,0),(5,5,5),
(10,10,10),(15,15,15),(20,20,20),(25,25,25);

The next examples are mainly explained with the help of pictures, so I suggest you refer to the text while looking at the pictures. Some examples may be “mind-blowing,” so I recommend that you try it out yourself after reading the article.

Example 1: Gap Lock in Equality Queries #

The first example is about the gap between equality operations:

Figure 1: Gap lock in an equality query

Since there is no record with id=7 in table “t”, if we apply the locking rules mentioned above:

According to Principle 1, the locking unit is the next-key lock, and the locking range of session A is (5,10].
At the same time, according to Optimization 2, this is an equality query (id=7), but id=10 does not satisfy the query condition, so the next-key lock degrades to a gap lock. Therefore, the final locking range is (5,10).

Therefore, session B will be locked when trying to insert a record with id=8 into this gap, but session C can modify the row with id=10.

Example 2: Locks on Non-Unique Index Equality Queries #

The second example is about locks on covered indexes:

Figure 2: Locks added on non-unique indexes #

After seeing this example, do you feel like “locking the unlocked and mislocking the unlocked”? Let’s analyze it.

In this example, session A wants to add a read lock to the row where c=5 on the index c.

According to Principle 1, the locking unit is the next-key lock, so a next-key lock is added to (0,5].
Note that c is a regular index, so it is not enough to just access the single record c=5, you need to traverse to the right and find c=10 before giving up. According to Principle 2, all accessed records need to be locked, so a next-key lock is added to (5,10].
However, at the same time, this satisfies Optimization 2: equality judgment, traverse to the right, and the last value does not satisfy c=5, so it degenerates into a gap lock (5,10).
According to Principle 2, only the accessed objects are locked, this query uses a covering index and does not need to access the primary key index, so no lock is added on the primary key index, which is why session B’s update statement can be executed.

However, when session C wants to insert a record (7,7,7), it will be locked by session A’s gap lock (5,10).

It is worth noting that in this example, LOCK IN SHARE MODE only locks the covering index, but it is different if it is FOR UPDATE. When FOR UPDATE is executed, the system thinks that you will update the data next, so it will also add row locks to the rows that meet the conditions on the primary key index.

This example shows that locks are added on indexes; at the same time, it gives us guidance that if you want to use LOCK IN SHARE MODE to add read locks to rows and prevent data from being updated, you must bypass the optimization on the covering index and add a field that does not exist in the index to the query fields. For example, change session A’s query statement to SELECT d FROM t WHERE c=5 LOCK IN SHARE MODE. You can verify the effect for yourself.

Case 3: Range locks on primary key indexes #

The third example is about range queries.

Before giving an example, please think about this question: For our table t, are the locking ranges the same for the following two query statements?

mysql> SELECT * FROM t WHERE id=10 FOR UPDATE;
mysql> SELECT * FROM t WHERE id>=10 AND id<11 FOR UPDATE;

You might think that id is defined as an int type, so these two statements are equivalent, right? Actually, they are not exactly the same.

Logically, these two query statements must be equivalent, but their locking rules are slightly different. Now, let session A execute the second query statement and see the locking effect.

Figure 3: Range locks on primary key indexes

Now, let’s use the previously mentioned locking rules to analyze what locks will be added by session A.

At the beginning of execution, it needs to find the row where id=10, so the expected lock should be a next-key lock (5,10]. According to Optimization 1, due to the equality condition on the primary key id, it degenerates into a row lock and only adds a row lock to the row with id=10.
The range search continues to find id=15, so it needs to add a next-key lock (10,15].

Therefore, at this time, session A has locked the primary key index, with a row lock on id=10 and next-key lock (10,15]. This way, you can understand the results of session B and session C.

Here you need to note that when session A initially locates the row with id=10, it is treated as an equality query, while when scanning to id=15, it is treated as a range query.

Case 4: Range locks on non-unique indexes #

Next, let’s look at two examples of range query locking, and you can compare them with Case 3.

One thing to note is that, unlike Case 3, the WHERE section in these query statements uses the field c.

Figure 4: Range locks on non-unique indexes

This time, session A uses the field c for judgment. The difference between the locking rules and the uniqueness of Case 3 is: when locating the record with c=10 for the first time, after adding the (5,10] next-key lock on the index c, since the index c is a non-unique index, there is no optimization rule. In other words, it will not degenerate into a row lock, so ultimately session A adds the (5,10] and (10,15] next-key locks on the index c.

Therefore, from the result perspective, when session B tries to insert the (8,8,8) record, it will be blocked. It is reasonable to stop scanning when c=15 is reached, because InnoDB needs to scan until c=15 to know that there is no need to continue looking.

Case 5: Bug in Unique Index Range Locks #

In the previous four cases, we have already used two principles and two optimizations in the locking rules. Now let’s take a look at a bug related to the locking rules.

Figure 5: Bug in Unique Index Range Locks

Session A performs a range query. According to Principle 1, a next-key lock should be placed on the range (10,15] on the index id. Since id is a unique key, the loop should stop at row id=15.

But in the implementation, InnoDB scans backwards until the first row that does not satisfy the condition, which is id=20. Furthermore, because this is a range scan, the next-key lock (15,20] on index id will also be locked.

As a result, you can see that session B, which wants to update row id=20, will be locked. Similarly, session C, which wants to insert a row with id=16, will also be locked.

In theory, it is unnecessary to lock row id=20. Because once id=15 is reached, there is no need to continue searching. But in practice, it still happens, and I consider it to be a bug.

I have discussed this with experts in the community, and it has been mentioned in the official bug tracking system, but it has not been verified. So, considering this as a bug is just my opinion. If you have other insights, you are welcome to share them.

Case 6: Example of “Equals” on Non-Unique Index #

The next example is to better explain the concept of “gap”. Here, I will insert a new record into table t.

mysql> insert into t values(30,10,30);

The newly inserted row has c=10, which means there are now two rows with c=10 in the table. So, what is the state of the gap on index c now? You should know that it is not possible to have two “identical” rows on a non-unique index that includes the value of the primary key.

Figure 6: Example of “Equals” on Non-Unique Index

As you can see, although there are two c=10 values, their primary key values id are different (10 and 30), so there is a gap between these two c=10 records.

In the diagram, I have indicated the primary key id on index c. To differentiate it from the open-ended form of the gap lock, I used the format (c=10, id=30) to represent a row on the index.

Now, let’s take a look at Case 6.

This time, we will use a delete statement to demonstrate. Note that the locking logic of the delete statement is similar to select … for update, which is the same as the two principles, two optimizations, and one bug I summarized at the beginning of the article.

Figure 7: Delete example

When session A is traversing, it first accesses the first record with c=10. Similarly, according to Principle 1, a next-key lock is placed from (c=5, id=5) to (c=10, id=10).

Then, session A looks to the right until it encounters the row (c=15, id=15), and the loop ends. According to Optimization 2, this is an equality query, and since it found a row that does not satisfy the condition to the right, it degrades into a gap lock from (c=10, id=10) to (c=15, id=15).

In other words, the locking range on index c for this delete statement is the blue area in the figure below.

Figure 8: Locking effect of delete example

Both ends of this blue area are represented by dashed lines, indicating an open interval, meaning there is no lock on the rows (c=5, id=5) and (c=15, id=15).

Case 7: Locking with Limit Statement #

Case 6 also has a contrasting case, as shown in the following scenario:

Figure 9: Locking with Limit statement In this example, the delete statement in session A has added ’limit 2’. You know that there are only two records in table t where c=10. Therefore, whether the limit 2 is added or not, the deletion effect is the same, but the locking effect is different. As you can see, the insert statement in session B has been executed, which is different from the result in Case 6.

This is because in Case 7, the delete statement explicitly added a limit of 2, so after traversing to the row (c=10, id=30), there are already two statements that satisfy the condition, so the loop is terminated.

As a result, the locking scope on index c becomes the half-open interval from (c=5, id=5) to (c=10, id=30), as shown in the following figure:

Figure 10: Locking effect with limit 2

As you can see, the gap after (c=10, id=30) is not within the locking scope, so the insert statement with c=12 can be executed successfully.

The guidance from this example for our practice is to add limit when deleting data. This not only controls the number of data being deleted, making the operation safer, but also reduces the locking scope.

Case 8: An Example of Deadlock #

In the previous examples, we analyzed according to the logic of next-key locks, because it is more convenient for analysis. Finally, let’s take a look at another case to illustrate that next-key locks are actually the result of combining gap locks and record locks.

You may wonder, didn’t we mention this concept at the beginning? Don’t worry, let’s take a look at the following example:

Figure 11: Operation sequence of Case 8

Now, let’s analyze why the result is as shown above in chronological order.

Session A starts a transaction and executes the query statement with ’lock in share mode’, adding a next-key lock [5,10) and a gap lock (10,15) on index c.
Session B’s update statement also needs to add a next-key lock [5,10) on index c, leading to lock wait.
Then, Session A tries to insert the row (8,8,8), but is locked by Session B’s gap lock. Due to a deadlock, InnoDB rolls back Session B.

You may ask, didn’t the next-key lock of Session B fail to acquire?

Actually, it is like this: the “add next-key lock [5,10)” operation of Session B is actually divided into two steps. First, the gap lock (5,10) is acquired, and it is successful. Then, the record lock on c=10 is acquired, and it is locked at this point.

In other words, when analyzing the locking rules, we can use next-key locks. But we must know that during the actual execution, it is divided into two parts: gap locks and record locks.

Summary #

I would like to reiterate that all the examples above were verified under the repeatable-read isolation level. At the same time, the repeatable-read isolation level follows the two-phase locking protocol, and all locked resources are released only when the transaction is committed or rolled back.

In the final example, you can clearly see that next-key locks are actually implemented by combining gap locks and record locks. If you switch to the read-committed isolation level, it will be easier to understand. In that case, you can remove the part of the gap locks during the process, leaving only the record locks.

In fact, there are still gap locks in the foreign key scenario under the read-committed isolation level, which is relatively more complicated, but we won’t go into detail today.

In addition, there is an optimization under the read-committed isolation level: the row locks added during the execution of a statement will be directly released on rows that do not meet the conditions after the statement completes, without waiting for the transaction to commit.

This means that under the read-committed isolation level, the locking scope is smaller and the locking time is shorter, which is why many businesses default to using the read-committed isolation level.

However, I hope that after learning today’s lesson, you will have a clearer understanding of the concept of next-key locks and be able to use the locking rules to determine the locking scope of statements.

When the business needs to use the repeatable-read isolation level, you can design the database operations more carefully, solve the phantom read problem, and maximize the parallel processing capability of the system.

After reading this article, please take another look at the discussion question at the end of the previous article and try to analyze it again.

I have rephrased and simplified the question: considering the table t we initialized at the beginning of the article with 6 rows, why is the insert operation in session B locked in the statement sequence shown in Figure 12?

Figure 12: Lock analysis discussion question

Also, if you are interested, you can design a statement sequence and analyze it before execution, and then verify whether the results match your analysis.

For results that you cannot explain, you can post them in the comments. I will try to select some interesting cases to analyze and share with everyone in the next article. Thank you for following along, and feel free to share this article with more friends to read together.

Previous Issue Time #

The question from the previous issue will continue to be used as a post-school thinking question in this issue, so the “answer” will be announced together in the next article.

Here, I will expand on answering the questions from several classmates in the comments section.

@令狐少侠 said that he used to think that gap locks only exist on secondary indexes. Now you know that gap locks can exist wherever there are gaps.
Classmate @浪里白条 asked what the locking rules are for varchar types. Answer: Actually, when judging gaps, varchar and int are the same. After sorting, there are gaps between adjacent values.
Several classmates mentioned that the results they verified in the previous article were different from Case 1. After session A executes these two statements:

begin;
select * from t where d=5 for update; /*Q1*/

The update in session B and the insert in session C will be blocked. Is this contradictory to the conclusion in the article?

Actually, it is not. This example uses a proof by contradiction, assuming that there is no blocking, which would cause problems. Then, it is deduced that session A needs to lock all rows and gaps in the entire table.