30 Q&A Article - Dynamic Views on Locking #

In the 20th and 21st articles, I introduced you to InnoDB’s gap locks, next-key locks, and locking rules. There were many high-quality comments in the comment section of these two articles. I believe that by analyzing these questions, we can help deepen your understanding of locking rules.

Therefore, I have selected several representative questions from these comments for today’s Q&A article, which is about viewing locks from a dynamic perspective.

To help you understand, let’s review the locking rules together. This rule contains two “principles”, two “optimizations”, and one “bug”:

Principle 1: The basic unit of locking is the next-key lock. Hopefully, you still remember that a next-key lock is an open interval on the left and a closed interval on the right.
Principle 2: Only the objects accessed during the search process will be locked.
Optimization 1: For an equality query on an index, if a unique index is locked, the next-key lock will be downgraded to a row lock.
Optimization 2: For an equality query on an index, if a traversal to the right does not satisfy the equality condition for the last value, the next-key lock will be downgraded to a gap lock.
One bug: A range query on a unique index will access the first value that does not meet the condition.

Next, our discussion will be based on the following table t:

CREATE TABLE `t` (
  `id` int(11) NOT NULL,
  `c` int(11) DEFAULT NULL,
  `d` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `c` (`c`)
) ENGINE=InnoDB;

INSERT INTO t VALUES (0,0,0),(5,5,5),
(10,10,10),(15,15,15),(20,20,20),(25,25,25);

Equality Query within Inequality Conditions #

Some students have expressed doubts about “equality queries”: What is the difference between an equality query and a “traversal”? Why do the examples in our articles include equality queries despite having inequality conditions?

Let’s take a look at this example together and analyze the locking range of this query:

BEGIN;
SELECT * FROM t WHERE id>9 AND id<12 ORDER BY id DESC FOR UPDATE;

Using the aforementioned locking rules, we know that the locking range of this statement is (0,5], (5,10], and (10,15) on the primary key index. In other words, the row with id=15 is not locked. Why is that?

We said that the unit of locking is the next-key lock, and it is an open interval on the left and a closed interval on the right. However, here we are using Optimization 2, which means that during the equality query on the index, when traversing to the right and the last value does not satisfy the equality condition, the next-key lock is downgraded to a gap lock (10,15).

But then, why do we have an “equality query” when the where condition uses greater than and less than operators?

You need to know that lock actions occur during the execution of the statement. Therefore, when analyzing the locking behavior, you need to start from the data structure of the index. Here, let me break down this process for you.

As shown in Figure 1, this is a schematic representation of the index id for this table.

Figure 1: Schematic representation of the index id

First, the semantics of this query are order by id desc, so in order to retrieve all rows that satisfy the condition, the optimizer must first find the “first value of id<12”.
This process is achieved through the search process of the index tree. Internally, the engine needs to find the value id=12, but it isn’t found. Instead, it finds the gap (10,15).
Then, during the leftward traversal, it is no longer an equality query. It scans the row with id=5, so it adds a next-key lock (0,5].

In other words, during the execution process, an “equality query” method is used when locating the records through tree search.

Process of Equivalent Query #

Corresponding to the example above, there is a question raised by @发条橙子同学: what is the locking range of the following statement?

begin;
select id from t where c in(5,20,10) lock in share mode;

The query statement above uses in, let’s first look at the explain result of this statement.

Figure 2: Explain result of the in statement

As we can see, this in statement uses the index c and rows=3, indicating that these three values are located through B+ tree search.

When searching for c=5, (0,5] is locked. However, because c is not a unique index, in order to confirm if there are any other records with c=5, it needs to traverse to the right and find c=10 to confirm that there are no more records. This process satisfies optimization 2, so an interval lock (5,10) is added.

Similarly, when executing c=10, the locking range is (5,10] and (10,15); When executing c=20, the locking range is (15,20] and (20,25).

Through this analysis, we can know that the order of the three record locks added to index c in this statement is: first add the record lock of c=5, then add the record lock of c=10, and finally add the record lock of c=20.

You may say that this locking range is just removing the row lock of c=15 from (5,25). Why bother to explain it in segments?

Because I want to emphasize this process to you: these locks are “added one by one during the execution process”, rather than added all at once.

After understanding this locking process, we can now analyze the deadlock problem in the following example.

If there is another statement written in the following way at the same time:

select id from t where c in(5,20,10) order by c desc for update;

What is the locking range at this time?

We now know that gap locks do not interlock, but both of these statements will add record locks to the three rows c=5, 10, and 20 on index c.

Here, you need to pay attention that since the statement has order by c desc, the sequence of adding these three record locks is to first lock c=20, then c=10, and finally c=5.

In other words, these two statements need to lock the same resources, but the locking order is reversed. When these two statements are executed concurrently, deadlock may occur.

Regarding information about deadlock, MySQL only keeps the information of the most recent deadlock, but this information is incomplete.

Some classmates left comments in the comment section, hoping that I can explain how to observe deadlocks in more detail. Now, let’s briefly analyze the deadlock situation in the above example.

How to Observe Deadlocks? #

Figure 3 shows a partial output obtained by executing the show engine innodb status command after a deadlock occurs. This command outputs a lot of information, and one section is LATEST DETECTED DEADLOCK, which records the information of the last deadlock.

Figure 3: Deadlock situation

Let’s take a look at several key pieces of information in this figure.

This result is divided into three parts:

(1) TRANSACTION, which is the information of the first transaction;
(2) TRANSACTION, which is the information of the second transaction;
WE ROLL BACK TRANSACTION (1), which is the final processing result, indicating that the first transaction was rolled back.

Information of the first transaction:

WAITING FOR THIS LOCK TO BE GRANTED, indicates the lock information that this transaction is waiting for;
index c of table test.t, indicates that it is waiting for the lock on index c of table t;
lock mode S waiting indicates that this statement wants to add a read lock by itself, and the current state is waiting;
Record lock indicates that this is a record lock;
n_fields 2 indicates that this record has two columns, namely the field c and the primary key field id;
0: len 4; hex 0000000a; asc ;; is the first field, which is c. The value is hexadecimal a, which is 10.
1: len 4; hex 0000000a; asc ;; It is the second field, which is also the primary key id, with a value of 10;
The “asc” in these two lines means that the following “printable characters” in the values will be printed, but 10 is not a printable character, so it is displayed as space.
The first transaction information only shows the locked state, waiting for the lock on (c=10,id=10).
Of course, as you know, since a deadlock has occurred, this transaction also holds other locks, but they are not displayed. Don’t worry, we can infer this from the information of the second transaction.

The second transaction displays more information:

" HOLDS THE LOCK(S)" is used to show which locks this transaction holds;
index c of table test.t indicates that the lock is on the index c of table t;
hex 0000000a and hex 00000014 indicate that this transaction holds the record locks on c=10 and c=20;
WAITING FOR THIS LOCK TO BE GRANTED indicates that it is waiting for the record lock on (c=5,id=5).

From the above information, we know that:

The “lock in share mode” statement holds the record lock on c=5 and is waiting for the lock on c=10;
The “for update” statement holds the record locks on c=20 and c=10, and is waiting for the record lock on c=5.

Therefore, a deadlock occurs. Here, we can draw two conclusions:

Since locks are acquired one by one, to avoid deadlocks, for the same set of resources, they should be accessed in as similar order as possible;
At the moment when a deadlock occurs, the “for update” statement holds more resources, and the cost of rollback is higher. Therefore, InnoDB chooses the “lock in share mode” statement with lower rollback cost to rollback.

How to view lock waits? #

After understanding deadlocks, let’s look at an example of lock waits.

In the comments section of Article 21, @Geek_9ca34e conducted an interesting verification. I have listed the reproduction steps:

Figure 4: Changes in gaps caused by delete

As you can see, since session A did not lock the record c=10, session B is able to delete the row with id=10. However, afterwards, session B cannot insert id=10 back.

Now let’s take a look at the result of show engine innodb status at this time and see if it can provide us with some clues. The lock information is in the TRANSACTIONS section of the output of this command. You can see this picture in the document

Figure 5: Lock wait information

Let’s look at a few key pieces of information.

index PRIMARY of table test.t indicates that the statement is locked because of a lock on a primary key of table t.
lock_mode X locks gap before rec insert intention waiting provides several pieces of information:

insert intention indicates that the current thread is preparing to insert a record, which is an insert intention lock. For the sake of understanding, you can think of it as the insertion action itself.
gap before rec indicates that this is a gap lock, not a record lock.

So which record is the gap before? The content of the next 0~4 lines represents the information of this record.
n_fields 5 also indicates that this record has 5 columns:

0: len 4; hex 0000000f; asc ;; The first column is the primary key id field, and the hexadecimal value f represents id=15. Therefore, we know that this gap is before id=15, because id=10 no longer exists, and it represents (5,15).
1: len 6; hex 000000000513; asc ;; The second column is the transaction id with a length of 6 bytes, indicating that the transaction with trx id 1299 last modified this row.
2: len 7; hex b0000001250134; asc % 4;; The third column is the rollback segment information with a length of 7 bytes. As you can see, there is displayed content after “acs” (% and 4), because this byte is a printable character.
The last two columns are the values of c and d, both of which are 15.

Therefore, we know that due to the delete operation, the row with id=10 is removed, and the two gaps (5,10) and (10,15) become one gap (5,15).

Speaking of this, you can combine and think about the relationship between these two phenomena:

After session A executed the select statement, it didn’t do anything, but the range it locked suddenly “expanded”;
In the post-thinking question of Article 21, when we execute select * from t where c>=15 and c<=20 order by c desc lock in share mode; and scan to c=10, we need to lock (5,10]. In other words, the so-called “gap” is defined by the “record to the right of this gap”.

Example of update #

After seeing the locking examples for insert and delete, let’s take a look at an example of an update statement. @信信 in the comment section conducted this experiment:

Figure 6: Example of update

You can analyze it yourself. The locking range for session A is (5,10], (10,15], (15,20], (20,25], and (25,supremum] on index c.

Note: The first record found by querying c>5 is c=10, so the next-key lock (0,5] is not acquired.

After that, in the first update statement of session B, c=5 is changed to c=1. We can understand it as two steps:

Insert the record (c=1, id=5).
Delete the record (c=5, id=5).

As we mentioned in the previous section, the gap (5,10) on index c is defined by the record to the right of this gap, which is c=10. Therefore, after this operation, the locking range of session A becomes as shown in Figure 7:

Figure 7: Session B modification and session A locking range

Okay, let’s move on to the update statement of session B: update t set c = 5 where c = 1. Similarly, it can be divided into two steps:

Insert the record (c=5, id=5).
Delete the record (c=1, id=5).

In the first step, session B tries to insert data into the already locked range (1,10), so it is blocked.

Summary #

In today’s article, I reviewed the locking rules again with the questions from the comments of the previous [20th] and [21st] articles. I also emphasized that when analyzing the locking range, it must be done in conjunction with the logical execution of the statements.

In my opinion, every student who wants to have a thorough understanding of the principles of MySQL should be able to imagine the execution process of an SQL statement based on the result of the explain command. Only by reaching this level can one have a clearer understanding of the concepts of index-organized tables, indexes, and locks. You can use the same method to test your understanding of these concepts.

During the analysis of these locking rules, I also briefly introduced how to interpret the transaction and deadlock information in the output of the show engine innodb status command. I hope these contents will be helpful to you in analyzing incidents in the future.

As always, even though this is a Q&A article, I still have a question for you.

We mentioned an important point earlier: the so-called “gap” is actually defined by the record to the right of this gap.

So, does an empty table have a gap? Who defines this gap? How can you verify this conclusion?

Please write your analysis and verification methods in the comment section, and I will discuss this question with you at the end of the next article. Thank you for listening, and feel free to share this article with more friends to read together.

Previous question #

In the previous article, I asked you to share your experiences in handling business monitoring.

In the comment section of this article, many students shared their good experiences. Here, I will select a few typical comments to share with you:

@老杨同志 provided a detailed answer. His main idea is to monitor service status and service quality. Monitoring service status can generally be implemented using external systems, while monitoring service quality requires tracking the response time of interfaces.
@Ryoma mentioned using healthCheck in services, which is similar to the pattern of select 1 mentioned in our article.
@强哥 classified monitoring into basic monitoring, service monitoring, and business monitoring based on the objects being monitored, and shared the objects that need to be focused on for each type of monitoring.