20 What Is a Phantom Read and What Are the Issues? #

In the previous article, I left you with a question about locking rules. Let’s start with that question today.

To illustrate the problem, in this article, we will use a smaller table. The table and the initialization statement are as follows (I made some modifications to the table structure used in the previous article for the purpose of this example):

CREATE TABLE `t` (
  `id` int(11) NOT NULL,
  `c` int(11) DEFAULT NULL,
  `d` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `c` (`c`)
) ENGINE=InnoDB;

insert into t values(0,0,0),(5,5,5),
(10,10,10),(15,15,15),(20,20,20),(25,25,25);

This table has a primary key id and an index c. The initialization statement inserts 6 rows of data into the table.

In the previous article, I asked you how the following sequence of statements acquires locks and when it releases them:

begin;
select * from t where d=5 for update;
commit;

It is easy to understand that this statement will match the row with d=5, corresponding to the primary key id=5. Therefore, after the execution of the select statement, a write lock will be acquired on the row with id=5. Furthermore, due to the two-phase locking protocol, this write lock will be released when the commit statement is executed.

Since there is no index on the field d, this query statement will perform a full table scan. So, will the other 5 rows that are scanned but do not satisfy the condition be locked?

We know that InnoDB’s default transaction isolation level is repeatable read, so for the rest of this article, unless otherwise stated, we will assume that repeatable read isolation level is set.
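
If you want to confirm this on your own instance, a quick check looks like this (the variable is named transaction_isolation from MySQL 5.7.20 on, and tx_isolation in older versions):

select @@transaction_isolation; /* REPEATABLE-READ on a default install */
set session transaction isolation level repeatable read; /* set it explicitly if needed */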

What is a phantom read? #

Now, let’s analyze what happens if only the row with id=5 is locked and the other rows are not.

Let’s take a look at the following scenario (note: this is a hypothetical scenario I created):


Figure 1: Hypothetical scenario where only the row with id=5 is locked

As you can see, session A executes three queries, Q1, Q2, and Q3. They all use the same SQL statement, select * from t where d=5 for update. By now you should understand what this statement means: it reads all rows with d=5 using a current read and adds write locks on them. Now let's see what each of these three queries returns.

  1. Q1 returns only the row with id=5.
  2. At T2, session B changes the value of d for the row with id=0 to 5. Therefore, at T3, Q2 will retrieve both the rows with id=0 and id=5.
  3. At T4, session C inserts a new row (1,1,5). Therefore, at T5, Q3 will retrieve the rows with id=0, id=1, and id=5.
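
Written out as a timeline, the interleaving of Figure 1 looks like the transcript below, reconstructed from the three steps above. Keep in mind that it only makes sense under the hypothetical assumption that Q1 locks nothing but the row with id=5; on a real InnoDB instance, session B's update at T2 would be blocked.

/* T1, session A */
begin;
select * from t where d=5 for update; /* Q1: returns only (5,5,5) */

/* T2, session B */
update t set d=5 where id=0;

/* T3, session A */
select * from t where d=5 for update; /* Q2: returns (0,0,5) and (5,5,5) */

/* T4, session C */
insert into t values(1,1,5);

/* T5, session A */
select * from t where d=5 for update; /* Q3: returns (0,0,5), (1,1,5) and (5,5,5) */

/* T6, session A */
commit;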

Among these, the phenomenon of Q3 retrieving the row with id=1 is called “phantom read”. In other words, a phantom read refers to a transaction seeing rows that were not seen in a previous query of the same range.

Here, I need to explain the concept of “phantom read”:

  1. In the repeatable read isolation level, normal queries are snapshot reads, and they cannot see data inserted by other transactions. Therefore, phantom reads only occur under “current reads”.
  2. The modification made by session B is seen by the select statement of session A using “current read”, so it cannot be called a phantom read. Phantom reads only refer to “newly inserted rows”.
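
To make point 1 concrete, here is a minimal sketch of the difference between the two kinds of reads, assuming another session inserts and commits a row with d=5 right after our repeatable-read transaction establishes its snapshot:

start transaction with consistent snapshot;
/* ... another session inserts and commits a row with d=5 here ... */

select * from t where d=5; /* snapshot read: does not see the new row */
select * from t where d=5 for update; /* current read: sees and write-locks the new row */
select * from t where d=5 lock in share mode; /* also a current read, taking a read lock */

commit;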

If we analyze the results of these three SQL statements based on the transaction visibility rules we learned in the 8th article [“Are Transactions Isolated or Not?”], there doesn’t seem to be any problem.

Because all three queries have the for update clause, they are current reads. The rule for current reads is to see the latest values of all committed records. Also, the two statements of session B and session C are committed after execution, so Q2 and Q3 should see the effects of these two transactions, and indeed they do. This is consistent with the transaction visibility rules.

However, is there really no problem here?

No, there is indeed a problem.

What are the problems with phantom reads? #

First is the semantic problem. Session A declared at T1, “I want to lock all rows with d=5 and disallow other transactions from reading and writing to them”. But in reality, this semantic is violated.

If this is hard to see in the scenario above, let me add one more SQL statement to each of session B and session C and see what happens.

Figure 2: Assuming only a row lock is added on the row with id=5 - the locking semantics are violated

In the second statement of session B, update t set c=5 where id=0, the intended meaning is “change the c value of the row that currently has id=0 and d=5 to 5”.

Since at time T1 session A only added a row lock to the row with id=5 and did not lock the row with id=0, session B can execute these two update statements at time T2. This breaks the locking declaration made by statement Q1 in session A, namely that all rows with d=5 are to be locked.

The same reasoning applies to session C: the modification of the row with id=1 also violates the locking declaration of Q1.

Next, let’s consider the issue of data consistency.

We know that locks are designed to ensure data consistency. This consistency not only refers to the internal data state of the database at the current moment, but also includes the logical consistency between the data and the log.

To illustrate this issue, let’s add another update statement to session A at time T1: “update t set d=100 where d=5”.


Figure 3: Assuming only a row lock is added on the row with id=5 - data consistency issue

The lock semantics of the update statement are the same as those of select … for update, so adding this update statement here is reasonable. Session A declares “lock the rows with d=5” precisely because it intends to update them, and the added update statement changes the d value of the locked row to 100.
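
Written out as SQL, the timeline of Figure 3 is roughly the following, still under the hypothetical assumption that only the row with id=5 is locked:

/* T1, session A */
begin;
select * from t where d=5 for update;
update t set d=100 where d=5; /* (5,5,5) becomes (5,5,100) */

/* T2, session B, committed immediately */
update t set d=5 where id=0; /* (0,0,0) becomes (0,0,5) */
update t set c=5 where id=0; /* (0,0,5) becomes (0,5,5) */

/* T4, session C, committed immediately */
insert into t values(1,1,5);
update t set c=5 where id=1; /* (1,1,5) becomes (1,5,5) */

/* T6, session A */
commit;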

Now, let’s analyze the resulting state of the database after Figure 3 is executed.

  1. At time T1, the row with id=5 becomes (5,5,100), and this result is formally committed at time T6.
  2. At time T2, the row with id=0 becomes (0,5,5).
  3. At time T4, a new row (1,5,5) is added to the table.
  4. Other rows remain unchanged and are not affected by this execution sequence.

From this perspective, there is no issue with this data. However, let’s take a look at the content in the binlog at this point.

  1. At time T2, session B commits the transaction and writes two statements.
  2. At time T4, session C commits the transaction and writes two statements.
  3. At time T6, session A commits the transaction and writes the statement “update t set d=100 where d=5”.

If we put them together, the result is as follows:

update t set d=5 where id=0; /*(0,0,5)*/
update t set c=5 where id=0; /*(0,5,5)*/

insert into t values(1,1,5); /*(1,1,5)*/
update t set c=5 where id=1; /*(1,5,5)*/

update t set d=100 where d=5;/* Modify the d value of all rows with d=5 to 100 */

You should be able to see the problem. Whether executing this statement sequence on a standby database or using the binlog to clone a database in the future, the results of these three rows will become (0,5,100), (1,5,100), and (5,5,100).

In other words, the rows with id=0 and id=1 end up different in the replayed database than in the original one. This is a serious problem and is unacceptable.

At this point, let’s review how this data inconsistency was introduced.

From our analysis, we can see that it was introduced by our assumption that the statement select * from t where d=5 for update locks only the row with d=5, i.e. the row with id=5.

Therefore, we believe that the above assumption is unreasonable and needs to be changed.

So how do we change it? Let’s add a write lock to all the rows encountered during the scanning process and see the execution results.


Figure 4: Assuming all scanned rows are locked

Since session A has added write locks to all rows, session B is blocked when executing its first update statement, and has to wait until session A commits at time T6 before it can continue.

This way, the final result for the row with id=0 in the database is still (0,5,5). In the binlog, the execution sequence is as follows:

insert into t values(1,1,5); /*(1,1,5)*/
update t set c=5 where id=1; /*(1,5,5)*/

update t set d=100 where d=5;/* Modify the d value of all rows with d=5 to 100 */
update t set d=5 where id=0; /*(0,0,5)*/
update t set c=5 where id=0; /*(0,5,5)*/

Following the order of the log, the final result of the row with id=0 is (0,5,5). Therefore, the issue with this row has been resolved.

However, we can also see that for the row with id=1, the result in the database is (1,5,5), but according to the binlog execution result, it is (1,5,100). This means that the issue of phantom reads is still not solved. Why can’t we prevent the insertion and update of the row with id=1 even though we have locked all the records?

The reason is simple. At time T3, when session A locked all the rows, the row with id=1 did not exist yet; a row that does not exist cannot be locked.

In other words, even if we lock all the existing records, we still cannot prevent the insertion of new records. This is why the problem of “phantom reads” has to be addressed separately.

So far, we have just explained the definition of “phantom reads” and its problems.

Next, let’s take a look at how InnoDB addresses the issue of phantom reads.

How to Solve Phantom Reads? #

Now you know that the cause of phantom reads is that row locks can only lock existing rows, while the action of inserting a new record operates on the “gap” between records. Therefore, to solve the phantom read problem, InnoDB had to introduce a new type of lock: the gap lock.

As the name suggests, a gap lock locks the gap between two values. For example, after the 6 records are inserted into the table t from the beginning of the article, there are 7 gaps: (-∞,0), (0,5), (5,10), (10,15), (15,20), (20,25), and (25,+∞).


Figure 5: Row locks and gap locks on the primary key index of table t

In this way, when you execute select * from t where d=5 for update, not only are row locks added to the existing 6 records in the database, but also gap locks are added to the 7 gaps. This ensures that no new records can be inserted.

In other words, at this point, in the process of scanning row by row, not only are row locks added to the rows, but also gap locks are added to the gaps on both sides of the rows.

Now you can see that not only data rows but also the gaps between data rows can be the target of a lock. However, gap locks behave differently from the locks we have encountered before.

For example, row locks can be divided into read locks and write locks. The following diagram shows the conflict relationship between these two types of row locks.


Figure 6: Conflict relationship between two types of row locks

In other words, what conflicts with a row lock is another row lock on the same row.

However, the situation is different with gap locks. The operation of inserting a record into a gap is the one that conflicts with a gap lock. Gap locks do not conflict with each other.

This statement may be difficult to understand, so let me give you an example:


Figure 7: Gap locks do not lock each other

Here, session B will not be blocked. Because table t has no record with c=7, session A adds the gap lock (5,10), and session B adds a gap lock on the same gap. They share a common goal: to prevent values from being inserted into this gap. But between themselves, they do not conflict.
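
In SQL, the scenario of Figure 7 looks roughly like this; it is a sketch, and the exact locking reads in the figure may differ, but the locking behavior is the same:

/* session A */
begin;
select * from t where c=7 lock in share mode; /* no row with c=7: adds gap lock (5,10) on index c */

/* session B: executes successfully, even though session A holds a gap lock on the same gap */
begin;
select * from t where c=7 for update; /* also adds gap lock (5,10); gap locks do not conflict */

/* but an insert into the gap is blocked by the other session's gap lock: */
insert into t values(7,7,7); /* session B would block here, waiting for session A */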

Gap locks and row locks together are called next-key locks, and each next-key lock is a half-open interval. In other words, after initializing table t, if you use select * from t for update to lock all the records in the table, there will be 7 next-key locks, namely (-∞,0], (0,5], (5,10], (10,15], (15,20], (20, 25], (25, +supremum].

Note: In this article, unless otherwise specified, we treat gap locks as open intervals and next-key locks as half-open intervals.

You may ask, where does this supremum come from?

This is because +∞ is an open boundary. In the implementation, InnoDB adds a non-existent maximum value, called supremum, to each index, so that every next-key lock conforms to the “half-open interval” convention stated above.
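
If you are on MySQL 8.0, you can observe these next-key locks, including the one on the supremum pseudo-record, through performance_schema. The query below is illustrative; the exact rows returned depend on your version:

/* session 1 */
begin;
select * from t for update;

/* session 2: inspect the locks session 1 is holding */
select index_name, lock_type, lock_mode, lock_data
from performance_schema.data_locks
where object_name = 't';
/* lock_mode 'X' rows on the PRIMARY index are next-key locks; the row whose
   lock_data reads 'supremum pseudo-record' is the (25, +supremum] lock */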

The introduction of gap locks and next-key locks helps us solve the problem of phantom reads, but at the same time, it brings some “troubles”.

Some readers have raised this issue in the previous articles. Let me restate their question using the example table in this article. The business logic is as follows: lock any row, if this row does not exist, insert it; if it exists, update its data. The code is as follows:

begin;
select * from t where id=N for update;

/* If the row does not exist */
insert into t values(N,N,N);
/* If the row exists */
update t set d=N where id=N;

commit;

You may say, can’t this be solved by using insert ... on duplicate key update? However, in the case of multiple unique keys, this method cannot satisfy the needs of the reader who asked the question. I will explain why in more detail in future articles. Now, let’s discuss this logic.

What this reader observed is that as soon as there is concurrency, this logic deadlocks. You may find it strange: the logic already takes the strictest approach, locking with “for update” before every operation, so how can a deadlock still occur?

Here, I use two sessions to simulate concurrency and assume N=9.


Figure 8: Deadlock caused by gap locks
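
In SQL, the interleaving of Figure 8 is roughly the following, with both sessions running the insert-or-update logic above and N=9:

/* session A */
begin;
select * from t where id=9 for update; /* no row with id=9: adds gap lock (5,10) */

/* session B */
begin;
select * from t where id=9 for update; /* also adds gap lock (5,10); succeeds */
insert into t values(9,9,9); /* blocked by session A's gap lock */

/* session A */
insert into t values(9,9,9); /* blocked by session B's gap lock: deadlock */
/* ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction */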

As you can see, it doesn’t even require the update statement at the end to form a deadlock. Let’s analyze the statements in the order they are executed:

  1. Session A executes the select … for update statement. Since the row with id=9 does not exist, a gap lock (5,10) is added.
  2. Session B executes the select … for update statement, which also adds a gap lock (5,10). Gap locks do not conflict with each other, so this statement can be executed successfully.
  3. Session B tries to insert a row (9,9,9), but is blocked by the gap lock from session A and has to wait.
  4. Session A tries to insert a row (9,9,9), but is blocked by the gap lock from session B.

At this point, both sessions enter a mutually waiting state, forming a deadlock. Of course, InnoDB’s deadlock detection immediately detects this deadlock relationship, causing session A’s insert statement to return with an error.

Now you know that the introduction of gap locks may cause the same statement to lock a larger scope, which actually affects concurrency. This is just a simple example. In the next article, we will encounter more complex examples.

You may say, in order to solve the problem of phantom reads, we introduced such a long list of content. Is there a simpler way to handle it?

As I mentioned at the beginning of the article, if not specifically stated, the problems we analyzed today are all under the repeatable read isolation level, and gap locks only take effect under the repeatable read isolation level. So if you set the isolation level to read committed, there will be no gap locks. But at the same time, if you want to solve the possible inconsistency between data and logs, you need to set the binlog format to row. This is also a configuration combination used by many companies.
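
As an illustration, this combination can be applied to a running instance like this (variable names as of MySQL 5.7.20 and 8.0; persist them in my.cnf so they survive a restart):

set global transaction_isolation = 'READ-COMMITTED'; /* named tx_isolation before MySQL 5.7.20 */
set global binlog_format = 'ROW'; /* affects new sessions only */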

A reader commented on the previous article that their company uses the combination of the read committed isolation level and binlog_format=row. When they asked their company's DBA why it was configured this way, the DBA simply replied, “Because everyone does it that way.”

So the reader asked in the comments: is this configuration reasonable?

My answer is: if the read committed isolation level is sufficient, that is, the business does not require the guarantees of repeatable read, then, considering that read committed locks a smaller range during data operations (no gap locks), this choice is reasonable.

But what I want to say is that the reasonableness of the configuration depends on the business scenario and needs to be analyzed based on specific problems.

However, if the DBA's only reason is “because everyone is using it”, then there is a problem, or rather, problems will arise sooner or later. For example: if everyone uses read committed for regular operations, why does mysqldump set the backup thread to repeatable read for logical backups? (I explained this in the 6th article [“Global Lock and Table Lock: Why Is Adding a Field to a Table So Obstructive?”], which you can review.)

During the backup, the backup thread uses “REPEATABLE READ”, while the business thread uses “READ COMMITTED”. Two different transaction isolation levels coexist, is there any problem?

Furthermore, what is the difference between these two different isolation levels, and how did we come to the conclusion that “READ COMMITTED is enough” for our business?

If the business development and operations teams have not understood these issues, then the conclusion of “no problem” is itself a problem.

Summary #

Today, starting from the question left at the end of the previous article, we looked at how a full table scan acquires locks. We found that even applying row locks to all rows cannot solve the phantom read problem, which is why the concept of the gap lock was introduced.

I have encountered many business development personnel who have a certain understanding of row-level locks when designing data table structure and business SQL statements, but they rarely consider gap locks. As a result, there are often deadlock incidents caused by gap locks on production databases.

Row-level locks are indeed more intuitive, and their judgment rules are relatively simple. The introduction of gap locks will affect system concurrency and increase the complexity of lock analysis, but there are rules to follow. In the next article, I will explain the locking rules of InnoDB to help you understand the “rules” behind them.

As a preview of the next article, I leave you with a thought question.


Figure 9: Transactions enter the lock wait state

Without the content of this article, you would probably find these three statements unrelated. In fact, the insert statements from both session B and session C will enter the lock wait state.

You can try to analyze the reason for this situation.

Note that answering this question requires the locking rules I will introduce in the next article, so it is left as a preview. The analysis of why session C is blocked is a bit difficult; if you cannot work it out, don't be discouraged. I will explain it in detail in the next article.

You can also share what isolation level your online MySQL configuration is and why it is configured that way. Have you encountered any situations where the “REPEATABLE READ” isolation level is necessary?

You can write about the scenarios you have encountered and your analysis in the comments section. In the next article, I will select interesting comments to share and analyze with everyone. Thank you for listening, and feel free to share this article with more friends to read together.

Question from the previous article #

We answered the previous article's question at the beginning of this article. Several readers' answers also pointed out that under the read committed isolation level, the locks added during statement execution are row locks only, and after the statement finishes executing, InnoDB removes the row locks on the rows that do not satisfy the condition.

Of course, the row lock on the row that does satisfy the condition (the row with d=5, i.e. id=5) is still only released at commit time.