08 Transactions Are They Isolated or Not #

Hello, I’m Lin Xiaobin. The article you are currently reading is a rewrite. After the first version of the article was published, I found it difficult to understand the visibility rules of transactions due to the introduction of too many concepts. Therefore, I decided to rewrite this article. In the comments from users, you can still see the concept of up_limit_id introduced in the first version of the article. To avoid misunderstandings, I want to explain it in advance.

In the third article, when I talked about transaction isolation levels, I mentioned that in the repeatable read isolation level, when a transaction T starts, it creates a read-view. During the execution of transaction T, even if other transactions modify the data, transaction T will still see the data as it was when T started. In other words, a transaction executed under the repeatable read isolation level seems to be unaffected by external changes.

However, in the previous article, when I discussed row locks, I mentioned that when a transaction wants to update a row, if another transaction already holds a lock on that row, it can’t simply ignore the lock and continues. Instead, it gets blocked and goes into a waiting state. The question is, since it’s in a waiting state, what value will it read when it finally acquires the lock and updates the data?

Let me give you an example. The following is the initialization statement for a table with only two rows.

mysql> CREATE TABLE `t` (
  `id` int(11) NOT NULL,
  `k` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
insert into t(id, k) values(1,1),(2,2);

Figure 1 Execution flow of transactions A, B, and C.

Here, we need to pay attention to the timing of transaction initiation.

The BEGIN or START TRANSACTION commands are not the starting point of a transaction. The transaction only starts when the first statement that modifies an InnoDB table is executed after these commands. If you want to start a transaction immediately, you can use the START TRANSACTION WITH CONSISTENT SNAPSHOT command.

With the first method of starting a transaction, the consistent view is created when the first snapshot read statement is executed; with the second method of starting a transaction, the consistent view is created when the START TRANSACTION WITH CONSISTENT SNAPSHOT is executed.

Also, note that in this whole column, if not specified otherwise, our examples assume autocommit=1 by default.

In this example, transaction C does not explicitly use BEGIN or COMMIT, which means that the UPDATE statement itself is a transaction and it will be automatically committed when the statement is completed. Transaction B queries after updating a row, and transaction A queries within a read-only transaction, but after transaction B’s query.

Now, if I tell you that transaction B retrieves the value of k as 3, while transaction A retrieves the value of k as 1, do you feel a bit confused?

So, in today’s article, I want to explain this problem to you and hopefully, by clarifying this confusion, help you further understand InnoDB transactions and locks.

In MySQL, there are two concepts of “views”:

One is a view. It is a virtual table defined by a query statement. When called, the query statement is executed and the result is generated. The syntax to create a view is CREATE VIEW ..., and it is queried in the same way as a table.
The other is a consistent read view, used by InnoDB to implement MVCC (Multiversion Concurrency Control). It is used to support the RC (Read Committed) and RR (Repeatable Read) isolation levels.

The consistent read view has no physical structure but is used during the execution of a transaction to define “what data can I see”.

In the third article [“Transaction Isolation: Why Can’t I See the Changes You Made?”], I explained the implementation logic of MVCC to you. Today, in order to explain the difference between queries and updates, I will demonstrate it in a different way by breaking down the read view. You can combine the explanations in these two articles to further understand MVCC.

How does the “snapshot” work in MVCC? #

Under the repeatable read isolation level, when a transaction starts, it takes a “snapshot”. Note that this snapshot is based on the entire database.

You may think that this doesn’t seem practical. If a database is 100GB, then when I start a transaction, MySQL has to copy 100GB of data, which must be very slow. But in reality, we don’t need to copy all this 100GB of data. Let’s take a look at how this snapshot is implemented.

In InnoDB, each transaction has a unique transaction ID, called the transaction id. It is obtained from InnoDB’s transaction system when the transaction starts and is strictly incremented in the order of application.

Each row of data also has multiple versions. Every time a transaction updates data, a new version is generated, and the transaction ID is assigned to this new version as the row trx_id. At the same time, the old version of the data is retained and the new version has information to directly access it.

In other words, a single row of records in a data table may have multiple versions (rows), each with its own row trx_id.

The following is an example of the state of a record after being continuously updated by multiple transactions, as shown in Figure 2.

Figure 2 Row status change diagram.

Inside the dashed box are 4 versions of the same row. The current latest version is V4 with a value of 22 for k. It was updated by a transaction with transaction id 25, so its row trx_id is also 25. You might ask, didn’t the previous article say that statement updates would generate undo logs? So, where is the undo log?

In fact, the three dashed arrows in Figure 2 are the undo logs; V1, V2, and V3 do not physically exist, but are calculated based on the current version and undo logs whenever needed. For example, when V2 is needed, it is calculated by executing U3 and U2 in sequence based on V4.

Now that we understand the concepts of multi-versioning and row trx_id, let’s think about how InnoDB defines the “100G” snapshot.

According to the definition of repeatable read, when a transaction starts, it should be able to see the results of all committed transactions. However, updates made by other transactions during the execution of this transaction should not be visible to it.

Therefore, a transaction only needs to declare at the beginning, “Based on the time I started, if a data version was generated before I started, I will consider it; if it was generated after I started, I will not consider it, and I must find its previous version.”

Of course, if the “previous version” is also not visible, then we need to continue to look for it. Also, if the transaction itself updates the data, it should still consider it.

In terms of implementation, InnoDB constructs an array for each transaction to store all transaction IDs that were “active” at the moment the transaction started. “Active” means that a transaction has started but has not yet been committed.

The minimum value of transaction IDs in the array is called the low watermark, and the maximum value plus one of transaction IDs ever created in the system is called the high watermark.

This view array and the high watermark constitute the consistent view (read-view) of the current transaction.

The visibility rule for data versions is based on the comparison between the row trx_id of the data and this consistent view.

This view array divides all row trx_id of the data into several different cases.

Figure 3 Visibility rule for data versions

Therefore, for the moment a transaction starts, a row trx_id of a data version can fall into the following categories:

If it falls into the green area, it means that this version was generated by a committed transaction or the current transaction itself, and this data is visible.
If it falls into the red area, it means that this version was generated by a transaction that will start in the future, so it is definitely not visible.
If it falls into the yellow area, it includes two cases: a. If the row trx_id is in the array, it means that this version was generated by a transaction that has not been committed yet, so it is not visible; b. If the row trx_id is not in the array, it means that this version was generated by a committed transaction, so it is visible.

For example, for the data in Figure 2, if there is a transaction with a low watermark of 18, when it accesses this row of data, it will calculate V3 from V4 through U3, so it sees the value of 11 for this row.

You see, with this declaration, the subsequent updates that occur in the system are independent of what this transaction sees, right? Because the versions generated by the subsequent updates must fall into the categories 2 or 3(a) mentioned above, and for this transaction, these new data versions do not exist, so the snapshot of this transaction becomes “static”.

So now you know that InnoDB utilizes the feature of “all data having multiple versions” to achieve the ability to create snapshots in seconds.

Next, let’s continue to look at the three transactions in Figure 1 and analyze why transaction A’s query statement returns k=1.

Here, let’s make the following assumptions:

Before transaction A starts, the only active transaction ID in the system is 99.
The version numbers of transactions A, B, and C are 100, 101, and 102 respectively, and there are only these four transactions in the system.
Before the start of the three transactions, the row trx_id of data (1,1) is 90.

In this case, transaction A’s view array is [99, 100], transaction B’s view array is [99, 100, 101], and transaction C’s view array is [99, 100, 101, 102].

To simplify the analysis, let’s remove other irrelevant statements and only show the operations related to transaction A’s query logic:

Figure 4 Transaction A query data logic diagram

From the diagram, we can see that the first effective update is done by transaction C, changing the data from (1,1) to (1,2). At this time, the latest version’s row trx_id of this data is 102, and the version 90 has become a historical version.

The second effective update is done by transaction B, changing the data from (1,2) to (1,3). At this time, the latest version (i.e. row trx_id) of this data is 101, and 102 has become a historical version again.

You may have noticed that when transaction A is performing the query, transaction B has not been committed yet, but the version (1,3) it generated has already become the current version. However, this version must be invisible to transaction A, otherwise it would be a dirty read.

Now, transaction A is about to read the data, and its view array is [99, 100]. Of course, reading data always starts from the current version. So, the data reading process of transaction A’s query statement is as follows:

When it comes to (1,3), it determines that the row trx_id is 101, which is larger than the high watermark, so it falls into the red area and is not visible.
Then, when it looks for the previous historical version, it finds row trx_id=102, which is larger than the high watermark and falls into the red area, so it is not visible.
It continues to look back and finally finds (1,1), with row trx_id=90, which is smaller than the low watermark and falls into the green area, so it is visible. When executing in this way, although this row of data has been modified during the process, the result of the query for transaction A is always consistent whenever it is queried. Therefore, we call this consistent reading.

This judgment rule is directly transcribed from the code logic, but as you can see, it is difficult to analyze visibility manually.

So, let me translate it for you. For a data version, there are three situations for a transaction view, apart from its own updates always being visible:

If the version is not committed, it is not visible.
If the version is committed but is committed after the creation of the view, it is not visible.
If the version is committed and is committed before the creation of the view, it is visible.

Now, let’s use this rule to judge the query result in Figure 4. The array of views for transaction A’s query statement is generated when the transaction A is started, at which point:

(1,3) has not been committed yet, so it belongs to situation 1 and is not visible.
(1,2) has been committed, but it was committed after the creation of the view array, so it belongs to situation 2 and is not visible.
(1,1) was committed before the creation of the view array, so it belongs to situation 3 and is visible.

You see, by removing the numeric comparison and using only the chronological order to judge, the analysis becomes much easier. So, we will use this rule for analysis from now on.

Update Logic #

Attentive students may have questions: For the update statement of transaction B, if we follow the consistent read, it seems that the result is not correct, right?

As you can see in Figure 5, the view array for transaction B is generated first, and then transaction C is committed. Shouldn’t we not be able to see (1,2)? How can we calculate (1,3)?

Figure 5: Update logic diagram for transaction B

Yes, if transaction B queries the data before the update, the value of k returned from this query is indeed 1.

However, when it wants to update the data, it cannot update on the historical version anymore, otherwise transaction C’s update will be lost. Therefore, at this point, the set k=k+1 for transaction B is based on (1,2).

So, here’s where this rule applies: Data updates always involve a read before the write, and this read can only read the current value, known as a “current read”.

Therefore, when updating, the current read retrieves the data (1,2), and after updating, a new version of the data (1,3) is created, with the row trx_id being 101.

So, when executing the query statement of transaction B, since the current version number is 101 and the version number of the newest data is also 101, which is its own update, it can be directly used in the query. Therefore, the value of k returned from the query is 3.

Here we mentioned a concept called a current read. In fact, in addition to update statements, if select statements are locked, they are also current reads.

So, if we modify the query statement of transaction A, “select * from t where id=1” with “lock in share mode” or “for update”, we can also read the data with version number 101, and the value of k returned will be 3. The following two select statements each add a read lock (S lock, shared lock) and a write lock (X lock, exclusive lock), respectively.

mysql> select k from t where id=1 lock in share mode;
mysql> select k from t where id=1 for update;

Taking one step further, what if transaction C doesn’t commit right away, and instead becomes the following transaction C’? What will happen?

Figure 6: Execution flow of transactions A, B, and C'

The difference with transaction C’ is that after updating, it is not immediately committed. Before it is committed, transaction B’s update statement is initiated. As mentioned earlier, even though transaction C’ has not been committed, the version (1,2) has been generated and it is the current latest version. So how will transaction B handle the update statement?

At this point, we need to bring in the “two-phase locking protocol” mentioned in the previous article. Transaction C’ has not been committed, which means the write lock on the version (1,2) has not been released. Transaction B is a current read and must read the latest version, and it must be locked. Therefore, it is locked and must wait until transaction C’ releases this lock before it can continue with its current read.

Figure 7: Update logic diagram for transaction B (with transaction C')

At this point, we have connected consistent reading, current reading, and row-level locks.

Now, let’s go back to the question at the beginning of the article: How is the ability of transactions to have repeatable reads achieved? The core of repeatable read is consistent read, while transactional updates can only use current read. If the row lock of the current record is occupied by another transaction, it needs to enter lock waiting.

The logic of read committed is similar to that of repeatable read, but the main differences are:

Under the repeatable read isolation level, a consistent snapshot is created at the beginning of the transaction, and other queries in the transaction share this consistent snapshot.
Under the read committed isolation level, a new snapshot is calculated before each statement is executed.

So, let’s take a look at what values transaction A and transaction B should get when querying k under the read committed isolation level.

Here, it should be noted that the meaning of “start transaction with consistent snapshot;” is to create a consistent snapshot that lasts for the entire transaction from this statement. Therefore, in the read committed isolation level, this usage is meaningless and is equivalent to a regular start transaction.

The following is the state diagram for read committed. It can be seen that the timing of creating the view array for these two queries has changed, which is the read view box in the diagram. (Note: Here, we are still using the logic of transaction C directly committing, not transaction C')

Figure 8. Transaction state diagram under read committed isolation level

At this point, the view array of transaction A’s query statement is created when executing this statement, and the generation time of (1,2) and (1,3) is before the moment of creating this view array. However, at this moment:

(1,3) has not been committed, belongs to case 1, and is not visible.
(1,2) has been committed, belongs to case 3, and is visible.

Therefore, at this time, the query statement of transaction A returns k=2.

Obviously, the query result of transaction B is k=3.

Conclusion #

InnoDB row data has multiple versions, and each data version has its own row trx_id. Each transaction or statement has its own consistent view. Normal query statements are consistent reads, and consistent reads determine the visibility of data versions based on row trx_id and consistent view.

For repeatable read, only data committed before the start of the transaction is recognized in the query.
For read committed, only data committed before the start of the statement is recognized in the query.

Current read always reads the latest version that has been committed.

You can also think about why table structure does not support “repeatable read”. This is because table structure does not have corresponding row data or row trx_id, so it can only follow the logic of current read.

Of course, in MySQL 8.0, you can already put table structure in the InnoDB dictionary, so in the future, repeatable read for table structure may be supported.

Now it’s time for the question. I use the table structure and initialization statement below as the experimental environment, and the transaction isolation level is repeatable read. Now, I want to set the c value of all rows where “field c is equal to id” to zero, but I have encountered a “strange” situation that cannot be changed. Please create the situation and explain its principle.

mysql> CREATE TABLE t ( id int(11) NOT NULL, c int(11) DEFAULT NULL, PRIMARY KEY (id) ) ENGINE=InnoDB; insert into t(id, c) values(1,1),(2,2),(3,3),(4,4);

After reproducing it, please think about whether this situation may occur in actual business development. Will your application code fall into this “pit”, and how did you solve it?

You can write your thoughts and opinions in the comments section, and I will discuss this issue with you at the end of the next article. Thank you for listening, and feel free to share this article with more friends to read together.

Last question #

In the last article, I left you with a question: How to delete the first 10,000 rows of a table. Many comments chose the second method, which is to execute “delete from T limit 500” 20 times in a loop.

Indeed, the second method is relatively better.

In the first method (i.e., directly executing “delete from T limit 10000”), the individual statement takes a long time, and the lock time is also relatively long; moreover, a large transaction can cause master-slave delay.

The third method (i.e., executing “delete from T limit 500” in 20 connections simultaneously) artificially causes lock conflicts.