03 Transaction Isolation Why Changes are Invisible to Me #

When it comes to transactions, you are definitely familiar with them. When dealing with databases, we always use transactions. The most classic example is transferring money. Let’s say you want to transfer $100 to your friend, Xiao Wang, and at that moment, you only have $100 in your bank account.

The process of transferring money involves a series of operations in the program, such as checking the balance, performing addition and subtraction, updating the balance, etc. These operations must be atomic. Otherwise, while the program is checking your balance and before it subtracts the $100, you can quickly check your balance again and transfer money to another friend. If the bank allows this, wouldn’t it be chaotic? This is where the concept of “transactions” comes into play.

Simply put, a transaction is a group of database operations that must either succeed as a whole or fail as a whole. In MySQL, transaction support is implemented at the engine level. As you may know, MySQL is a system that supports multiple engines, but not all engines support transactions. For example, the native MyISAM engine in MySQL does not support transactions. This is one of the important reasons why MyISAM has been replaced by InnoDB.

In today’s article, I will take InnoDB as an example to analyze the specific implementation of MySQL’s transaction support and provide corresponding practical recommendations based on the principles. I hope these examples can deepen your understanding of MySQL’s transaction principles.

Isolation and Isolation Levels #

When it comes to transactions, you will definitely think of ACID (Atomicity, Consistency, Isolation, Durability). Today, let’s talk about the “Isolation” part, denoted by the letter “I”.

When multiple transactions are executing simultaneously on a database, problems such as dirty reads, non-repeatable reads, and phantom reads may occur. To solve these problems, the concept of “isolation level” was introduced.

Before discussing isolation levels, you need to know that the more tightly you isolate, the lower the efficiency will be. Therefore, often times, we need to find a balance between the two. The transaction isolation levels specified by the SQL standard include: read uncommitted, read committed, repeatable read, and serializable. Let me explain them one by one:

Read Uncommitted: With this isolation level, the changes made by a transaction can be seen by other transactions even before it is committed.
Read Committed: With this isolation level, the changes made by a transaction will only be visible to other transactions after it is committed.
Repeatable Read: With this isolation level, the data seen during the execution of a transaction will always be consistent with the data seen when the transaction started. In the repeatable read isolation level, uncommitted changes are also invisible to other transactions.
Serializable: As the name suggests, for the same row of records, a “write” operation will acquire a “write lock”, and a “read” operation will acquire a “read lock”. When there is a conflict between read and write locks, the later transaction must wait for the previous transaction to complete before it can continue.

Among them, “read committed” and “repeatable read” are somewhat difficult to understand, so let me use an example to explain these isolation levels. Let’s assume that the table T only has one column, and one of the rows has a value of 1. The following describes the behavior of two transactions executed in chronological order:

mysql> create table T(c int) engine=InnoDB;
insert into T(c) values(1);

Let’s take a look at the different results for transaction A at different isolation levels, that is, what are the return values of V1, V2, and V3 in the figure.

If the isolation level is “Read Uncommitted”, then the value of V1 is 2. At this time, transaction B has not been committed yet, but the result has already been seen by A. Therefore, V2 and V3 are also 2.
If the isolation level is “Read Committed”, then V1 is 1 and the value of V2 is 2. The update by transaction B can only be seen by A after it is committed. Therefore, the value of V3 is also 2.
If the isolation level is “Repeatable Read”, then V1 and V2 are both 1, and V3 is 2. The reason why V2 is still 1 follows this requirement: the data seen by a transaction during its execution must be consistent.
If the isolation level is “Serializable”, when transaction B executes “change 1 to 2”, it will be locked. Transaction B can continue to execute only after transaction A is committed. Therefore, from A’s perspective, the values of V1 and V2 are both 1, and the value of V3 is 2.

In implementation, a view is created in the database, and the logical result of the view is used when accessing it. Under the “Repeatable Read” isolation level, this view is created when the transaction starts and is used throughout the transaction. Under the “Read Committed” isolation level, this view is created at the beginning of each SQL statement execution. It should be noted that under the “Read Uncommitted” isolation level, the latest value on the record is directly returned without the concept of a view, while under the “Serializable” isolation level, locking is used to prevent concurrent access.

We can see that the database behavior is different under different isolation levels. The default isolation level of Oracle database is actually “Read Committed”. Therefore, for some applications migrating from Oracle to MySQL, in order to ensure consistency of database isolation level, you must remember to set the isolation level of MySQL to “Read Committed”.

The configuration is to set the value of the startup parameter “transaction-isolation” to “READ-COMMITTED”. You can use the “show variables” command to view the current value.

mysql> show variables like 'transaction_isolation';
 
+-----------------------+----------------+
 
| Variable_name | Value |
 
+-----------------------+----------------+
 
| transaction_isolation | READ-COMMITTED |
 
+-----------------------+----------------+

In summary, existence is reasonable, and each isolation level has its own use cases. You need to determine the isolation level based on your specific business situation. You may be wondering when you would need the “Repeatable Read” isolation level. Let’s look at an example of data reconciliation logic.

Suppose you are managing a personal bank account table. One table stores the end-of-month balances for each month, and another table stores the details of the bills. Now you need to reconcile the data, which means checking if the difference between the previous month’s balance and the current balance is consistent with the bill details for this month. You would definitely want the reconciliation process to be unaffected by new transactions even if they occur while you are reconciling.

In this case, using the “Repeatable Read” isolation level is very convenient. The view at the start of the transaction can be considered static and is not affected by updates from other transactions.

Implementation of Transaction Isolation #

After understanding the isolation levels of transactions, let’s take a look at how transaction isolation is actually implemented. Here, we will explain “repeatable read” in detail.

In MySQL, every record is actually accompanied by a rollback operation when it is updated. The latest value on the record can be retrieved by performing rollback operations to obtain the value from the previous state.

Let’s assume that a value is sequentially changed from 1 to 2, 3, 4. The rollback log will have records similar to the following:

The current value is 4, but when querying this record, transactions started at different times will have different read-views. As shown in the figure, in read-views A, B, and C, the value of this record is 1, 2, and 4, respectively. Multiple versions of the same record can exist in the system, which is known as multi-version concurrency control (MVCC) in the database. To obtain 1 in read-view A, you need to perform all the rollback operations shown in the figure.

At the same time, you will notice that even if there is another transaction currently changing 4 to 5, this transaction will not conflict with the transactions corresponding to read-views A, B, and C.

You may wonder, the rollback log cannot be kept forever, so when will it be deleted? The answer is, it will be deleted when it is no longer needed. In other words, the system will determine that the rollback log can be deleted when no transaction needs to use it anymore.

When is it no longer needed? It is when there is no earlier read-view in the system than this rollback log.

Based on the explanation above, let’s discuss why it is recommended to avoid using long transactions.

Long transactions mean that there will be very old transaction views in the system. Since these transactions can access any data in the database at any time, the rollback records that may be needed by these transactions must be kept until the transaction is committed. This will result in a large amount of storage usage.

In MySQL 5.5 and earlier versions, the rollback log was stored together with the data dictionary in the ibdata file. Even if the rollback segments are cleaned up after long transactions are committed, the file size will not decrease. I have seen cases where the data only occupies 20GB, but the rollback segments occupy 200GB. Eventually, the entire database had to be rebuilt in order to clean up the rollback segments.

In addition to the impact on rollback segments, long transactions also occupy lock resources and may bring down the entire database, which we will discuss further when we talk about locks later.

Transaction Startup Modes #

As mentioned earlier, long transactions have these potential risks, so I would definitely recommend that you try to avoid them. In fact, many times, business developers do not intentionally use long transactions, usually it is due to misuse. MySQL has the following transaction startup modes:

Explicitly start a transaction with the statements begin or start transaction. The corresponding commit statement is commit, and the rollback statement is rollback.
Set autocommit=0, which turns off the auto-commit for this thread. It means that if you only execute a select statement, a transaction is started and it will not be automatically committed. This transaction will continue to exist until you explicitly execute a commit or rollback statement, or disconnect.

Some client connection frameworks will execute a set autocommit=0 command by default after a successful connection. This leads to the following queries being executed within a transaction. If it is a long connection, it will result in unexpected long transactions.

Therefore, I would suggest always using set autocommit=1 to start transactions explicitly using statements.

However, some developers may worry about the “extra interaction”. For a business that frequently uses transactions, the second method does not require manually executing begin every time a transaction starts, reducing the number of statement interactions. If you also have this concern, I suggest using the commit work and chain syntax.

With autocommit set to 1, if a transaction is explicitly started with begin and a commit is executed, the transaction is committed. If commit work and chain is executed, the transaction is committed and the next transaction is automatically started, saving the cost of executing begin again. The benefit is that from the perspective of program development, it is clear whether each statement is within a transaction or not.

You can query long transactions in the innodb_trx table in the information_schema database. For example, the following statement is used to find transactions that have been active for more than 60 seconds:

select * from information_schema.innodb_trx where TIME_TO_SEC(timediff(now(),trx_started))>60

Summary #

In this article, I introduced the phenomena and implementation of transaction isolation levels in MySQL. Based on the implementation principle, I analyzed the risks associated with long transactions and how to avoid them using the correct approach. I hope the examples I provided can help you understand transactions and better utilize the transaction features of MySQL.

I’ll leave you with a question. Now that you know that long transactions should be avoided in the system, if you are both the business development manager and the database manager, what solutions would you have to prevent or handle this situation?

You can write your thoughts and opinions in the comments section, and I will discuss this question with you at the end of the next article. Thank you for listening, and feel free to share this article with more friends to read together.

Previous Question #

At the end of the previous article, I left you with a question comparing daily backups and weekly backups.

The advantage of daily backups is that the “maximum recovery time” is shorter.

In the daily backup mode, in the worst case, you would need to apply a day’s worth of binlogs. For example, if you perform a full backup every day at 00:00 and want to recover to a backup from yesterday at 23:00.

For weekly backups, in the worst case, you would need to apply a week’s worth of binlogs.

The corresponding indicator for the system is the Recovery Time Objective (RTO) mentioned by @尼古拉斯·赵四 and @慕塔.

Of course, this comes with a cost because more frequent full backups consume more storage space. Therefore, this RTO is achieved at the cost of considering the importance to the business.