33 MySQL Tuning: Transaction Optimization in High-Concurrency Scenarios #

Hello, I’m Liu Chao.

A database transaction is a logical unit of work in a database system: it guarantees that a group of database operations either all succeed or all fail. When it comes to transactions, we must mention the ACID properties. Database transactions have four basic properties: Atomicity, Consistency, Isolation, and Durability. It is these properties that ensure the safety of database transactions. In MySQL, since the MyISAM storage engine does not support transactions, the following content is explained based on the InnoDB storage engine.

We know that in Java concurrent programming, programs can be executed concurrently by multiple threads. However, while concurrency improves the efficiency of program execution, it brings thread safety issues to the program. Just like multithreading, in order to improve the throughput of database transaction processing, databases also support concurrent transactions. However, in concurrent execution, there are also security issues such as data loss during modification and inconsistent data during reading.

In database transactions, isolation is the key to solving concurrent transaction problems. Today, we will focus on the implementation principles of transaction isolation and how to optimize the performance issues that transaction isolation causes.

Issues with Concurrent Transactions #

We can understand several issues caused by concurrent transactions through the following examples:

  1. Data loss (lost update): two transactions read the same row, both modify it, and the update committed first is overwritten by the second.

*(figure: two transactions overwriting each other's update)*

  2. Dirty read: a transaction reads data that another transaction has modified but not yet committed.

*(figure: reading another transaction's uncommitted change)*

  3. Non-repeatable read: within one transaction, reading the same row twice returns different values, because another transaction committed an update in between.

*(figure: the same row read twice with different results)*

  4. Phantom read: within one transaction, running the same range query twice returns a different set of rows, because another transaction inserted or deleted rows in between.

*(figure: a repeated range query returning newly inserted rows)*

Transaction Isolation and Concurrency Issues Solution #

Among the four problems caused by concurrent transactions above, data loss can be avoided by using pessimistic locks in the database. This means taking an exclusive lock during the query, with a `select xx for update` statement inside the transaction, so that other transactions cannot update the data until this transaction ends.
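As a minimal sketch of the pessimistic approach (the `account` table and its columns are hypothetical names, not from the original text):

```sql
-- Session 1: take an exclusive (X) lock on the row and hold it until COMMIT.
BEGIN;
SELECT balance FROM account WHERE id = 1 FOR UPDATE;
-- ... compute the new balance in application code ...
UPDATE account SET balance = balance - 100 WHERE id = 1;
COMMIT;

-- Session 2: any SELECT ... FOR UPDATE or UPDATE on the same row
-- blocks here until session 1 commits or rolls back.
```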

Of course, optimistic locks can also avoid data loss. This involves adding a version-number field to the row: an update succeeds only if the version at update time is still the version that was read; otherwise, the update fails and can be retried. The remaining three problems are caused by consistency issues when reading from the database, and they require the database to provide a transaction isolation mechanism to solve.
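The optimistic variant can be sketched as follows (again with a hypothetical `account` table carrying a `version` column):

```sql
-- Read the data together with its current version.
SELECT balance, version FROM account WHERE id = 1;   -- suppose version = 7

-- Update only if nobody changed the row in the meantime.
UPDATE account
   SET balance = balance - 100,
       version = version + 1
 WHERE id = 1
   AND version = 7;
-- If the affected-row count is 0, another transaction won the race:
-- the application should re-read and retry, or abort.
```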

By using locking, different transaction isolation mechanisms can be achieved. Before understanding transaction isolation mechanisms, let’s first understand the locking mechanisms supported by MySQL.

InnoDB supports two types of row locks: shared locks (S) and exclusive locks (X). A shared lock allows a transaction to read a row but not modify it; while one transaction holds a shared lock on a row, other transactions can only acquire a shared lock on that row. An exclusive lock is acquired to modify a row and permits both reading and modifying it; once a transaction holds an exclusive lock on a row, no other transaction can acquire any lock on that row.

With a good understanding of the implementation principles of InnoDB row locks, we can better understand the following content.

Different locking mechanisms in transactional operations will produce several different transaction isolation levels, which correspondingly solve the problems caused by concurrent transactions as follows:

Read Uncommitted: transaction A reads data without acquiring a shared lock, so transaction B can read and modify the same data at any time, even before committing. This isolation level can result in dirty reads, non-repeatable reads, and phantom reads.

Read Committed: When transaction A reads data, it acquires a shared lock. Once the data is read, the lock is immediately released. When transaction B reads and modifies the data, it acquires a row-level exclusive lock, which is released only when the transaction ends. In other words, when transaction A reads the data, transaction B can only read the data and cannot modify it. After transaction A reads the data, transaction B can modify it. This isolation level can avoid dirty reads, but still has non-repeatable reads and phantom reads.

Repeatable Read: When transaction A reads data, it acquires a shared lock, which is released only when the transaction ends. When transaction B reads and modifies the data, it acquires a row-level exclusive lock, which is released only when the transaction ends. In other words, while transaction A has not ended, transaction B can only read the data and cannot modify it. Transaction B can modify the data only after transaction A ends. This isolation level can avoid dirty reads and non-repeatable reads, but still has phantom reads.

Serializable: When transaction A reads data, it acquires a shared lock, which is released only when the transaction ends. When transaction B reads and modifies the data, it acquires a table-level exclusive lock, which is released only when the transaction ends. Serializable solves problems such as dirty reads, non-repeatable reads, phantom reads, but as the isolation level becomes higher, concurrency becomes lower.
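The four levels above can be inspected and switched per session or per transaction. A brief sketch (the variable is `transaction_isolation` in MySQL 8.0; older versions use `tx_isolation`):

```sql
-- Check the current isolation level for this session.
SELECT @@transaction_isolation;

-- Change it for the rest of the current session ...
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- ... or only for the next transaction that this session starts.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
```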

InnoDB's Read Committed (RC) and Repeatable Read (RR) isolation levels are implemented with multi-version concurrency control (MVCC) to achieve high-performance transactions. With pure locking, once a row is held under an exclusive lock, other transactions cannot acquire even a shared lock on it and must block and wait; if a table receives a large number of requests, that level of performance cannot sustain the load.

MVCC does not lock ordinary SELECT statements. If the row being read is being deleted or updated, the read does not wait for the exclusive lock to be released; instead, MVCC reads a snapshot of the row. The snapshot is an earlier version of the row, reconstructed from the undo log, which records previous versions of row records for transaction rollback. By avoiding repeated locking of the data, MVCC greatly improves the performance of read operations.
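A two-session transcript illustrates the snapshot read under RR (table and values are hypothetical):

```sql
-- Session 1 (REPEATABLE READ):
BEGIN;
SELECT balance FROM account WHERE id = 1;   -- reads, say, 500; no lock taken

-- Session 2 (autocommit):
UPDATE account SET balance = 400 WHERE id = 1;   -- takes an X lock, commits

-- Session 1 again, without waiting for session 2's lock:
SELECT balance FROM account WHERE id = 1;   -- still sees 500: a snapshot read
COMMIT;
```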

Lock Implementation Algorithms #

We know that InnoDB implements both row-level locks and table-level locks. Row-level locks are implemented through indexing. If data is retrieved without using index conditions, InnoDB will lock all records in the table, effectively upgrading to a table-level lock.

There are three specific algorithms for row-level locks:

  - Record lock: locks the index record itself.
  - Gap lock: locks the gap between index records.
  - Next-key lock: a combination of the two, locking both the index record and the gap before it.

Gap locks and next-key locks are acquired only at the Repeatable Read isolation level or above, and only for certain operations. For locking reads (such as SELECT ... FOR UPDATE), UPDATE, and DELETE, any query other than an equality lookup on a unique index acquires gap locks or next-key locks, effectively locking the range it scans.
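The range-locking effect can be sketched in two sessions (assuming a hypothetical `account` table with a secondary index on `age` and existing rows at age 20 and 30):

```sql
-- Session 1 (REPEATABLE READ):
BEGIN;
SELECT * FROM account WHERE age BETWEEN 20 AND 30 FOR UPDATE;
-- Next-key locks cover the matching index records AND the gaps around them.

-- Session 2: this INSERT blocks even though no existing row has age = 25,
-- because the value falls inside a gap locked by session 1.
INSERT INTO account (name, age) VALUES ('new user', 25);
```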

Optimizing High Concurrency Transactions #

With the above explanation, I believe you have gained a thorough understanding of transactions, locks, and isolation levels. Now that we have a clear understanding of the problem, let’s discuss how to optimize transactions in high-concurrency scenarios.

1. Combine business scenarios with low-level transaction isolation #

In high-concurrency businesses, different levels of transaction isolation are often used to ensure the consistency of business data. The higher the isolation level, the lower the concurrency performance.

So, how do we determine which isolation level is more appropriate for a specific business scenario? Let’s explain the selection method using two simple examples.

Consider the scenario of updating a user's last login time. There is no strict accuracy requirement when querying the login time, and the login information is modified only when that user logs in, so there is no risk of one transaction's committed update being overwritten by another. This business can therefore use the lowest isolation level.

However, if it involves consuming the balance or points in an account, where multiple clients may simultaneously consume from the same account, we should choose the “Repeatable Read” isolation level to ensure that once one client consumes from the account, other clients cannot simultaneously consume from it.
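The two scenarios might be wired up as follows (table and column names are hypothetical illustrations, not from the original text):

```sql
-- Scenario 1: last-login-time update; accuracy requirements are loose,
-- so the lowest isolation level is acceptable for this session.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
UPDATE user SET last_login_at = NOW() WHERE id = 42;

-- Scenario 2: balance deduction; concurrent consumers must not interleave,
-- so keep Repeatable Read (InnoDB's default) and lock the row explicitly.
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
SELECT balance FROM account WHERE id = 42 FOR UPDATE;
UPDATE account SET balance = balance - 100 WHERE id = 42;
COMMIT;
```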

2. Avoid escalating row locks to table locks #

As mentioned earlier, in InnoDB, row locks are implemented through indexes. If data is retrieved without index conditions, row locks will be escalated to table locks. We know that table locks seriously affect the performance of operations on the entire table, so we should avoid them.
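A sketch of how escalation happens and how to avoid it (the `orders` table, its columns, and the index name are hypothetical):

```sql
-- Without an index on `status`, this UPDATE must scan the table,
-- and InnoDB ends up locking every scanned row -- effectively a table lock.
UPDATE orders SET closed_at = NOW() WHERE status = 'expired';

-- Adding an index confines the locks to the matching index records.
ALTER TABLE orders ADD INDEX idx_status (status);

-- EXPLAIN can confirm the statement now uses the index.
EXPLAIN UPDATE orders SET closed_at = NOW() WHERE status = 'expired';
```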

3. Control the size of transactions to reduce the amount of locked resources and the duration of locking #

Have you ever encountered the following SQL exception? In the log of the panic-buying system, we often see this kind of exception log during the activity period:

`MySQLQueryInterruptedException: Query execution was interrupted`

The order-submission flow opens a transaction, and besides updating the purchased record that transaction performs other operations as well, so the transaction can become quite long. When a large number of requests arrive, many of them may enter this transaction path at the same time.

Furthermore, lock competition is unfair. When multiple transactions simultaneously update a record, in extreme cases, an update operation that enters the queuing system may be unable to acquire the lock for a long time and eventually be forcibly aborted by the system due to timeout.

When a user purchases a product, we first need to query the inventory balance, then create an order, and finally deduct the corresponding inventory. These operations are all part of the same transaction.

Even though these operations yield the same result regardless of their order, the transaction's performance differs depending on the order:

*(figure: two candidate execution orders for the same transaction; in order 1 the inventory deduction comes last)*

This is because, although these operations belong to the same transaction, their locks are acquired at different times, and all of them are released only when the transaction ends. Since deducting inventory is an update operation that takes a row-level lock on a hotly contended row, it blocks every other transaction operating on the same data. We should therefore avoid holding this lock for a long time, by acquiring it as late as possible so it is released soon after.

Moreover, because the order creation and inventory deduction can occur in any sequence without affecting the business, we can place the inventory deduction operation at the end. In other words, we can use execution order 1 to minimize the holding time of the lock.
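The reordered transaction can be sketched like this (the `product` and `orders` tables are hypothetical names):

```sql
BEGIN;
-- 1. Plain read: under MVCC this takes no lock at all.
SELECT stock FROM product WHERE id = 1;

-- 2. Create the order: locks only the newly inserted order row.
INSERT INTO orders (product_id, quantity) VALUES (1, 1);

-- 3. Deduct inventory LAST, so the contended row lock on `product`
--    is held for the shortest possible time before COMMIT releases it.
UPDATE product SET stock = stock - 1 WHERE id = 1 AND stock > 0;
COMMIT;
```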

Summary #

In fact, the optimization of concurrent transactions in MySQL is very similar to the optimization of multi-threaded programming in Java. Both can be optimized by reducing lock granularity and reducing the holding time of locks. In the optimization of concurrent transactions in MySQL, we try to avoid using high transaction isolation levels in business scenarios where low transaction isolation levels can be used.

In everyday business development, developers often keep the default settings in pursuit of development speed. For example, in a service method you may be accustomed to using the default transaction settings and rarely change the isolation level by hand. Note, however, that InnoDB's default isolation level is RR, which may not suit every business scenario, so we still need to weigh the specific business scenario.

Reflection Questions #

Up until now, we have mainly covered the transaction isolation provided by locks. Do you know how InnoDB implements atomicity, consistency, and durability?