35 Remember an Online SQL Deadlock Incident How to Avoid Deadlocks

35 Remember an online SQL deadlock incident How to avoid deadlocks #

Hi, I’m Liu Chao. Today, let’s talk about deadlocks. Before we begin, let me share a little story with you. I believe you may have encountered this situation before or gain some inspiration from it.

I have participated in a project in which we did not separate the read and write tables initially. Instead, we performed read and write operations on a single master database. As the business volume gradually increased, we occasionally received system alarms, and the DBA notified us that there were deadlock exceptions in the database.

In theory, the business was relatively simple at the beginning, such as adding orders, modifying orders, and querying orders. So why did deadlocks occur? After analyzing the logs, we found that one specific table used for idempotent checks often experienced deadlock exceptions. After discussing with the DBA, we initially suspected that the deadlock issue was caused by the indexes. Later, we simulated the relevant operations in the development environment and indeed replicated the deadlock exception.

Next, let’s reproduce this business deadlock exception through practical operation. First, create a table for order records, which is mainly used for checking duplicate order creation:

CREATE TABLE `order_record`  (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `order_no` int(11) DEFAULT NULL,
  `status` int(4) DEFAULT NULL,
  `create_date` datetime(0) DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  INDEX `idx_order_status`(`order_no`,`status`) USING BTREE
) ENGINE = InnoDB

To be able to reproduce this problem, let’s first set the transaction to manual commit. Please note that the way MySQL and Oracle commit transactions is slightly different. By default, MySQL automatically commits transactions. You can check whether auto-commit is enabled with the following command line:

mysql> show variables like 'autocommit';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+
1 row in set (0.01 sec)

Next, let’s proceed with the operation. First, set the MySQL database’s transaction commit to manual mode. You can disable the automatic commit feature with the following command line:

mysql> set autocommit = 0;
Query OK, 0 rows affected (0.00 sec)

When performing idempotent checks on orders, we first check whether the order exists based on the order number. If it does not exist, we then add an order record. Now that we know the specific logic, let’s simulate the SQL statements that can cause deadlocks. We are going to simulate creating two new orders and execute the idempotent check SQL statements in the following order (vertically represents the execution time sequence):

At this point, we will find that the two transactions are in a deadlock state. We can query the specific deadlock situation in the information_schema database, as shown in the following image:

Now that you see this, you may wonder why we use an exclusive lock (for update) in the SELECT statement instead of a shared lock. Imagine that if two requests with the same order number come in simultaneously, phantom reads may occur. In other words, initially, transaction A did not have the order number in its query, but later transaction B added a record with that order number. At this point, transaction A will add another record with the same order number, creating duplicate order records. To prevent this situation, we can use the gap lock algorithm to prevent phantom reads.

How is a deadlock formed? #

Earlier, we mentioned about lock gaps. In [Lesson 33], I have discussed the lock mechanism and the specific implementation algorithms for row locks in concurrent transactions. Let’s review it.

There are three specific implementation algorithms for row locks: record lock, gap lock, and next-key lock. Record lock is specifically for locking index entries; gap lock is for locking the gap between index entries; and next-key lock is a combination of the first two, locking both index entries and the gaps between them.

Gap lock and next-key lock are obtained only in specific operations under the repeatable read or higher isolation level. When performing Select, Update, and Delete operations, except for queries based on unique indexes, other index queries will acquire gap lock or next-key lock, meaning that they lock the range of the scan. Primary key indexes are unique indexes, so gap lock and next-key lock are not used for primary key indexes.

In MySQL, gap lock is enabled by default. This means that the innodb_locks_unsafe_for_binlog parameter is set to disable, and the default transaction isolation level in MySQL is RR.

When executing the following SQL query, since the order_no column is not a unique index and the transaction isolation level is RR, the locking type for SELECT is gap lock. The gap range here is (4,+∞).

SELECT id FROM demo.order_record where order_no = 4 for update;

Acquiring the gap lock for the executed query does not cause any blocking. However, when we execute the following insert SQL, an insert intention lock is obtained on the insert gap. An insert intention lock is actually a type of gap lock that conflicts with other gap locks. Therefore, if another transaction holds the gap lock for that gap, it needs to wait for the other transaction to release the gap lock before it can obtain the insert intention lock.

Both transaction A and transaction B hold the gap lock for the gap (4,+∞), and the insert operations that follow are waiting for each other’s gap lock to be released in order to obtain the insert intention lock. This results in a circular wait, causing a deadlock.

INSERT INTO demo.order_record(order_no, status, create_date) VALUES (5, 1, ‘2019-07-13 10:57:03’);

We can refer to the following lock compatibility matrix to understand the compatibility of different locks:

Measures to Avoid Deadlocks #

Once you understand the source of the deadlock problem, you can find appropriate methods to avoid it.

The most straightforward method to avoid deadlocks is to roll back a transaction if its waiting time exceeds a certain threshold set. This allows the other transaction to continue. This method is simple and effective. In InnoDB, the innodb_lock_wait_timeout parameter is used to set the timeout period.

Additionally, we can set the order_no column as a unique index column. Although it cannot prevent phantom reads, we can use its uniqueness to ensure that duplicate order records are not created. The only drawback of this approach is that an exception will be thrown when duplicate orders are encountered.

We can also use other methods to implement idempotent checks instead of relying on the database. For example, using Redis or ZooKeeper can provide better runtime efficiency compared to the database.

Other Common SQL Deadlock Issues #

Here are some additional common SQL deadlock issues so that you can understand the reasons behind them and resolve them smoothly when you encounter them.

We know that there are four necessary conditions for a deadlock: mutual exclusion, hold and wait, no preemption, and circular wait. Whenever a deadlock occurs, these conditions must be satisfied. Therefore, in business scenarios where mutual exclusion is frequently used to share resources and circular waits are possible, special attention should be paid to deadlock issues.

Next, let’s learn about a scenario where deadlocks occur.

As we mentioned before, the primary key index of the InnoDB storage engine is a clustered index, while other indexes are secondary indexes. If you use a secondary index to update the database, you need to use the clustered index to update the database fields. If two update transactions use different secondary indexes or if one uses a secondary index while the other uses a clustered index, they may both lead to circular waits for lock resources. Since the two transactions are mutually exclusive, they meet the four necessary conditions for deadlocks.

Let’s reproduce the deadlock issue caused by circular waits for lock resources when updating the clustered index and secondary index using the example of the order record table:

The steps to reproduce the deadlock are as follows:

In summary, when performing update operations, it is best to use the primary key to update table fields. This can effectively avoid unnecessary deadlock occurrences.

Summary #

The probability of database deadlock is not very high, but once it occurs, we must thoroughly investigate the specific causes and find solutions as soon as possible. To be honest, the process is not simple. We can only analyze the specific causes of deadlocks by first having a solid understanding of the InnoDB storage engine in MySQL.

For example, the two scenarios I mentioned above test our understanding of lock algorithms and our familiarity with clustered and secondary indexes.

The best way to solve deadlocks is, of course, to prevent them from happening in the first place. In our everyday programming, we can use the following methods to prevent deadlocks:

Process database records in a fixed order as much as possible. For example, if there are two update operations that update the same records but in different orders, it may cause deadlocks.
Use the Read Committed (RC) transaction isolation level whenever possible, allowing phantom reads and non-repeatable reads. This can avoid deadlock problems caused by gap locks.
When updating tables, use the primary key to update as much as possible.
Avoid long transactions and try to break them down into smaller parts. This can reduce the chance of conflicts with other transactions.
Set the lock wait timeout parameter. We can set a reasonable timeout threshold through innodb_lock_wait_timeout, especially in high-concurrency businesses. We can try to set this value to a smaller one to avoid a large number of transactions waiting, occupying system resources, and causing serious performance overhead.

Thought Exercise #

Besides setting the innodb_lock_wait_timeout parameter to avoid long waits for SQL queries resulting in deadlock, do you know any other methods to solve similar problems?