19 Why is a Query for Just One Line Executed so Slowly #

In general, when I talk to you about query performance optimization, you would first think of complex statements or queries that need to return a large amount of data. However, there are also cases where querying only one line of code can still be executed very slowly. Today, let’s talk about this interesting topic and see under what circumstances this phenomenon may occur.

It should be noted that if the MySQL database itself is under heavy load, causing high CPU usage or high IO utilization of the database server, in this case, the execution of all statements may be slow, which is beyond the scope of our discussion today.

For illustrative purposes, I will create a table to demonstrate today’s problem. This table has two fields, id and c, and I will insert 100,000 rows of data into it.

mysql> CREATE TABLE `t` (
  `id` int(11) NOT NULL,
  `c` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

delimiter ;;
create procedure idata()
begin
  declare i int;
  set i=1;
  while(i<=100000)do
    insert into t values(i,i);
    set i=i+1;
  end while;
end;;
delimiter ;

call idata();

Next, I will provide a few different scenarios as examples, some of which are knowledge points we have already discussed in previous articles. See if you can figure them out at first glance and test your understanding.

Scenario 1: Query takes a long time to return #

As shown in Figure 1, execute the following SQL statement on the table t:

mysql> select * from t where id=1;

The query takes a long time to return.

Figure 1: Query takes a long time to return

In general, when encountering this situation, the table t is likely to be locked. Next, when analyzing the reasons, it is usually recommended to first execute the show processlist command to see the current state of the statement.

Then, based on each state, we will analyze the reasons behind them, how to reproduce them, and how to handle them.

Waiting for MDL Lock #

As shown in Figure 2, it demonstrates the status of “Waiting for table metadata lock” when using the “show processlist” command.

Figure 2: Status of Waiting for table metadata lock

This status indicates that there is a thread currently requesting or holding an MDL write lock on table t, which is blocking the select statement.

In the 6th article “Global Locks and Table Locks: Why Adding a Field to a Table is So Challenging?”, I introduced a method for reproducing this issue. However, it should be noted that the reproduction process is based on MySQL 5.6. The MDL locking strategy was modified in MySQL 5.7, so this scenario cannot be reproduced.

However, it is still easy to reproduce this scenario in MySQL 5.7. As shown in Figure 3, I have provided simple reproduction steps.

Figure 3: Reproduction steps for Waiting for table metadata lock in MySQL 5.7

Session A holds the MDL write lock on table t through the “lock table” command, while Session B’s query requires an MDL read lock. As a result, Session B enters a waiting state.

The way to handle this type of problem is to identify who is holding the MDL write lock and kill their connection.

However, it is inconvenient to find it out because the “Command” column of Session A appears as “Sleep” in the results of “show processlist”. Fortunately, with the introduction of performance_schema and sys system libraries, it has become much easier. (Note: MySQL needs to be started with setting performance_schema=on, as setting it to off will result in a performance loss of about 10%)

By querying the table sys.schema_table_lock_waits, we can directly find out the process id that causes the blocking, and then terminate this connection using the “kill” command.

Figure 4: Capturing the thread id that acquired the table lock

Waiting for table flush #

Next, let me give you another example of a situation where a query is blocked.

I executed the following SQL statement on table t in the query:

mysql> select * from information_schema.processlist where id=1;

Now, let me give you a hint.

You can refer to Figure 5. I found that the status of this thread is “Waiting for table flush.” Can you guess the reason?

Figure 5: Diagram illustrating the Waiting for table flush status

This status indicates that there is a thread currently performing a flush operation on table t. In MySQL, there are generally two ways to perform a flush operation on a table:

flush tables t with read lock;

flush tables with read lock;

These two flush statements, if specified with table t, mean only table t is closed; if no specific table name is specified, it means all open tables in MySQL are closed.

However, normally these two statements execute quickly, unless they are also blocked by other threads.

Therefore, the possible reason for the Waiting for table flush status is that a flush tables command is blocked by another statement and it, in turn, blocks our select statement.

Now, let’s try to reproduce this situation together. The steps to reproduce are shown in Figure 6:

Figure 6: Steps to reproduce Waiting for table flush

In session A, I intentionally call sleep(1) for each row, so this statement will take 100,000 seconds to execute by default, during which the table t will continuously be “open” by session A. Then, the flush tables t command in session B will need to wait for the query in session A to finish in order to close table t. As a result, when session C attempts to query again, it will be blocked by the flush command.

Figure 7 shows the result of running show processlist for this reproduction. The troubleshooting process for this example is also quite simple. Once you see the result of the show processlist, you will know what to do.

Figure 7: Result of show processlist for Waiting for table flush

InnoDB Row Locks #

Now, after the test of table-level locks, our select statement finally enters the engine.

mysql> select * from t where id=1 lock in share mode;

You are already familiar with the usage of this statement, as we mentioned it in the 8th article [“Is transaction really isolated?”] when discussing current reads.

Since acquiring a read lock is required when accessing the record with id=1, if there is already a transaction holding a write lock on this record, our select statement will be blocked.

Here are the reproduction steps and the scene:

Figure 8: Reproduction of row lock

Figure 9: Scene with row lock in show processlist

Obviously, session A starts a transaction, holds a write lock, and does not commit, which is the reason why session B is blocked.

This problem is not difficult to analyze, but the question is how to find out who is holding this write lock. If you are using MySQL 5.7, you can find it through the sys.innodb_lock_waits table.

The query method is:

mysql> select * from t sys.innodb_lock_waits where locked_table=`'test'.'t'`\G

Figure 10: Finding row lock through sys.innodb_lock_waits

As you can see, this information is comprehensive, and thread 4 is the culprit of the blockage. To eliminate this culprit, you can use either KILL QUERY 4 or KILL 4.

However, it should not display “KILL QUERY 4” here. This command stops the currently executing statement of thread 4, which is actually useless. Because the update statement holds the row lock, and this statement has already finished executing, executing KILL QUERY now cannot release the row lock on id=1.

In fact, KILL 4 is effective, which means disconnecting this connection directly. An implicit logic here is that when the connection is disconnected, the thread currently executing in this connection is automatically rolled back, releasing the row lock on id=1.

Class 2: Slow Queries #

After going through various “locks”, let’s take a look at some examples of slow queries.

First, let’s look at an SQL statement whose cause you must be familiar with:

mysql> select * from t where c=50000 limit 1;

Since there is no index on field c, this statement can only perform a sequential scan on the id primary key, so it needs to scan 50,000 rows.

As confirmation, you can take a look at the slow query log. Note that here, in order to log all statements to the slow log, I first executed set long_query_time=0 after connecting to set the time threshold for the slow query log to 0.

Figure 11: Slow log of scanning 50,000 rows

Rows_examined shows that 50,000 rows were scanned. You might say that it is not slow, as it only took 11.5 milliseconds to return, and we usually configure over 1 second as the threshold for slow queries in production. But remember: a bad query is not necessarily a slow query. In this example, there are only 100,000 records, but as the data volume increases, the execution time increases linearly.

It is easy to understand that a large number of rows need to be scanned, so the execution is slow.

But next, let’s look at a statement that only scans one row, but it executes very slowly.

As shown in Figure 12, this is the slow log of this example. It can be seen that the executed statement is:

mysql> select * from t where id=1；

Although only one row is scanned, the execution time is as long as 800 milliseconds.

Figure 12: Scanning one row but executing slowly

Isn’t it strange? Where did all this time go?

If I scroll down the screenshot of this slow log a bit further, you can see the next statement, select * from t where id=1 lock in share mode. It scans only one row as well, but the execution time is 0.2 milliseconds.

Figure 13: slow log of adding lock in share mode

Doesn’t it look even stranger now? In theory, adding lock in share mode should require more time because it involves acquiring locks. Some students may already have the answer. If you don’t have the answer yet, I’ll give you another hint. Figure 14 shows the execution output of these two statements.

Figure 14 Output of the two statements

The query result of the first statement has c=1, and the statement with “lock in share mode” returns c=1000001. By now, more students should know the reason. If you still don’t have any clue, don’t worry. Let me first explain the reproduction steps and then analyze the reason.

Figure 15 Reproduction steps

As you can see, session A starts a transaction with the “start transaction with consistent snapshot” command first, and then session B begins executing the update statement.

After session B executes the update statement one million times, what is the state of the row with id=1? You can find the answer in Figure 16.

Figure 16 Data state of id=1

After updating one million times, session B generates one million undo logs.

The SQL statement with “lock in share mode” is a current read, so it directly reads the result of 1000001, which makes it fast. On the other hand, the statement “select * from t where id=1” is a consistency read, so it needs to execute the undo log starting from 1000001, executing it one million times before returning the result of 1.

Note that the undo log actually records operations like “change 2 to 1” or “change 3 to 2.” Drawing it as decrement 1 is for easier visualization.

Summary #

Today, I showed you an example of a simple table where executing a “retrieve one row” operation may cause it to be locked and slow. This involves concepts of table locks, row locks, and consistency reads.

In actual usage, the scenarios may be more complex. However, the general approach I introduced in the article can help you locate and solve the problems.

Finally, I’ll leave you with a question.

In our example of using a lock read, we used this statement: “select * from t where id=1 lock in share mode.” Since there is an index on id, it can directly locate the row with id=1, and the read lock is only added to this row.

But if we have the following SQL statement:

begin;
select * from t where c=5 for update;
commit;

How is the locking performed in this sequence of statements? When are the locks released?

You can write down your opinion and verification methods in the comments section, and I’ll provide my reference answer at the end of the next article. Thank you for listening, and feel free to share this article with your friends.

Previous Question Time #

In the last article, I asked you to share any similar scenarios you encountered before.

@封建的风 mentioned an interesting scenario worth mentioning. Let me restate the question. The table structure is as follows:

mysql> CREATE TABLE `table_a` (
  `id` int(11) NOT NULL,
  `b` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `b` (`b`)
) ENGINE=InnoDB;

Let’s assume that the table has 1 million rows of data, and among them, 100,000 rows have the value ‘1234567890’ in column b. Suppose we execute the following statement:

mysql> select * from table_a where b='1234567890abcd';

How would MySQL execute this statement?

The ideal situation would be that MySQL recognizes that column b is defined as varchar(10) and returns an empty result. Unfortunately, MySQL doesn’t do that.

So, what if we directly use ‘1234567890abcd’ in the index lookup? We might expect that MySQL quickly determines that the value is not in the b index tree and returns an empty result.

But in reality, MySQL doesn’t do that either.

The execution of this SQL statement is slow and follows these steps:

When the statement is passed to the storage engine for execution, it truncates the value. Since the length of column b is defined as 10, only the first 10 characters (‘1234567890’) are used for matching.
There are 100,000 rows that satisfy this condition.
Since it is a select *, it needs to perform 100,000 index lookups.
However, each time it looks up, it retrieves the whole row. At the server layer, it checks that the value of b is not ‘1234567890abcd’.
The result is empty.

This example is a good supplement to our article. Although there might be additional function operations during execution, after obtaining the result, the server layer still needs to perform another round of verification.