12 Why Does My MySQL Sometimes “Shake”? #

During your work, have you ever encountered a situation where a SQL statement that normally executes very quickly sometimes, inexplicably, becomes slow? Such episodes are hard to reproduce: they strike at random and last only a short time.

It seems like the database is “shaking”. Today, let’s take a look at the reasons behind this.

Why did your SQL statement become “slow”? #

In my previous article, “Log System: How Does an SQL Update Statement Execute?”, I introduced the WAL mechanism. You now know that when InnoDB processes an update statement, it performs only one disk operation: writing the log. This log is the redo log, which plays the same role as the slate the Xianheng Hotel shopkeeper uses to keep accounts in “Kong Yiji”. After recording the update in memory and in the redo log, InnoDB returns to the client and reports the update as successful.

To continue the analogy, the shopkeeper eventually has to find time to copy the slate entries into the account book, which corresponds to writing the in-memory data to disk. This process is called flush. Before the flush happens, Kong Yiji’s outstanding balance is actually inconsistent between the memory data page and the disk data page, because today’s new credit exists only on the slate and has not yet been entered in the account book.

When the content of a memory data page and a disk data page are inconsistent, we refer to this memory page as a “dirty page”. After the memory data is written to disk, the content of the memory and disk pages becomes consistent, and we call them “clean pages”.

Whether it is a dirty page or a clean page, they both exist in memory. In this example, memory corresponds to the shopkeeper’s memory.

Next, let’s use a diagram to illustrate the entire process of “Kong Yiji’s credit update”. Let’s assume that Kong Yiji originally owed 10 coins and this time he wants to owe 9 more coins.

Figure 1 “Kong Yiji’s Credit Update” Process and Flush

Going back to the question at the beginning of the article: what makes the update fast is that it only writes to memory and to the redo log. The moments when MySQL occasionally “shakes” are likely exactly when dirty pages are being flushed.

So, under what circumstances does the flush process of the database occur?

Let’s continue using the example of Xianheng Hotel’s shopkeeper. Think about it: under what circumstances would the shopkeeper transfer the credit records from the slate to the account book?

  • The first scenario is when the slate is full and can’t fit any more records. If another customer wants to run up a tab at this point, the shopkeeper has to put down whatever he is doing and erase some records from the slate to make room, and of course he must first enter those records correctly into the account book before erasing them. This corresponds to InnoDB’s redo log being full. The system then halts all update operations and advances the checkpoint to free up space so that log writing can continue. In my second article I drew a diagram of the redo log; here I have redrawn it as a circle to make it easier to understand.

Figure 2 Redo Log Status Diagram

Moving the checkpoint forward is not as simple as modifying its position. For example, in Figure 2, if the checkpoint position is moved from CP to CP’, all dirty pages corresponding to the logs between these two points (indicated by the light green area) need to be flushed to disk. After that, the area between the write pos and CP’ in the figure becomes the region in which new logs can be written.
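If you want a feel for how much room there is before this happens, you can check how large the redo log is configured to be. A minimal check, assuming a configuration older than MySQL 8.0.30 (from 8.0.30 on, these two variables are replaced by innodb_redo_log_capacity):

mysql> show variables like 'innodb_log_file%';

The total redo log size is innodb_log_file_size multiplied by innodb_log_files_in_group.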

  • The second scenario is when business is booming and there is too much to remember. The shopkeeper realizes he can no longer keep it all in his head, so he quickly takes out the account book and writes down Kong Yiji’s tab properly. This corresponds to insufficient system memory: when new memory pages are needed but none are free, some data pages must be evicted to make room. If the evicted pages are dirty, they must first be written to disk. You may wonder: why not simply evict the page without writing it back, and when it is needed again, read it from disk and apply the redo log to bring it up to date? This is a deliberate performance trade-off. If every dirty page is flushed to disk when it is evicted, then every data page is guaranteed to be in one of two states:

    • One: the page is in memory, in which case the in-memory copy is guaranteed to be the correct, up-to-date result and can be returned directly;
    • Two: the page is not in memory, in which case the copy on disk is guaranteed to be the correct result, and it can be read into memory and returned. This arrangement is the most efficient.
  • The third scenario is when business is slow, or after closing time. The counter is quiet and the shopkeeper is free, so he might as well bring the account book up to date. This corresponds to MySQL flushing dirty pages when it judges the system to be “idle”. Of course, once business at the hotel picks up again, the slate can fill quickly, so the shopkeeper has to pace himself sensibly; likewise, even during busy periods, MySQL tries to seize any spare moment to flush a few “dirty pages”.

  • The fourth scenario is at the end of the year, when the Xianheng Hotel closes for a few days to settle the accounts. The shopkeeper must then record everything in the account book, so that when the hotel reopens after the new year, the state of every account is clear from the book alone. This corresponds to a normal shutdown of MySQL: all dirty pages in memory are flushed to disk, so that the next startup can read data directly from disk, making the startup faster.

Next, you can analyze the impact of the above four scenarios on performance.

Among them, the third scenario happens when MySQL is idle and the system is under no pressure, and the fourth happens when the database is being shut down anyway. In these two cases you don’t need to worry much about “performance”. So let’s focus on the performance implications of the first two scenarios.

The first scenario, “the redo log is full and dirty pages must be flushed”, is one InnoDB tries hard to avoid, because when it happens the whole system stops accepting updates: all writes are blocked, and on your monitoring graphs the update count drops to 0.
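If you suspect an instance is approaching this state, one rough check (run on the instance itself) is to look at the LOG section of the engine status output:

mysql> show engine innodb status\G

The gap between “Log sequence number” and “Last checkpoint at” in that output is roughly the amount of redo log whose dirty pages have not yet been flushed; if it keeps creeping toward the total redo log size, you are heading toward this blocked state.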

The second scenario is when “there is not enough memory and dirty pages need to be written to disk”. This situation is actually normal. InnoDB manages memory using a buffer pool, and the memory pages in the buffer pool have three states:

  • The first is unused.
  • The second is used and clean.
  • The third is used and dirty.

InnoDB’s strategy is to use memory as much as possible, so for a long-running database, there are very few unused pages.
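To see how the pages of your own buffer pool are distributed across these three states, you can look at the Innodb_buffer_pool_pages% status counters (names as in MySQL 5.7 and 8.0):

mysql> show global status like 'Innodb_buffer_pool_pages%';

Here Innodb_buffer_pool_pages_free corresponds to unused pages, Innodb_buffer_pool_pages_dirty to dirty pages, and Innodb_buffer_pool_pages_data to all pages holding data, whether clean or dirty.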

When a data page to be read is not in memory, a memory page must be allocated from the buffer pool, which means evicting the least recently used data page: if the evicted page is clean, it is simply released for reuse; if it is dirty, it must first be flushed to disk and made clean before it can be reused.

So, although flushing dirty pages is normal, the following two scenarios will significantly affect performance:

  1. If a query needs to evict too many dirty pages, the response time of the query will become significantly longer.
  2. The redo log is full and all updates are blocked, so write performance drops to 0. For write-sensitive businesses, this is usually unacceptable.

Therefore, InnoDB needs a mechanism to control the proportion of dirty pages to avoid the above two scenarios as much as possible.

InnoDB Dirty Page Control Strategy #

Next, let me tell you about InnoDB’s dirty page control strategy and the parameters related to these strategies.

First, you need to correctly inform InnoDB about the IO capabilities of the host it is on so that InnoDB knows how fast it can flush dirty pages when necessary.

This is what the innodb_io_capacity parameter is for: it tells InnoDB your disk’s IO capability. I recommend setting it to the IOPS of your disk, which you can measure with the fio tool. The following is the command I use to test the disk’s random read and write performance:

fio -filename=$filename -direct=1 -iodepth 1 -thread -rw=randrw -ioengine=psync -bs=16k -size=500M -numjobs=10 -runtime=10 -group_reporting -name=mytest 
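Once fio has given you an IOPS number, you can check and adjust the parameter online; a minimal sketch (the 2000 below is just a placeholder for whatever your own test reported):

mysql> show variables like 'innodb_io_capacity%';
mysql> set global innodb_io_capacity = 2000;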

Actually, there are many performance issues caused by incorrect settings of the innodb_io_capacity parameter. Previously, the development manager of another company asked me to look into a performance problem with a database. They said that the write speed of MySQL was very slow and the TPS (Transactions Per Second) was low, but the IO pressure on the database host was not high. After some investigation, I found that the culprit was the setting of this parameter.

The host’s disk was an SSD, but the value of innodb_io_capacity was set to 300. As a result, InnoDB believed that the system’s capacity was very poor, so it flushed dirty pages very slowly, even slower than the speed of generating dirty pages. This resulted in an accumulation of dirty pages, which affected the performance of queries and updates.

Now that we have defined what “full-throttle” flushing means, InnoDB obviously can’t flush at full speed all the time: disk IO capability is not there just for flushing dirty pages; it also has to serve user requests. So next, let’s look at how InnoDB decides what percentage of this full-throttle capacity to use when flushing.

Based on the knowledge I mentioned earlier, think about this: If you were to design a strategy to control the speed of flushing dirty pages, what factors would you consider?

Think about the consequences if the flushing is too slow. First, there will be too many dirty pages in memory, and second, the redo log will become full.

Therefore, InnoDB’s flushing speed needs to consider these two factors: the dirty page ratio and the speed of redo log writes.

InnoDB calculates two separate numbers based on these two factors.

The parameter innodb_max_dirty_pages_pct is the upper limit of the dirty page ratio, with a default value of 75%. InnoDB calculates a number between 0 and 100 based on the current dirty page ratio (let’s assume it is M). The pseudo code for this calculation is as follows:

F1(M)
{
  if M >= innodb_max_dirty_pages_pct then
      return 100;
  return 100 * M / innodb_max_dirty_pages_pct;
}

Every log entry InnoDB writes has a sequence number. Let N be the difference between the sequence number of the current write and the sequence number corresponding to the checkpoint. InnoDB maps this N to a number between 0 and 100; call this calculation F2(N). The F2(N) algorithm is fairly complex, but all you need to know is that the larger N is, the larger the result.

Then, based on the values calculated by F1(M) and F2(N) above, the engine takes the larger value and defines it as R. The engine will then control the speed of flushing dirty pages by multiplying the capacity defined by innodb_io_capacity by R%.
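To make the numbers concrete, here is a toy calculation in SQL; this is not InnoDB’s actual code, and the values of M and F2(N) below are made up purely for illustration:

mysql> set @M := 60, @pct := 75, @F2 := 70;
mysql> set @F1 := if(@M >= @pct, 100, 100 * @M / @pct); -- F1(M) = 80
mysql> select greatest(@F1, @F2) as R; -- R = 80, so flush at innodb_io_capacity * 80%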

The calculation process above may be a bit abstract and difficult to understand, so I have created a simple flow chart. The F1 and F2 in the chart represent the two values we calculated based on the dirty page ratio and the redo log write speed.

Figure 3: InnoDB dirty page flushing speed strategy

Now you know that InnoDB flushes dirty pages in the background, and that flushing always means writing memory pages to disk. So whether it is your query having to evict a dirty page when it needs memory, or the background flushing logic consuming IO resources and slowing down your update statements, either can be the reason MySQL appears to “shake” from the business side.

To avoid this situation as much as possible, you need to set the value of innodb_io_capacity reasonably and pay attention to the dirty page ratio, and do not let it frequently approach 75%.

The dirty page ratio can be obtained from Innodb_buffer_pool_pages_dirty/Innodb_buffer_pool_pages_total. The specific commands are as follows:

mysql> select VARIABLE_VALUE into @a from global_status where VARIABLE_NAME = 'Innodb_buffer_pool_pages_dirty';
select VARIABLE_VALUE into @b from global_status where VARIABLE_NAME = 'Innodb_buffer_pool_pages_total';
select @a/@b;
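If you want to compare the ratio directly against the configured ceiling, a small follow-up in the same session (since @a and @b were just set) might look like this:

mysql> select @a/@b*100 as dirty_pct, @@global.innodb_max_dirty_pages_pct as pct_limit;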

Next, let’s look at an interesting strategy.

When a query has to flush a dirty page during its execution, it may become slower than usual. And MySQL has a mechanism that can make such a query slower still: when it is about to flush a dirty page, if a neighboring data page happens to be dirty as well, it takes that neighbor along and flushes it too. And this chaining logic keeps spreading: for each neighbor page, if the page next to it is also dirty, it gets flushed as well.

In InnoDB, the innodb_flush_neighbors parameter controls this behavior. When its value is 1, the aforementioned chaining mechanism is enabled, and when its value is 0, it means that neighboring pages are not considered and each page is flushed individually.

This optimization of flushing neighboring pages was meaningful in the era of mechanical hard drives, as it reduced random IO. The random IOPS of mechanical hard drives are usually only a few hundred, so reducing random IO for the same logical operation meant a significant improvement in system performance.

However, if you are using high-IOPS devices like SSDs, I recommend setting the value of innodb_flush_neighbors to 0. This is because IOPS is often not the bottleneck in this case, and flushing only the individual page itself allows for faster completion of necessary dirty page operations, reducing SQL statement response time.
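If you are on a version where the default is still 1, checking and changing it online is straightforward, since the parameter is dynamic:

mysql> show variables like 'innodb_flush_neighbors';
mysql> set global innodb_flush_neighbors = 0;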

In MySQL 8.0, the default value of the innodb_flush_neighbors parameter is already 0.

Summary #

In today’s article, I built on the write-ahead log (WAL) concept introduced in the second article and walked through the dirty page flushing that follows from it, and when that flushing is triggered. By using WAL, the database converts random writes into sequential writes, greatly improving database performance.

However, this also brings up the issue of dirty pages in memory. Dirty pages are automatically flushed by background threads, and they can also be flushed due to data page eviction. The process of flushing dirty pages may consume resources, which can lead to longer response times for your updates and queries. In the article, I also introduced methods to control flushing dirty pages and corresponding monitoring methods.

At the end of the article, I will leave you with a question to think about.

For a large instance with a memory configuration of 128GB and innodb_io_capacity set to 20000, it is recommended to set up redo logs as 4 files of 1GB each.

But what would happen if, during the configuration, you mistakenly set up only one redo log file of 100MB? And why would such a situation occur?

You can write your analysis and conclusions in the comments section, and I will discuss this question with you at the end of the next article. Thank you for listening, and feel free to share this article with more friends to read together.

Previous Question #

In the previous article, I asked you about the methods to create an index for a student ID field.

Because of the structure of the student ID, both a normal prefix index and a reversed-string prefix index have a high duplication rate. Since the database serves a single school, the first six digits (the first three being the city code and digits four through six the school code) are the same for everyone, and the email suffix is always @gmail.com. It is therefore enough to store only the enrollment year plus the sequence number, nine digits in total.

Moreover, it is possible to store these 9-digit numbers using a numeric type. For example, 201100001 can be stored in just 4 bytes. In fact, this is a form of hashing, using the simplest conversion rule: converting strings to numbers. And in the scenario we set up, it guarantees the uniqueness of the converted result.
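As a sketch of what this could look like (the table and column names here are hypothetical, not from the original question):

mysql> create table student (
  id_num int unsigned not null, -- e.g. 201100001, fits in 4 bytes
  name varchar(64) not null,
  primary key (id_num)
);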

In the comments section, there were also other insightful perspectives.

User @封建的风 pointed out that the number of students in one school is a small data volume: even 50 years of enrollment adds up to only about a million rows, so this is a small table. For simplicity in the business layer, he recommended storing the original string directly. This answer embodies the idea of weighing “optimization cost against benefit”, which I think is well worth pondering.