29 How to Judge Whether a Database Has Problems #

In the 25th and 27th articles, I introduced the process of primary-secondary switching. From those articles you should already have a clear picture: in a one-primary-one-secondary architecture, a switchover only requires redirecting client traffic to the secondary; in a one-primary-multiple-secondary architecture, it not only requires redirecting client traffic to the promoted secondary, but also requires reconnecting the remaining secondaries to the new primary.

There are two kinds of primary-secondary switch: active and passive. A passive switch is usually initiated by the HA system because something has gone wrong with the primary database.

This leads us to the question we will discuss today: how do we determine if a primary database has a problem?

You might say that it’s simple: just connect to MySQL and execute a SELECT 1. But does SELECT 1 returning successfully really mean that the primary database is fine?

SELECT 1 method #

In reality, a successful return of SELECT 1 only indicates that the database process is still running, but it does not guarantee that there is no problem with the primary database. Now, let’s take a look at this scenario.

set global innodb_thread_concurrency=3;

CREATE TABLE t ( id int(11) NOT NULL, c int(11) DEFAULT NULL, PRIMARY KEY (id) ) ENGINE=InnoDB;

insert into t values(1,1);

Figure 1: Query blocked

The purpose of setting the innodb_thread_concurrency parameter is to control the maximum number of concurrent threads in InnoDB. In other words, once the number of concurrent threads reaches this value, InnoDB will enter a waiting state when receiving new requests, until a thread exits.

Here, I set innodb_thread_concurrency to 3, which means InnoDB only allows 3 threads to execute in parallel. In our example, the sleep(100) in the first three sessions causes these three statements to remain in the “executing” state, simulating a large query.

As you can see, in session D, SELECT 1 can be executed successfully, but the statement to query table t is blocked. This means that if we use SELECT 1 to detect if the instance is normal at this time, we won’t be able to detect the problem.
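
For reference, here is a rough sketch of what each session in Figure 1 runs (the session labels A to D are taken from the figure):

/* Sessions A, B and C: each occupies one of the 3 allowed InnoDB threads */
select sleep(100) from t;

/* Session D: returns immediately, so a SELECT 1 probe would report the instance as healthy */
select 1;

/* Session D: this statement has to enter InnoDB and is blocked until one of the three threads exits */
select * from t;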

In InnoDB, the default value of the innodb_thread_concurrency parameter is 0, which means there is no limit on the number of concurrent threads. However, allowing an unlimited number of concurrent threads is not feasible. This is because the number of CPU cores in a machine is limited, and the cost of context switching will be too high if all threads rush in.

Therefore, in general, we recommend setting innodb_thread_concurrency to a value between 64 and 128. At this point you may have a question: how can a cap of 128 concurrent threads possibly be enough, when the number of concurrent connections in production easily reaches several thousand?

The root of this question is that concurrent connections and concurrent queries are being mixed up.

Concurrent connections and concurrent queries are not the same concept. The thousands of connections you see in the results of show processlist refer to concurrent connections. The “currently executing” statements are what we call concurrent queries.

Having thousands of concurrent connections doesn’t have much impact by itself; they just take up some memory. What we should focus on is concurrent queries, because it is a high number of concurrent queries that puts pressure on the CPU. This is also why the innodb_thread_concurrency parameter needs to be set.
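
If you want to see the difference on a running instance, the two status counters below roughly correspond to these two concepts (a quick sketch; the exact numbers will of course vary):

/* number of currently open connections ("concurrent connections") */
mysql> show global status like 'Threads_connected';

/* number of threads actively executing a statement (roughly, "concurrent queries") */
mysql> show global status like 'Threads_running';

/* the cap on threads executing inside InnoDB at the same time */
mysql> show variables like 'innodb_thread_concurrency';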

Then you may also recall the hot-row updates and deadlock detection we discussed in the 7th article: if we set innodb_thread_concurrency to 128, won’t the 128 slots be used up very quickly once updates pile up on the same hot row, bringing the whole system down?

In reality, once a thread enters a lock wait, the count of concurrent threads decreases, which means threads waiting for row locks (including gap locks) are not counted within the 128.

MySQL is designed this way for a good reason: threads in a lock wait no longer consume CPU; and, more importantly, it has to be designed this way to keep the entire system from locking up.

Why is that? Suppose threads in the lock-wait state were also counted toward the concurrent-thread limit. You could then run into the following scenario:

  1. Thread 1 executes begin; update t set c=c+1 where id=1;, starting transaction trx1, and then stays in that state. At this point the thread is idle, so it is not counted as a concurrent thread.
  2. Threads 2 to 129 all execute update t set c=c+1 where id=1; and enter a lock wait because of the row lock, which leaves 128 threads in the wait state.
  3. If lock-waiting threads were not subtracted from the count, InnoDB would consider the thread limit reached and block any other statement from entering the engine. Thread 1 could then never commit its transaction, while the other 128 threads would remain in lock wait, and the entire system would be stuck (the statements involved are sketched right after this list).
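
To make the thought experiment concrete, these are the statements involved (a sketch of the hypothetical scenario above; remember that real InnoDB does not count lock waiters, so this lock-up does not actually happen):

/* Thread 1: takes the row lock on id=1, then stays idle without committing */
begin;
update t set c=c+1 where id=1;

/* Threads 2 to 129: each blocks on the row lock held by thread 1 */
update t set c=c+1 where id=1;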

Figure 2 below shows this state.

Figure 2: System locked up (assuming statements waiting for row locks counted toward the concurrency limit)

At this point, InnoDB cannot respond to any request, and the entire system is locked up. Moreover, since every thread is waiting, CPU utilization is 0, which is clearly unreasonable. That is why we say InnoDB’s design of decrementing the concurrent-thread count by 1 as soon as a thread enters a lock wait is both reasonable and necessary.

Although threads waiting for locks are not counted toward the concurrent-thread limit, a thread that is genuinely executing a query, such as the select sleep(100) from t statements in the example above, still is.

In this example, the number of statements executing at the same time exceeds the value set for innodb_thread_concurrency, so the system is effectively unable to serve requests; yet a select 1 check would still report it as normal.

Therefore, we need to modify the logic of using “select 1” for judgment.

Table lookup judgment #

In order to detect unavailability caused by too many concurrent InnoDB threads, we need a check that actually goes into InnoDB. The common practice is to create a table in the system database (the mysql database), named health_check for example, put just one row of data in it, and then execute the following query periodically:

mysql> select * from mysql.health_check;

Using this method, we can detect the situation where the database becomes unavailable due to excessive concurrent threads.
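
For illustration, here is a minimal sketch of setting up that single-row table (the exact column is unimportant at this stage; later in this article the table gains a timestamp column and multiple rows):

mysql> CREATE TABLE mysql.health_check (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

/* the single row the probe reads */
mysql> insert into mysql.health_check(id) values (1);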

However, we will soon run into the next problem: once disk space fills up, this method stops working.

We know that update transactions need to write to the binlog, and once the disk holding the binlog is 100% full, all update statements and transaction commits will be blocked. The system can, however, still read data normally at this point.

Therefore, let’s further improve this monitoring query. Next, let’s see the effect of changing the query to an update statement.

Update judgment #

Since we need to perform an update, the table needs a meaningful field to modify. A common practice is to add a timestamp field that records when the check was last run. The update statement looks something like this:

mysql> update mysql.health_check set t_modified=now();

Node availability checks should cover both the primary and the standby. If an update is used to check the primary, the standby needs to be checked with an update as well.

However, the standby’s check also has to write to the binlog. Since we generally set up databases A and B in a double-M structure, the check command executed on standby B is sent back to primary A as well.

If primary A and standby B both used exactly the same update statement, the two updates could conflict on the same row and cause primary-standby replication to stop. So it seems the mysql.health_check table can no longer hold just a single row of data.

To keep the primary’s and standby’s updates from conflicting, we can store multiple rows in the mysql.health_check table and use the server_id of A and B as the primary key.

mysql> CREATE TABLE `health_check` (
  `id` int(11) NOT NULL,
  `t_modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

/* Health Check Command */
insert into mysql.health_check(id, t_modified) values (@@server_id, now()) on duplicate key update t_modified=now();

Since MySQL requires the server_id of the primary and standby databases to be different (otherwise an error will occur when creating the primary-standby relationship), this ensures that the health check commands of the primary and standby databases will not conflict.
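
If you want to check which row a given instance will write, its server_id can be read directly (a trivial check):

/* run on each instance; MySQL requires the two values to differ */
mysql> select @@server_id;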

The update-based check is a relatively common solution, but it still has some problems. Among them, judging “slowness” has always been a headache for DBAs.

You may be wondering: if the update statement fails or times out, we can initiate a primary-standby switch, so why would judging slowness still be a problem?

In fact, this involves the issue of IO resource allocation on the server.

Firstly, all the detection logic requires a timeout period N. If an update statement takes longer than N seconds to return, the system is deemed unavailable.

Imagine a scenario where the IO utilization of the log disk is already 100%. At this time, the entire system is responding very slowly and a primary-standby switch is required.

But you should know that 100% IO utilization means the system’s IO is still doing work, and every request still gets a chance to obtain IO resources and carry out its task. The update statement used by the check needs very few resources, so it may well get the IO it needs, commit successfully, and return to the detection system before the N-second timeout expires.

The detection system sees that the update command did not time out, so it concludes that the system is “normal”.

In other words, at this point the business’s normal SQL statements are already running slowly, yet when the DBA logs on to check, the HA system is still working away and still considers the primary available.

The reason for this phenomenon is that all the methods mentioned above are based on external detection. External detection inherently has a problem of randomness.

Because external detection relies on periodic polling, the system may already be in trouble while we only find out when the next round of polling runs its SQL statement. And if you are unlucky, even that first round after the problem appears may not catch it, which is what makes the switch slow.

Therefore, next I will introduce another method to identify database problems internally in MySQL.

Internal Statistics #

Coming back to the disk-utilization issue: if MySQL could tell us, from the inside, how long each IO request takes, our method for judging whether the database is in trouble would be much more reliable.

In fact, starting from MySQL version 5.6, the performance_schema library provided by MySQL has statistics on the time of each IO request in the file_summary_by_event_name table.

There are many rows of data in the file_summary_by_event_name table; let’s start by looking at the row with event_name='wait/io/file/innodb/innodb_log_file'.
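
If you want to pull this row up yourself, a query along the following lines retrieves it (the \G terminator just prints the columns vertically, which is easier to read for such a wide row):

mysql> select * from performance_schema.file_summary_by_event_name
       where event_name='wait/io/file/innodb/innodb_log_file'\G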

Figure 3: A row of performance_schema.file_summary_by_event_name

In the figure, this row represents the statistics of redo log writing time. The first column EVENT_NAME represents the type of statistics.

The columns that follow display the time statistics for redo log operations, in four sets.

The first set of five columns represents statistics for all types of IO. Among them, COUNT_STAR is the total number of IOs, and the next four columns represent specific statistics, with units in picoseconds; the prefixes SUM, MIN, AVG, MAX, as the names suggest, represent the sum, minimum value, average value, and maximum value respectively.

The second set of six columns represents statistics for read operations. The last column SUM_NUMBER_OF_BYTES_READ represents the total number of bytes read from the redo log.

The third set of six columns represents statistics for write operations.

The final, fourth set of data represents statistics for the other (miscellaneous) types of operations. For the redo log, you can regard them as the statistics for fsync.

In the file_summary_by_event_name table of the performance_schema library, the binlog corresponds to the row with event_name = 'wait/io/file/sql/binlog'. The statistical logic of every field is exactly the same as for the redo log row, so I won’t repeat it here.
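
As a small illustration of reading these picosecond counters, the sketch below pulls both rows and converts a few timer columns into seconds and milliseconds (dividing by 10^12 and 10^9 respectively; the column aliases are mine):

mysql> select event_name, count_star,
       sum_timer_wait/1e12 as total_seconds,
       avg_timer_wait/1e9  as avg_ms,
       max_timer_wait/1e9  as max_ms
       from performance_schema.file_summary_by_event_name
       where event_name in ('wait/io/file/innodb/innodb_log_file', 'wait/io/file/sql/binlog');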

Because performance_schema needs to additionally collect and update this information for each database operation, enabling this statistics feature will incur some performance overhead. My test results show that if all performance_schema items are enabled, performance will decrease by about 10%. So, I recommend that you only enable the items you need for statistics. You can use the following method to enable or disable the statistics for a specific item.

To enable time monitoring for redo log, you can execute the following statement:

mysql> update setup_instruments set ENABLED='YES', Timed='YES' where name like '%wait/io/file/innodb/innodb_log_file%';
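
To confirm which file-IO instruments are currently enabled and timed, you can query the same setup_instruments table (a quick sketch):

mysql> select NAME, ENABLED, TIMED from performance_schema.setup_instruments
       where NAME like 'wait/io/file/%';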

Assuming that you have already enabled statistics for redo log and binlog, how can you use this information for instance status diagnosis?

It’s quite simple. You can judge whether the database has a problem by checking the value of MAX_TIMER_WAIT. For example, you can set a threshold that treats any single IO request taking longer than 200 milliseconds as abnormal, and then use a query like the following as the detection logic:

mysql> select event_name, MAX_TIMER_WAIT FROM performance_schema.file_summary_by_event_name where event_name in ('wait/io/file/innodb/innodb_log_file', 'wait/io/file/sql/binlog') and MAX_TIMER_WAIT > 200 * 1000000000;

After identifying the exception, you can retrieve the information you need and then clear the statistics using the following statement:

mysql> truncate table performance_schema.file_summary_by_event_name;

This way, if the same anomaly appears again in later monitoring, it will show up in the newly accumulated statistics.

Summary #

Today, I introduced several methods for checking the health of a MySQL instance, the problems with each method, and the logic behind their evolution.

After reading this, you may wonder if methods like select 1 have already been phased out. However, the widely used Master High Availability (MHA) framework actually defaults to using this method.

Another optional method in MHA is to only establish a connection, meaning “if the connection is successful, assume there is no problem with the master database.” However, as far as I know, very few choose this method.

In fact, each improved solution will add additional overhead and cannot be judged as “right or wrong” directly. You need to consider your specific business situation and make a trade-off.

The solution I personally prefer is to make the update on the system table the primary check, supplemented by monitoring the performance_schema information.

Finally, it’s time for our thought-provoking question.

Today, I want to ask you: Business systems generally also have high availability requirements. In the services you have developed and maintained, how do you determine if a service has a problem?

Write down the methods and analysis you have used in the comments section, and I will select interesting solutions to share and analyze in the next article. Thank you for listening, and feel free to share this article with more friends to read together.

Last Issue Question #

The question at the end of the previous article was: what happens, when read-write separation uses a wait-for-position scheme such as wait-for-GTID, if you run a DDL statement on a large table?

Suppose the statement takes 10 minutes to execute on the primary and, once committed, another 10 minutes to be applied on the standby (a typical large transaction). Then the GTID of any transaction committed on the primary after the DDL will only appear on the standby 10 minutes later.

As a result, the read-write separation mechanism will keep timing out during this 10-minute window, and queries will fall back to the primary.

For an expected operation like this, the right approach is to pick an off-peak period in which the primary can handle all of the business queries, switch all read requests over to the primary, and then run the DDL on the primary. Once the standby has caught up, switch the read requests back to the standby.

With this thought-provoking question, I mainly wanted to draw your attention to the impact of large transactions on wait-for-position schemes.

Of course, using the gh-ost solution to solve this problem is also a good choice.