14 count(*) Is So Slow, What Should I Do? #

When developing a system, you may often need to calculate the number of rows in a table, such as the total number of change records in a transaction system. You might think that a simple select count(*) from t statement would solve the problem.

However, you will find that as the number of records in the system increases, this statement becomes slower and slower. Then you might think, why is MySQL so inefficient? Why not just store the total count and retrieve it directly whenever needed?

So today, let’s talk about how the count(*) statement is implemented and why MySQL implements it this way. Then I’ll discuss what you can do in your business design when a table changes frequently and you still need to display its row count.

Implementation of count(*) #

First and foremost, you need to be aware that the count(*) function has different implementations in different MySQL engines.

  • In the MyISAM engine, the total number of rows in a table is stored on the disk, so executing count(*) will directly return this number, making it very efficient.
  • However, it’s more complicated in the InnoDB engine. When executing count(*), it needs to read the data row by row from the engine and then accumulate the count.

It’s important to note that we are discussing count(*) without any filtering conditions in this article. If you add a where clause, even MyISAM tables cannot return the count quickly.
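To make the contrast concrete, here is a minimal sketch. The table t_myisam is hypothetical, standing in for a MyISAM copy of the InnoDB table t, and column c is invented for the filter:

```sql
-- Hypothetical setup: t uses InnoDB, t_myisam is a MyISAM copy of it.
SELECT COUNT(*) FROM t_myisam;              -- returns the stored total directly; very fast
SELECT COUNT(*) FROM t_myisam WHERE c > 0;  -- with a filter, even MyISAM must scan rows
SELECT COUNT(*) FROM t;                     -- InnoDB reads rows and accumulates, filter or not
```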

In a previous article, we analyzed why we choose InnoDB: it is superior to MyISAM in transaction support, concurrency, and data safety. I’ll assume your tables also use the InnoDB engine, which is exactly why counting the total rows of a table gets slower as the number of records grows.

So why doesn’t InnoDB store the count like MyISAM does?

This is because of multi-version concurrency control (MVCC): even for multiple queries issued at the same moment, there is no single answer to “how many rows should InnoDB return,” so there is no one total that could be stored. Let me explain this with a count(*) example.

Assuming table t currently has 10,000 records, let’s set up three user sessions running in parallel:

  • Session A starts a transaction and queries the total number of rows in the table.
  • Session B starts a transaction, inserts a record, and then queries the total number of rows in the table.
  • Session C, with no transaction started explicitly, inserts a record and then queries the total number of rows in the table.

Assume that time flows from top to bottom, and that statements shown on the same row of the figure are executed at the same moment.


Figure 1. Execution flow of sessions A, B, and C

You will notice that at the last moment, when sessions A, B, and C simultaneously query the total number of rows in table t, they get different results.
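In case the figure does not render, here is a sketch of the same timeline in SQL; the counts assume the default repeatable-read isolation level and the initial 10,000 rows:

```sql
-- Time flows top to bottom; steps on the same line happen at the same moment.

-- T1, session A:
BEGIN;
SELECT COUNT(*) FROM t;       -- returns 10000

-- T2, session C (plain statement, autocommit):
INSERT INTO t VALUES (...);   -- placeholder row for illustration

-- T3, session B:
BEGIN;
INSERT INTO t VALUES (...);   -- placeholder row, not yet committed

-- T4, all three sessions run SELECT COUNT(*) FROM t; at the same moment:
--   session A sees 10000 (its repeatable-read snapshot predates both inserts)
--   session B sees 10002 (C's committed row plus its own uncommitted one)
--   session C sees 10001 (its own committed row; B's uncommitted row is invisible)
```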

This is related to InnoDB’s transaction design. Repeatable read is its default isolation level, implemented through MVCC. Every record has to be checked for visibility to the querying session, so for a count(*) request, InnoDB has no choice but to read the data row by row and count only the rows that are visible to the current query.

Note: If you need a refresher on MVCC, you can review the relevant content in the third article [“Transaction Isolation: Why Can’t I See the Changes You Made?”] and the eighth article [“Are Transactions Really Isolated or Not?”].

That said, even though count(*) on InnoDB is slow by design, MySQL has still optimized its execution.

As you know, InnoDB is an index-organized table. The leaf nodes of the primary key index tree are data, while the leaf nodes of the non-primary key index tree are primary key values. Hence, the non-primary key index tree is much smaller than the primary key index tree. For operations like count(*), logically, it doesn’t matter which index tree is traversed to obtain the results. Therefore, the MySQL optimizer will find the smallest tree to traverse. Reducing the amount of scanned data while ensuring logical correctness is a general principle in database system design.
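You can observe this choice with explain. Assuming t has a secondary index k (an assumption for illustration), the output typically looks like this:

```sql
EXPLAIN SELECT COUNT(*) FROM t;
-- Typical output (abbreviated):
--   key: k               <- the optimizer chose the smaller secondary index tree
--   Extra: Using index   <- the count is computed from the index alone
```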

If you’ve used the show table status command, you might have noticed that its output includes a Rows value (exposed as TABLE_ROWS in information_schema.TABLES) showing the current number of rows in the table. This command executes quite quickly, so can this value replace count(*)?
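For example (the row count shown is illustrative, and the test schema name is assumed):

```sql
SHOW TABLE STATUS LIKE 't'\G
-- ...
-- Rows: 100256   <- an estimate, not an exact count

-- The same estimated value is also exposed here:
SELECT TABLE_ROWS FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't';
```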

You might remember from the tenth article [“Why Does MySQL Sometimes Choose the Wrong Index?”] that index statistics are estimated through sampling. In reality, TABLE_ROWS is also obtained from this sampling estimation, so it’s quite inaccurate. According to the official documentation, the error margin can reach 40% to 50%. Therefore, the number of rows shown by the show table status command cannot be directly relied upon. Let’s summarize here:

  • Although MyISAM tables are fast for count(*), they do not support transactions.
  • Although the show table status command returns quickly, it is not accurate.
  • count(*) on an InnoDB table traverses the whole table; the result is accurate, but it causes performance problems.

So, going back to the question at the beginning of the article, if you frequently need to display the total number of operation records in a transaction system, what should you do? The answer is that you can only count them yourself.

Next, let’s discuss the methods of counting on your own, as well as the advantages and disadvantages of each method.

Here, let me explain the basic idea of these methods to you: you need to find a place to store the number of records in the operation record table.

Using a caching system to store the count #

For databases with frequent updates, you might immediately think of using a caching system to support it.

You can use a Redis service to store the total number of rows in the table: every time a row is inserted, increase the Redis count by 1; every time a row is deleted, decrease it by 1. With this approach, both reads and updates are fast; a sketch of the flow follows. But have you thought about what problems this approach might have?
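Here is a minimal sketch of that scheme. The Redis commands appear as comments because they are not SQL, and the key name t_row_count is invented for illustration:

```sql
-- On insert:
INSERT INTO t VALUES (...);          -- placeholder row
-- then, from the application:  Redis INCR t_row_count

-- On delete:
DELETE FROM t WHERE id = ...;        -- placeholder condition
-- then, from the application:  Redis DECR t_row_count

-- To display the total, the application just reads Redis: GET t_row_count
```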

Yes, the caching system may lose updates.

Data in Redis cannot stay in memory forever, so you need to find a way to persist this value periodically. But even with that, updates can still be lost. For example, imagine a row has just been inserted into the data table and the value stored in Redis has been increased by 1, and then Redis restarts unexpectedly. After the restart, you read the value back from wherever it was persisted, and the increment for that last insert is lost.

Of course, there is a solution to this. For example, after a Redis restart, you can perform a count(*) query on the database to get the accurate row count, and then write this value back to Redis. Since unexpected restarts are not frequent occurrences, the cost of a full table scan for this one time is acceptable.
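A sketch of that recovery step, reusing the hypothetical t_row_count key:

```sql
-- After an unexpected Redis restart, re-seed the counter from the source of truth:
SELECT COUNT(*) FROM t;   -- one full scan; acceptable for a rare event
-- then, from the application:  Redis SET t_row_count <result of the query above>
```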

However, in reality, storing the count in a caching system has more issues than just losing updates. Even when Redis is working normally, this value is still logically inaccurate.

Imagine a scenario where a page needs to display both the total number of operation records and the 100 most recent records. In this case, the logic of the page requires retrieving the count from Redis and retrieving the data records from the data table.

Here is how we define logical inaccuracy:

  1. In one case, the 100 rows retrieved may include the latest inserted record, while the Redis count has not been increased by 1.
  2. In another case, the 100 rows retrieved may not include the latest inserted record, while the Redis count has already been increased by 1.

Both of these situations indicate logical inconsistency.

Let’s take a look at this sequence diagram together.


Figure 2. Sequence diagram of sessions A and B

In Figure 2, Session A is the logic for inserting transaction records. It inserts a row R into the data table and increases the Redis count by 1. Session B represents the data retrieval for the page display.

In this sequence in Figure 2, at time T3, when session B comes to retrieve the data, it displays the newly inserted record R, but the Redis count has not been increased by 1 yet. This is when data inconsistency occurs.

You may say that this is because the update logic is writing to the data table first and then updating the Redis count, while the read logic reads Redis first and then reads the data table. The order of these operations is reversed. So, if we keep the same order, will the problem be solved? Let’s change the order of the update logic in session A and see the result.

Figure 3. Sequence diagram of sessions A and B after reordering the update logic

You will notice that at this point, when session B queries at time T3, the Redis count has increased by 1, but it still cannot find the newly inserted row “R”. This is also a case of data inconsistency.

In concurrent systems, we cannot precisely control the execution timing of different threads. Because of the operation sequence shown in the figure, even if Redis is working properly, this count value is logically inaccurate.

Storing the Count in the Database #

Based on the analysis above, using a cache system to store the count results in data loss and imprecise counting. So, what if we directly put this count value into a separate count table C in the database?

First, this solves the problem of data loss in case of a crash. InnoDB supports crash recovery without losing data.

Note: Regarding crash recovery in InnoDB, you can review the relevant content in the second article [“Log System: How is an SQL Update Statement Executed?”]

Next, let’s see if this can solve the problem of imprecise counting.

You might say, isn’t this the same? It’s just changing the operations on Redis in Figure 3 to operations on count table C. As long as there is this execution sequence shown in Figure 3, the problem remains unsolved, right?

But this problem can actually be solved.

The problem we are trying to solve in this article is due to the fact that InnoDB needs to support transactions, which means InnoDB tables cannot simply store the count(*) and return it when queried.

So, in the spirit of “offense is the best defense,” we will now use the very feature that caused the problem, the “transaction,” to solve it.

img

Figure 4. Sequence diagram of execution for sessions A and B

Let’s see the current execution result. Although the read operation of session B still occurs at time T3, the update transaction has not been committed yet, so the operation of incrementing the count by 1 is still invisible to session B.

Therefore, the results seen by session B for the count value and the “latest 100 records” are logically consistent.
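A minimal sketch of this scheme, assuming a counter table rows_stat invented for illustration:

```sql
-- Hypothetical counter table:
CREATE TABLE rows_stat (
  table_name VARCHAR(64) PRIMARY KEY,
  cnt        BIGINT NOT NULL
);

-- Session A: insert a record and bump the counter inside one transaction.
BEGIN;
INSERT INTO t VALUES (...);                                -- placeholder row
UPDATE rows_stat SET cnt = cnt + 1 WHERE table_name = 't';
COMMIT;

-- Session B: read the counter and the latest 100 records inside one transaction,
-- so both reads come from the same consistent snapshot.
BEGIN;
SELECT cnt FROM rows_stat WHERE table_name = 't';
SELECT * FROM t ORDER BY id DESC LIMIT 100;
COMMIT;
```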

Different Usages of count #

In the comments on previous articles, a reader asked about the performance of different usages in a query of the form “select count(?) from t”: specifically, the differences between count(*), count(primary key id), count(field), and count(1). Now that we have discussed the performance of count(*), I’ll take this opportunity to explain these differences in detail.

Please note that the following discussion is based on the InnoDB engine.

First, you need to understand the semantics of count(). count() is an aggregate function: it examines each row of the result set, and if the argument of the count function is not NULL, the accumulated value is incremented by 1; otherwise it is not. The accumulated value is returned at the end.

Therefore, count(*), count(primary key id), and count(1) all represent the total number of rows in the result set that satisfy the condition; whereas count(field) represents the total number of rows in the result set that satisfy the condition where the parameter “field” is not NULL.
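A quick way to see this difference for yourself, using a throwaway table with a nullable column (names invented for illustration):

```sql
CREATE TABLE count_demo (id INT PRIMARY KEY, f INT NULL);
INSERT INTO count_demo VALUES (1, 10), (2, NULL), (3, 30);

SELECT COUNT(*), COUNT(1), COUNT(id), COUNT(f) FROM count_demo;
-- returns 3, 3, 3, 2: only COUNT(f) skips the row where f is NULL
```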

When analyzing the performance differences, you can remember the following principles:

  1. Give the server layer what it wants;
  2. InnoDB only provides necessary values;
  3. The current optimizer only optimizes the semantics of count(*) as “get row count”; other “obvious” optimizations are not performed.

What does this mean? Let’s take a closer look at each case.

For count(primary key id), the InnoDB engine traverses the whole table, retrieves the id value from every row, and returns it to the server layer. The server layer takes the id, judges that it cannot possibly be NULL, and accumulates the count row by row.

For count(1), the InnoDB engine traverses the whole table but retrieves no values. For every row the engine returns, the server layer puts in the number “1”, judges that it cannot possibly be NULL, and accumulates row by row.

If we compare the difference between these two usages, we can see that count(1) is faster than count(primary key id). This is because returning the id from the engine involves parsing the data row and copying the field values.

For count(field), there are two scenarios:

  1. If “field” is defined as NOT NULL, the field is read out of every record, judged to be non-NULL by definition, and the count is accumulated row by row.
  2. If “field” is allowed to be NULL, then at execution time the value might be NULL, so it has to be retrieved and checked; the count is accumulated only for rows where the value is not NULL.

In other words, this follows the first principle mentioned earlier: whatever the server layer asks for, InnoDB returns exactly that.

However, count(*) is an exception: it is specially optimized not to retrieve any values at all, because count(*) can obviously never be NULL; the count is simply accumulated row by row.

You may ask: can’t the optimizer work out on its own that the primary key id can never be NULL, and treat count(primary key id) the same way as count(*)? Isn’t that a simple optimization?

Of course, MySQL could have specially optimized this statement; it is not impossible. But there are too many such cases, each needing its own special optimization, and count(*) is already optimized, so you can simply use that syntax.

So the conclusion is: ordered by efficiency, count(field) < count(primary key id) < count(1) ≈ count(*). My recommendation is to use count(*) whenever possible.

Summary #

Today, I talked to you about two methods in MySQL to get the number of rows in a table. We mentioned that count(*) is implemented differently in different engines, and analyzed the issues with storing the count value in a caching system.

InnoDB engine supports transactions, and we can make use of the atomicity and isolation of transactions to simplify business logic during development. This is one of the reasons why InnoDB engine is popular.

Finally, it’s time for the reflection question.

In the solution we just discussed, we used transactions to ensure accuracy of the count. Since transactions can ensure that intermediate results are not read by other transactions, the order of modifying the count value and inserting new records does not affect the logical result. However, from the perspective of concurrent system performance, do you think the insert operation should be done first or the count table should be updated first in this transaction sequence?

You can write your thoughts and opinions in the comments section, and I will provide my reference answer at the end of the next article. Thank you for listening, and feel free to share this article with more friends to read.

Previous question #

The question I left you last time was: when rebuilding a table with alter table t engine=InnoDB, in what situations can the table end up occupying more space than before?

In the comments section of that article, many of you pointed out one precondition: the table must have no holes to begin with, for example because a rebuild has just been performed.

If external DML happens to be executing during the DDL, new holes may be introduced in that period.

@飞翔 mentioned a deeper mechanism that the article did not cover: during the rebuild, InnoDB does not fill pages completely, but leaves 1/16 of each page free for subsequent updates. In other words, a table is not at its most compact right after a rebuild.

If the process is as follows:

  1. Rebuild table t once;
  2. Insert some data, but these inserted data occupy part of the reserved space;
  3. In this situation, if table t is rebuilt again, the phenomenon mentioned in the question may occur.