33 Will Querying Lots of Data Burst the Database Memory #

I often get asked this question: my host only has 100 GB of memory, and I need to run a full table scan on a 200 GB table. Will it use up all the memory on the database host?

This is a legitimate concern, because running out of memory (OOM) is no joke. But think about it: isn't a logical backup also a full database scan? If a full table scan could use up all the memory, logical backups would have crashed the database long ago, right?

So, it seems that there shouldn’t be a problem with performing a full table scan on a large table. But what is the process exactly?

The impact of a full table scan on the server layer #

Let’s assume that we need to perform a full table scan on a 200 GB InnoDB table db1.t. You would presumably use a command like this to save the scan results on the client side:

mysql -h$host -P$port -u$user -p$pwd -e "select * from db1.t" > $target_file

As you already know, InnoDB data is stored on the primary key index, so a full table scan actually scans the primary key index of table t directly. This query doesn’t have any other conditions, so each row it finds can be directly placed into the result set and returned to the client.

So, where is this “result set” stored?

In reality, the server-side doesn’t need to save a complete result set. The process of retrieving and sending data is as follows:

  1. Get a row and write it to the net_buffer. The size of this memory block is defined by the net_buffer_length parameter, which defaults to 16 KB (16384 bytes).
  2. Repeat getting rows until the net_buffer is full, then call the network interface to send it out.
  3. If the sending is successful, clear the net_buffer and continue getting the next row and writing it to the net_buffer.
  4. If the sending function returns EAGAIN or WSAEWOULDBLOCK, it means the local network stack (socket send buffer) is full, so it waits until the network stack becomes writable again before continuing to send.

The corresponding flowchart for this process is shown below.


Figure 1 Query Result Sending Process

From this process, you can see that:

  1. During the sending process, the maximum amount of memory MySQL uses internally for the query result is net_buffer_length, which is nowhere near 200 GB.
  2. The socket send buffer cannot reach 200 GB either (its default size is defined in /proc/sys/net/core/wmem_default). If the socket send buffer fills up, the data-reading process pauses.
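The four-step send loop described above can be sketched in a few lines of Python. This is a minimal illustration, not MySQL source code: `NET_BUFFER_LENGTH`, `send_rows`, and `sock_send` are all illustrative names, and the "socket" here is a plain callback. The point it demonstrates is that peak server-side memory stays around one buffer's worth, no matter how large the full result set is.

```python
# Minimal sketch (not MySQL source) of the chunked send loop described above.
# All names here are illustrative, not real MySQL APIs.

NET_BUFFER_LENGTH = 16 * 1024  # default net_buffer_length: 16 KB

def send_rows(rows, sock_send, buffer_size=NET_BUFFER_LENGTH):
    """Stream rows to the client without materializing the full result set.

    `rows` is any iterable of bytes; `sock_send(chunk)` stands in for the
    network call and returns the number of bytes it accepted (a real socket
    may block or accept less when its send buffer is full).
    """
    net_buffer = bytearray()
    flushes = 0
    for row in rows:
        net_buffer += row                      # step 1: append a row
        if len(net_buffer) >= buffer_size:     # step 2: buffer full -> send
            _flush(net_buffer, sock_send)
            net_buffer.clear()                 # step 3: clear and continue
            flushes += 1
    if net_buffer:                             # send the final partial buffer
        _flush(net_buffer, sock_send)
        flushes += 1
    return flushes

def _flush(buf, sock_send):
    sent = 0
    while sent < len(buf):                     # step 4: keep retrying until
        sent += sock_send(buf[sent:])          # the socket accepts everything

# Peak memory is ~buffer_size regardless of total result size:
rows = (b"x" * 100 for _ in range(10_000))     # ~1 MB of "result" rows
accepted = []
flushes = send_rows(rows,
                    lambda chunk: accepted.append(len(chunk)) or len(chunk),
                    buffer_size=1024)
```

Running this with a 1 KB buffer streams a ~1 MB result through roughly a thousand small flushes, never holding more than about one buffer of row data at a time.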

In other words, a MySQL thread reads and sends data simultaneously, which is an important concept to understand. It means that if the client receives slowly, the MySQL server cannot send the results out, and the statement's execution time grows.

For example, in the following state I deliberately made the client stop reading from the socket receive buffer; this is what show processlist then shows on the server side.


Figure 2 Server-side Sending Block

If you see that the value of State stays at “Sending to client”, it means the server-side network stack is full.

In a previous article, I mentioned that if the client uses the --quick parameter, the mysql_use_result method is used. This method reads and processes one row at a time. Imagine that the logic run after reading each row is slow: the client will then take a long time before fetching the next row, which can produce the situation shown in Figure 2.

Therefore, for normal online businesses, if the result of a query is not expected to be large, I recommend using the mysql_store_result interface to directly save the query result in local memory.

Of course, the premise is that the query does not return too much data. In the comments section of the 30th article, a student mentioned that they used a large query that resulted in the client occupying nearly 20GB of memory. In this case, you need to switch to the mysql_use_result interface.
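The trade-off between the two interfaces can be sketched without a real MySQL client. This is an illustrative Python sketch, not the libmysqlclient API: `fetch_store` and `fetch_use` are made-up names that mimic the two fetch styles, with any iterable standing in for the server's row stream.

```python
# Illustrative sketch (not the real client API) of the two fetch styles:
# "store" materializes the whole result in client memory before processing;
# "use" holds at most one row at a time, keeping client memory flat but
# keeping the server in "Sending to client" while each row is processed.

def fetch_store(row_source):
    """mysql_store_result style: read everything first, then process."""
    return list(row_source)          # client memory ~ size of full result

def fetch_use(row_source, process):
    """mysql_use_result style: process each row as it arrives."""
    count = 0
    for row in row_source:           # at most one row held at a time
        process(row)
        count += 1
    return count

# "store" style: all 1000 rows live in client memory at once.
buffered = fetch_store((i, f"name-{i}") for i in range(1000))

# "use" style: rows stream through; memory stays flat.
seen = []
n = fetch_use(((i, f"name-{i}") for i in range(1000)), seen.append)
```

For small results, buffering everything (the `fetch_store` style) frees the server quickly; for a result measured in gigabytes, only the streaming style keeps the client from ballooning, exactly the 20 GB situation described above.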

On the other hand, if you see many threads in the “Sending to client” state in the MySQL instance you maintain, it means you need to optimize the query result and evaluate whether having so many results is reasonable.

If you want to quickly reduce the number of threads in this state, setting the net_buffer_length parameter to a larger value is an optional solution.

Another state that looks very similar to “Sending to client” is “Sending data”, which is often misunderstood. Some students have asked me why they see many query statements in the “Sending data” state in their maintained instances, but there doesn’t seem to be any problem with the network. Why does sending data take so long?

In fact, the state changes of a query statement are as follows (Note: I have omitted other irrelevant states here):

  • After the MySQL query statement enters the execution stage, the state is first set to “Sending data”.
  • Then, it sends the metadata (column definitions) of the execution result to the client.
  • Then, it continues the execution process of the statement.
  • After the execution is completed, the state is set to an empty string.

In other words, “Sending data” does not necessarily mean “sending data”, but it could be at any stage in the executor process. For example, you can create a lock waiting scenario to see the “Sending data” state.


Figure 3 Full table scan with lock


Figure 4 Sending data state

As you can see, session B is clearly waiting for a lock, and the state is displayed as “Sending data”.

In other words, “Sending to client” is only displayed when a thread is in the state of “waiting for the client to receive the result”, while “Sending data” means “being executed”.

Now you know that the query results are sent to the client in segments, so scanning the entire table and returning a large amount of data will not cause memory overload.

We understand the handling logic in the server layer, but how does InnoDB handle it? Does scanning the entire table have an impact on the InnoDB engine system?

Impact of Full Table Scans on InnoDB #

In the 2nd and 15th articles, when introducing the WAL mechanism, I analyzed one role of InnoDB’s memory: together with the redo log, it holds updated data pages so that random writes to disk can be avoided.

Data pages in memory are managed in the Buffer Pool (BP). In the WAL mechanism, the Buffer Pool accelerates updates. In fact, the Buffer Pool has another more important role, which is to accelerate queries.

In the comments section of the 2nd article, a student asked a question. Due to the WAL mechanism, when a transaction is committed, the data pages on disk are old. If a query immediately reads this data page, should the redo log be applied to the data page immediately?

The answer is no, because at this time the data page in memory is already the latest version, so the query can simply read the memory page. You see, the query doesn’t need to read from disk at all; it gets the result directly from memory, which is very fast. So the Buffer Pool also accelerates queries. How much it helps depends on a key metric: the memory hit ratio.

You can check the current cache hit ratio of a system in the result of “show engine innodb status”. In general, for a stable online system that meets the response time requirements, the cache hit ratio should be above 99%.

By executing “show engine innodb status”, you can see the phrase “Buffer pool hit rate”, which displays the current hit ratio. For example, in Figure 5, the hit ratio is 99.0%.


Figure 5 Cache hit ratio displayed by “show engine innodb status”

Ideally, if all the data pages required by queries can be retrieved directly from the memory, that would be great, and the corresponding hit ratio would be 100%. However, this is difficult to achieve in practical production environments.
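A hit ratio like the one in Figure 5 can be derived from two counters. As a hedged sketch: the names below follow MySQL’s SHOW GLOBAL STATUS output (Innodb_buffer_pool_read_requests for logical reads, Innodb_buffer_pool_reads for reads that had to go to disk), but the sampling logic here is an illustration, not InnoDB’s own code.

```python
# Rough sketch of how a Buffer Pool hit ratio can be derived from status
# counters; an illustration of the arithmetic, not InnoDB's implementation.

def buffer_pool_hit_rate(read_requests, disk_reads):
    """Fraction of logical reads served directly from the Buffer Pool.

    read_requests: logical page reads (Innodb_buffer_pool_read_requests)
    disk_reads:    reads that missed memory (Innodb_buffer_pool_reads)
    """
    if read_requests == 0:
        return 1.0                   # nothing read yet: treat as a full hit
    return 1.0 - disk_reads / read_requests

# 10 disk reads out of 1000 logical reads -> 99.0%, as in Figure 5:
rate = buffer_pool_hit_rate(read_requests=1000, disk_reads=10)
```

In practice you would sample the counters twice and compute the ratio over the delta, since the cumulative values since startup smooth over recent behavior.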

The size of the InnoDB Buffer Pool is determined by the parameter “innodb_buffer_pool_size”, and it is generally recommended to set it to 60%~80% of the available physical memory.

About a decade ago, the data volume of a single machine was hundreds of GBs, while the physical memory was only a few GBs. Although many servers now have 128GB or even higher memory, the data volume of a single machine has reached the terabyte level.

Therefore, it is common for “innodb_buffer_pool_size” to be smaller than the disk data volume. If a Buffer Pool is full and needs to read a data page from the disk, an old data page is definitely going to be evicted.

InnoDB uses the Least Recently Used (LRU) algorithm for memory management, which involves evicting the least recently used data.

The following diagram shows the basic model of the LRU algorithm.


Figure 6: Basic LRU algorithm

InnoDB manages the Buffer Pool using a linked list to implement the LRU algorithm.

  1. In state 1 of Figure 6, the head of the linked list is P1, indicating that P1 is the most recently accessed data page; let’s assume that only this number of data pages can fit into memory.
  2. At this point, a read request accesses P3, resulting in state 2, where P3 is moved to the front.
  3. State 3 indicates that the accessed data page is not in the linked list, so a new data page, Px, needs to be allocated in the Buffer Pool and placed at the head of the linked list. However, since the memory is already full, new memory cannot be allocated. Instead, the content of the data page Pm at the end of the linked list is cleared, the page is reused to hold the content of Px, and it is placed at the head of the linked list.
  4. In terms of the effect, the oldest data page Pm that has not been accessed for a long time is evicted.
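The basic LRU in Figure 6 can be sketched with an ordered dict standing in for the linked list (leftmost entry = head / most recently used). This is a toy model for illustration, not InnoDB code; the class and method names are made up.

```python
# Minimal sketch of the basic LRU algorithm in Figure 6 (illustrative only).
from collections import OrderedDict

class BasicLRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> content; first key = head

    def access(self, page_id):
        if page_id in self.pages:
            # Hit: move the page to the head of the list (state 2).
            self.pages.move_to_end(page_id, last=False)
        else:
            if len(self.pages) >= self.capacity:
                # Miss with a full pool: evict the tail page Pm (state 3).
                self.pages.popitem(last=True)
            self.pages[page_id] = f"content of {page_id}"
            self.pages.move_to_end(page_id, last=False)  # new page at head

    def order(self):
        return list(self.pages)      # head first

lru = BasicLRU(capacity=3)
for p in ["P1", "P2", "P3"]:
    lru.access(p)                    # list: P3, P2, P1
lru.access("P2")                     # hit: P2 moves to head
lru.access("Px")                     # miss, pool full: tail P1 is evicted
```

After the last access the list reads Px, P2, P3: the page untouched the longest (P1) is gone, exactly the eviction effect described in step 4.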

At first glance, this algorithm may not seem to have any issues. However, what if we consider performing a full table scan?

Suppose we want to scan a 200GB table using this algorithm. This table is a historical data table that is not accessed by business operations usually.

By following this algorithm, all the data in the Buffer Pool will be evicted and replaced with the content of the data pages accessed during the scanning process. In other words, the Buffer Pool will mainly store the data from this historical data table.

For a database that is actively serving business operations, this is not ideal. As a result, the memory hit rate of the Buffer Pool will sharply decrease, disk pressure will increase, and SQL statement response time will slow down.

Therefore, InnoDB cannot directly use this LRU algorithm. In practice, InnoDB has made improvements to the LRU algorithm.


Figure 7 Improved LRU algorithm

In InnoDB’s implementation, the entire LRU list is divided into a young region and an old region in a 5:3 ratio. LRU_old in the figure points to the first position of the old region, at 5/8 of the list: the 5/8 near the head is the young region, and the 3/8 near the tail is the old region.

The workflow of the improved LRU algorithm is as follows:

  1. In state 1 in Figure 7, when accessing data page P3, since P3 is in the young region, it is moved to the head of the list, becoming state 2, just like the original LRU algorithm.
  2. When accessing a new data page that does not exist in the current list, the data page Pm is still evicted, but the newly inserted data page Px is placed in LRU_old.
  3. When a data page in the old region is accessed, the following judgment is made each time:
    • If the amount of time this data page has been in the LRU list exceeds 1 second, it is moved to the head of the list.
    • If the amount of time this data page has been in the LRU list is less than 1 second, its position remains unchanged. The 1-second threshold is controlled by the parameter innodb_old_blocks_time, whose default value is 1000 milliseconds.
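The rules above can be sketched as follows. Assumptions are labeled in the comments: a plain list stands in for the LRU chain, a virtual clock replaces wall time, and the class and method names are invented for illustration; this mirrors the text, not InnoDB’s actual implementation.

```python
# Sketch of the improved LRU described above (illustrative, not InnoDB code).

INNODB_OLD_BLOCKS_TIME = 1.0          # seconds; default is 1000 ms

class InnoDBStyleLRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.young_len = capacity * 5 // 8   # head 5/8 = young region
        self.chain = []                      # index 0 = head of the list
        self.first_seen = {}                 # page -> time it entered the list

    def access(self, page, now):
        if page in self.chain:
            pos = self.chain.index(page)
            in_young = pos < self.young_len
            stayed = now - self.first_seen[page]
            if in_young or stayed > INNODB_OLD_BLOCKS_TIME:
                self.chain.remove(page)      # rule 1 / rule 3a: promote
                self.chain.insert(0, page)   # to the head of the list
            # rule 3b: old-region page re-accessed within 1 s stays put
        else:
            if len(self.chain) >= self.capacity:
                evicted = self.chain.pop()   # rule 2: evict the tail page
                del self.first_seen[evicted]
            self.chain.insert(self.young_len, page)  # insert at LRU_old
            self.first_seen[page] = now

lru = InnoDBStyleLRU(capacity=8)             # 5 young slots, 3 old slots
for i in range(8):                           # warm hot pages into the list
    lru.access(f"hot{i}", now=0.0)
for i in range(8):                           # re-access later: all promoted
    lru.access(f"hot{i}", now=2.0)
young_before = lru.chain[:lru.young_len]

# Full scan of cold pages, each page touched twice within 1 second:
t = 10.0
for i in range(100):
    lru.access(f"scan{i}", now=t)            # inserted at LRU_old
    lru.access(f"scan{i}", now=t + 0.1)      # re-access < 1 s: stays old
    t += 0.2
```

After the 100-page “scan”, the young region still holds only the hot pages, while the scanned pages churned through the old region and were evicted, which is exactly the behavior the next section walks through.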

This strategy is tailored to handle operations such as full table scans. Taking the scanning of a 200 GB historical data table mentioned earlier as an example, let’s take a look at the operation logic of the improved LRU algorithm:

  1. During the scanning process, data pages that need to be newly inserted are placed in the old region.
  2. A data page contains multiple records and will be accessed multiple times. However, since it is a sequential scan, the time interval between the first and last accesses to this data page will not exceed 1 second, so it will still be retained in the old region.
  3. When continuing to scan subsequent data, the previous data page will not be accessed again, so it never has a chance to move to the head of the list (i.e., the young region) and will be quickly evicted.

As you can see, the biggest benefit of this strategy is that while scanning this large table, although the Buffer Pool is used, the young region is unaffected, so the Buffer Pool’s hit rate for normal business queries is preserved.

Conclusion #

Today, I explained the process of MySQL delivering query results to clients using the question “Will large queries consume all memory?”

Since MySQL uses “compute on the fly” logic, it does not save the complete result set on the server side for large query results. Therefore, if the client does not read the result promptly, it may block the MySQL query process, but it will not run out of memory.

For InnoDB engine internals, large queries will not cause a memory spike due to the eviction strategy. Moreover, because InnoDB has made improvements to the LRU algorithm, the impact of full table scans of cold data on the Buffer Pool can also be controlled.

Of course, as mentioned in previous articles, full table scans still consume a significant amount of IO resources, so performing full table scans directly on the primary database during peak business hours is not advisable.

Finally, I have a question for you to ponder.

I mentioned in the article that if the client is under heavy pressure and cannot receive results in a timely manner, it will prevent MySQL from sending results and affect statement execution. However, this is not the worst-case scenario.

Can you imagine an example where the client’s performance issues have a more severe impact on the database? Have you ever encountered such a situation? How did you optimize it?

Feel free to share your experiences and analysis in the comments, and I will discuss this issue with you at the end of the next article. Thank you for reading, and please feel free to share this article with more friends to read together.

Answer to Previous Question #

The question from the previous article was whether, after a transaction is killed and remains in the rolling back state, it is better to restart and wait for it to finish or forcefully restart the entire MySQL process.

Since the rollback actions still need to be performed after the restart, it is better to let it finish on its own in terms of recovery speed.

However, if this statement may occupy other locks or if it consumes too many IO resources, affecting the execution of other statements, it is necessary to perform a master-slave switch to switch to a new master database to continue providing services.

After the switch, all other threads will be disconnected and will stop execution automatically. Then, we still need to wait for the rollback to finish. This operation falls under what we mentioned in the article about reducing system pressure and accelerating termination logic.