13 Why Deleting Half of the Table Data Doesn’t Change the File Size #

I am often asked by students why their database takes up so much space, and why, even after they delete half of the data in the largest table, the table file size stays the same. So today I will discuss how table space is reclaimed in a database and how to solve this problem.

Here, we will focus on InnoDB, the most widely used engine in MySQL. An InnoDB table consists of two parts: the table structure definition and the data. Before MySQL 8.0, the table structure was stored in a file with the .frm extension; starting from MySQL 8.0, the table structure definition can be stored in the system data tables instead. Since the space occupied by the table structure definition is negligible, today we will mainly discuss the table data.

Next, I will first explain why simply deleting table data does not result in table space reclamation, and then I will introduce the correct method for space reclamation.

The “innodb_file_per_table” parameter #

Table data can either reside in a shared tablespace or in individual files. This behavior is controlled by the “innodb_file_per_table” parameter:

  1. When this parameter is set to OFF, the table data is stored in the system shared tablespace along with the data dictionary.
  2. When this parameter is set to ON, the data of each InnoDB table is stored in its own file with the .ibd extension.

Starting from MySQL 5.6.6, the default value for this parameter has been ON.

I recommend setting this value to ON regardless of the MySQL version you are using. Storing each table in its own file makes it easier to manage, and when you no longer need a table, a drop table command deletes the corresponding file directly. If the table data lives in the shared tablespace instead, the space is not reclaimed even after the table is dropped.

Therefore, setting “innodb_file_per_table” to ON is the recommended approach, and our subsequent discussions will be based on this setting.
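As a quick check, you can confirm the setting on your instance. This is a minimal sketch; note that turning the parameter on only affects tables created afterwards, so tables already stored in the shared tablespace must be rebuilt to move into their own .ibd files.

-- Check the current value of innodb_file_per_table
show variables like 'innodb_file_per_table';

-- Turn it on for newly created tables (requires sufficient privileges);
-- tables created while it was OFF stay in the shared tablespace until rebuilt.
set global innodb_file_per_table = ON;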

When we delete the entire table, we can use the drop table command to reclaim the table space. However, in most cases, we encounter scenarios where only certain rows are deleted, which leads to the problem mentioned at the beginning of this article: the data in the table is deleted, but the table space is not reclaimed.

To fully understand this problem, we need to start with the data deletion process.

Data deletion process #

Let’s take another look at the schematic diagram of an index in InnoDB. In the previous [article 4] and [article 5], I mentioned that data in InnoDB is organized using a B+ tree structure.

Figure 1: Schematic Diagram of a B+ Tree Index

Suppose we want to delete the record R4. The InnoDB engine will only mark the record R4 as deleted. If we later insert a record with an ID between 300 and 600, this position may be reused. However, the size of the disk file will not shrink.

Now that you know that InnoDB data is stored in pages, what happens if we delete all the records on a data page?

The answer is that the entire data page can be reused.

However, the reuse of a data page is different from the reuse of a record.

The reuse of a record's space is limited to data whose key falls within the right range. For example, in the case above, after record R4 is deleted, if we insert a row with ID 400, that space can be reused directly; but if we insert a row with ID 800, that position cannot be reused.

On the other hand, when a page is removed from the B+ tree, it can be reused anywhere. In the case of Figure 1, if all the records on data page A are deleted, page A will be marked as reusable. If we need to insert a record with ID=50 and need to use a new page, page A can be reused.

If two adjacent data pages have low utilization rates, the system will merge the data from these pages into one of them, and the other data page will be marked as reusable. Furthermore, what if we use the delete command to delete the entire table? The result is that all data pages will be marked as reusable. However, the size of the file on the disk will not change.

Now you know that the delete command only marks the record's or the data page's position as “reusable”; the size of the disk file does not change. In other words, the delete command cannot reclaim table space. These reusable-but-not-yet-used spaces look like “holes” in the file.
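You can observe this yourself. Here is a minimal sketch, assuming a database named test and a table t with an integer primary key id (the figures reported by information_schema are approximate, and freed space only shows up after the deleted rows have been purged):

-- Approximate data size and free ("hole") space before the delete
select data_length, index_length, data_free
  from information_schema.tables
 where table_schema = 'test' and table_name = 't';

-- Delete half of the rows: they are only marked as reusable
delete from t where id <= 500000;

-- The .ibd file on disk keeps its size; the freed space appears in data_free
select data_length, index_length, data_free
  from information_schema.tables
 where table_schema = 'test' and table_name = 't';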

In fact, deletes are not the only cause of holes; inserts can create them too.

If data is inserted in ascending order according to the index, the index is compact. But if data is randomly inserted, it may cause data page splits in the index.

Suppose in Figure 1, page A is already full. What will happen if I insert another row of data?

Figure 2: Data insertion causing a page split

As you can see, because page A is already full, inserting a row with ID 550 forces a new page, page B, to be allocated to hold the data. After the page split, a hole is left at the end of page A (in fact, more than one record's worth of space may end up as holes).

In addition, updating values in the index can be understood as deleting an old value and inserting a new value. It is not difficult to understand that this also causes holes.

In other words, tables with a large number of insert, delete, and update operations may have holes. Therefore, if these holes can be removed, the table space can be shrunk.

And rebuilding the table can achieve this.

Rebuilding the table #

Now imagine you have a table A whose space needs to be shrunk. How do you go about removing the holes in the table?

You can create a new table B with the same structure as table A and then read data from table A row by row and insert it into table B in ascending order of the primary key ID.

Since table B is a newly created table, the holes in the primary key index of table A will no longer exist in table B. Obviously, the primary key index of table B is more compact, and the data page utilization is higher. If we use table B as a temporary table, after importing data from table A to table B, we can replace A with B, and in effect, shrink the space of table A.
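Expressed as SQL, the manual version looks roughly like this. It is only a sketch: it assumes table A has an integer primary key id, and that no writes hit table A while it runs, otherwise those writes are lost.

-- Manual rebuild sketch: copy A into a compact new table, then swap names
create table B like A;                        -- same structure, no data, no holes
insert into B select * from A order by id;    -- rows inserted in primary-key order
rename table A to A_old, B to A;              -- atomically swap the two tables
drop table A_old;                             -- reclaim the old file's space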

Here, you can use the command “alter table A engine=InnoDB” to rebuild the table. Prior to MySQL 5.5, the execution flow of this command is similar to what we described earlier, with the only difference being that you don’t need to create the temporary table B yourself, as MySQL will automatically complete the operations of transferring data, swapping table names, and deleting the old table.

Figure 3: Locking the table during DDL

Obviously, the step that takes the most time is inserting data into the temporary table. If there is new data to be written to table A during this process, it will result in data loss. Therefore, there should be no updates to table A during the entire DDL process. In other words, this DDL operation is not online.

Starting from MySQL 5.6, Online DDL was introduced, which optimizes this operation flow.

Let me briefly describe the process of rebuilding the table after the introduction of Online DDL:

  1. Create a temporary file and scan all data pages of the primary key in table A.
  2. Generate a B+ tree with the records of table A from the data pages, and store it in the temporary file.
  3. As the temporary file is being generated, record all operations on table A in a log file (row log), corresponding to state 2 in the diagram.
  4. After the temporary file is generated, apply the operations from the log file to the temporary file, producing a data file logically identical to table A, corresponding to state 3 in the diagram.
  5. Replace the data file of table A with the temporary file.

Figure 4: Online DDL

As can be seen, the difference between this process and Figure 3 is that due to the existence of the functionality of logging and replaying operations, this solution allows for insert, delete, and update operations on table A during the table rebuilding process. This is where the name “Online DDL” comes from.
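If you want MySQL to fail fast rather than fall back to a blocking copy, you can spell out both clauses explicitly (syntax available since MySQL 5.6); the statement returns an error instead of proceeding if the requested combination cannot be honored:

-- Request an in-place rebuild that keeps the table writable
alter table A engine=InnoDB, ALGORITHM=inplace, LOCK=none;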

I remember that a classmate left a comment under the 6th article on table locks [“Global Lock and Table Lock: Why Are There So Many Obstacles to Adding a Field to a Table?”], asking: since DDL has to acquire an MDL write lock first, how can it still be called Online DDL?

Indeed, in the flowchart in Figure 4, the alter statement needs to acquire an MDL write lock at startup, but this write lock is downgraded to a read lock before the data copying process begins.

Why downgrade the lock? Precisely to make the operation Online: an MDL read lock does not block insert, delete, or update operations on the table.

Then why not release the lock entirely? To protect itself: the read lock still prevents other threads from running DDL on this table at the same time.

For a large table, the most time-consuming step of Online DDL is copying the data to the temporary file, and insert, delete, and update operations are allowed during that step. Therefore, relative to the whole DDL process, the lock is held for only a very short time, and from the business's point of view the operation can be considered online.

It should be noted that the above-mentioned rebuilding methods all involve scanning the original table data and constructing temporary files. For very large tables, this operation consumes a lot of IO and CPU resources. Therefore, if it is an online service, you need to carefully control the operation time. If you want to perform operations that are relatively safe, I recommend using gh-ost, an open-source project on GitHub.

Online and inplace #

Speaking of Online, I would like to clarify the difference between it and another concept related to DDL that is easily confused, inplace.

You may have noticed that in Figure 3, we refer to the storage location where the data from table A is exported as “tmp_table”. This is a temporary table created at the server level.

In Figure 4, the data reconstructed based on table A is stored in “tmp_file”. This temporary file is created internally by InnoDB. The entire DDL process is completed internally by InnoDB. From the perspective of the server level, the data is not moved to a temporary table, so it is an “inplace” operation.

So, let me ask you, if you have a table of 1TB and the disk space is 1.2TB, can you perform an inplace DDL?

The answer is no. Because the tmp_file also occupies temporary space.

The statement “alter table t engine=InnoDB” that we use to rebuild the table actually implies:

alter table t engine=innodb,ALGORITHM=inplace;

The opposite of inplace is the method of copying tables. The usage is:

alter table t engine=innodb,ALGORITHM=copy;

When you use ALGORITHM=copy, it means to forcefully copy the table, and the corresponding process is the operation process in Figure 3.

But you might think, inplace and Online are the same thing, right?

Actually, they are not. It just so happens that, in the table-rebuild logic, the two coincide.

For example, if I want to add a full-text index to a field in an InnoDB table, the syntax is:

alter table t add FULLTEXT(field_name);

This process is inplace but it blocks insert, delete, and update operations, so it is not Online.

If we were to summarize the relationship between these two logics, it would be:

  1. If the DDL process is Online, it must be inplace;
  2. However, the opposite is not necessarily true, which means inplace DDL may not be Online. As of MySQL 8.0, adding a full-text index (FULLTEXT index) and a spatial index (SPATIAL index) fall into this category.

Finally, let’s extend further.

In the comments section of the 10th article [“Why Does MySQL Sometimes Choose the Wrong Index”], a classmate asked about the differences between three methods of table rebuilding: optimize table, analyze table, and alter table. Here, let me briefly explain it to you.

  • Starting from MySQL 5.6, the default process of “alter table t engine = InnoDB” (i.e., recreate) is the flowchart shown in Figure 4;
  • analyze table t is not actually a table rebuild: it only re-collects the table's index statistics without modifying the data, and an MDL read lock is taken during this process;
  • optimize table t is equivalent to recreate+analyze.
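Side by side, the three commands look like this (behavior as described above; details can vary slightly across versions):

alter table t engine = InnoDB;  -- recreate: rebuilds the table (the Figure 4 flow since MySQL 5.6)
analyze table t;                -- re-collects index statistics only; no rebuild, MDL read lock
optimize table t;               -- equivalent to recreate + analyze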

Summary #

In today’s article, I discussed the methods for shrinking tablespace in a database.

Now you know that if you want to shrink a table, simply deleting unused data from the table will not reduce the size of the table file. You need to rebuild the table through the alter table command to achieve the purpose of reducing the size of the table file. I introduced two ways to rebuild the table, and the Online DDL method can be considered for use during off-peak hours of the business. However, in versions earlier than MySQL 5.5, this command would block DML, so you need to be particularly careful.

Finally, it’s time for our question time.

Suppose someone wants to shrink a table's space, but finds that it grows instead. The situation looks like this:

  1. The file size of a table t is 1TB;
  2. An alter table t engine = InnoDB command is executed on this table;
  3. After the execution is completed, it is found that the space not only did not decrease but also increased slightly, for example, it became 1.01TB.

What do you think might be the reason?

You can write your possible reasons in the comments area, and at the end of the next article, I will list the reasonable explanations everyone has given, so that other classmates will not fall into the same pit. Thank you for listening, and feel free to share this article with more friends.

Previous Question Time #

At the end of the previous article, the question I left you with was: What happens if the redo log is set too small on a high-performance machine?

Every time a transaction commits, it needs to write to the redo log. If the redo log is set too small, it fills up quickly, as shown in the following diagram: the “ring” fills up fast, and the write pos keeps catching up with the checkpoint (CP).

Figure: the circular redo log, with the write pos chasing the checkpoint

At this time, the system has to stop all updates to advance the checkpoint.

The phenomenon you will observe at this time is that the disk pressure is low, but the database experiences intermittent performance degradation.
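As a quick check, you can see how much redo log space is configured on your instance. This is a sketch; on MySQL 8.0.30 and later the total size is governed by innodb_redo_log_capacity instead of the two variables below.

-- Total redo log size = innodb_log_file_size * innodb_log_files_in_group
show variables like 'innodb_log_file%';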