06 Table Compression Not Just Space Compression

06 Table Compression - Not Just Space Compression #

In the previous few lessons, we have moved from choosing various column types to designing table structures. I believe that after learning the previous lessons, you are able to design various business tables, such as user tables and order tables. Now that we have mastered the logical design of tables, let’s continue learning about the physical storage design of different business tables.

From my observation, many students do not consider storage design at the beginning of table structure design. They only realize the seriousness of the problem when the business grows to a certain scale. Physical storage mainly considers whether to enable table compression, and by default, all tables are non-compressed.

However, when some students hear about compression, they instinctively think that compression will lead to a decline in the performance of the MySQL database. This view is not entirely correct and needs to be differentiated based on different scenarios. In this lesson, let’s take a look at the physical storage design of tables: the use of table compression in different scenarios.

Table Compression #

Tables in a database are composed of rows, and each row is stored in a page. In MySQL, the default page size is 16KB, and pages make up the tablespace of each table.

Usually, we believe that the more records stored in a page, the higher the performance of the database. This is because the pages in the database tablespace are stored on disk, and MySQL needs to read the pages from disk into the buffer pool in memory and then read and manage records in page units.

The more records stored in a page, the more records can be stored in memory, and consequently, the higher the efficiency. To increase the number of records stored in a page, compression can be enabled. Additionally, after compression is enabled, the storage space occupancy decreases, and the amount of data that can be stored in the same unit of storage increases.

To enable compression, the database can compress records, pages, and tablespaces. However, in practical engineering, we generally use page compression. Why is that?

  • Compressing each record: Compressing and decompressing are required for each read and write operation, relying too much on the CPU’s computing power, which will significantly degrade performance. Additionally, since individual record sizes are not particularly large, usually less than 1KB, the compression efficiency is not significantly improved.
  • Compressing tablespace: The compression efficiency is very good, but it requires the tablespace file to not grow dynamically, which is difficult to achieve for disk-based relational databases.

Page compression, on the other hand, can improve compression efficiency while maintaining a balance between performance and compression efficiency.

Many students may think that enabling page compression on a table will result in a noticeable loss of performance because compression incurs additional overhead. Indeed, compression consumes additional CPU instructions, but compression does not necessarily mean a decline in performance. It may even increase performance because the CPU processing capacity of most database systems is underutilized, and I/O load is the main bottleneck.

With the help of page compression technology, MySQL can compress a 16KB page to 8KB or even 4KB. This reduces the size of I/O requests when reading from or writing to disk, thereby improving the overall performance of the database.

Of course, compression is a trade-off and does not always improve database performance. This performance “balance” depends on the trade-off between the benefits and costs brought by compression and decompression overhead. However, regardless, compression can effectively reduce the original capacity of data, bringing significant benefits to storage space.

MySQL Table Compression Design #

COMPRESS Page Compression #

COMPRESS page compression is a page compression feature provided before MySQL 5.7. Simply specify ROW_FORMAT=COMPRESSED when creating a table, and set the compression ratio through the KEY_BLOCK_SIZE option.

It is important to remember that although the compression feature is enabled through the ROW_FORMAT option, it is not record-level compression; it still compresses based on the page dimension.

Here is an example of a log table with ROW_FORMAT set to COMPRESS, which enables COMPRESS page compression, and KEY_BLOCK_SIZE set to 8, which compresses a 16KB page to 8KB:

CREATE TABLE Log (

  logId BINARY(16) PRIMARY KEY,

  ......

)

ROW_FORMAT=COMPRESSED

KEY_BLOCK_SIZE=8

COMPRESS page compression compresses a page to the specified size. For example, it compresses a 16KB page to 8KB. If a 16KB page cannot be compressed to 8KB, two compressed 8KB pages will be produced, as shown in the following diagram:

Image 3

COMPRESS Page Compression

In general, COMPRESS page compression is suitable for business tables that are not sensitive to performance, such as log tables, monitoring tables, alert tables, etc. The compression ratio can usually reach around 50%.

Although COMPRESS compression can effectively reduce storage space, the implementation of COMPRESS page compression incurs significant performance overhead and performance degradation. The main reason is that a compressed page has two versions in the memory buffer pool: one for compression and one for decompression.

Image 4

1 COMPRESS compressed page has 2 page versions in memory

As shown in the diagram, Page1 and Page2 are both compressed pages of 8KB, but there is also an uncompressed 16KB page in memory. This design is because the 8KB page is used for subsequent page updates, while the 16KB page is used for reading, so reading does not require decompression every time.

Clearly, this implementation increases memory overhead, reduces the amount of valid data that can be stored in the cache pool, and naturally leads to a noticeable degradation in the performance of the MySQL database.

To address the issue of performance degradation caused by compression, MySQL introduced the TPC compression feature starting from version 5.7.

TPC Compression #

TPC (Transparent Page Compression) is a new page compression feature introduced in version 5.7. It uses the hole-punching feature of the file system for compression. You can use the following command to create a TPC compressed table:

CREATE TABLE Transaction (
  transactionId BINARY(16) PRIMARY KEY,
  .....
)
COMPRESSION=ZLIB | LZ4 | NONE;

To use TPC compression, first confirm if the current operating system supports the hole-punching feature. Generally, common Linux operating systems already support this feature.

Since holes are a feature of the file system, hole-punching compression can only compress to the minimum unit of the file system, which is 4K. The page compression is also aligned to 4K. For example, if a 16K page is compressed to 7K, the actual space occupied is 8K; if compressed to 3K, the actual space occupied is 4K; if compressed to 13K, the space occupied is still 16K.

The specific implementation of TPC compression is shown below:

Image

TPC Page Compression

In the above image, a 16K page is compressed to 8K, and then the remaining 8K of the 16K page is filled with 0x00. When this 16K page is written to disk, it will only occupy 8K of physical storage space by utilizing the hole-punching feature of the file system.

Another advantage of hole-punching compression is that it has almost no impact on database performance (less than 20%), and may even improve performance.

This is because unlike COMPRESS page compression, TPC compression only has one decompressed page of 16K in memory, so there is no additional storage overhead for the buffer pool.

On the other hand, all read and write operations on pages are the same as non-compressed pages, without any overhead. Page compression is only triggered when the page needs to be flushed to disk. However, since a 16K page is compressed to 8K or 4K, the write performance can be improved.

Image

Official TPC Test Comparison

The image above shows the results of the official LinkBench test in MySQL. It can be seen that the uncompressed test result is 13,432 QPS, while the traditional COMPRESS page compression performance decreases to 10,480 QPS, a decrease of nearly 30% in performance. Based on TPC compression, the test result is 18,882, which brings an additional 40% performance improvement on top of the uncompressed result.

Using Table Compression in Business Scenarios #

In general, for business tables that are not sensitive to performance, such as log tables, monitoring tables, alarm tables, etc., where storage space is the main requirement, COMPRESS page compression can be used.

For more critical transactional business tables, I recommend using TPC compression. Transactional information is a type of crucial data storage business that usually accompanies core business operations. For example, a transaction in an e-commerce system involves deducting money, placing an order, and recording the transaction, which is a micro model of a core business.

Therefore, users have performance requirements for transactional tables. In addition, transactional data can be very large, so enabling compression can more effectively store the data.

If you are concerned about performance fluctuations caused by compression, my suggestion is as follows: Since transactional tables are usually stored on a monthly or daily basis, do not enable TPC compression for the current active transactional table. Instead, enable TPC compression for historical transactional tables, as shown below:

Image

Design of a Transactional Table

Special attention needs to be paid: Enabling TPC page compression using the command ALTER TABLE xxx COMPRESSION = ZLIB will only compress newly added data and will not compress existing data. So the aforementioned ALTER TABLE operation only modifies metadata and can be completed instantly.

To compress the entire table, the OPTIMIZE TABLE command needs to be executed:

ALTER TABLE Transaction202102 COMPRESSION=ZLIB;

OPTIMIZE TABLE Transaction202102;

Summary #

When designing table structures, in addition to column selection, storage design needs to be considered. Specifically, when designing table compression functionality, the following points summarize it:

  • Compression in MySQL is based on page compression.
  • COMPRESS page compression is suitable for business tables with low performance requirements, such as log tables, monitoring tables, alarm tables, etc.
  • With COMPRESS page compression, there are two pages in memory buffer pool for compression and decompression, which significantly affects performance.
  • For storage with compression requirements, and when performance degradation is not desirable, it is recommended to use TPC compression.
  • Enabling TPC compression through ALTER TABLE xxx COMPRESSION = ZLIB command only compresses newly added data and does not compress existing data. So the aforementioned ALTER TABLE operation only modifies metadata and can be completed instantly.
  • To compress the entire table, the OPTIMIZE TABLE command needs to be executed.