38 Database Parameter Setting Optimization the Difference Between the Slightest Negligence and a Thousand Miles

38 Database parameter setting optimization The difference between the slightest negligence and a thousand miles #

Hello, I’m Liu Chao.

MySQL is a highly flexible database system that provides many configurable parameters, allowing us to customize the database service according to our application and server hardware. If I were to ask you now, you might think that you rarely adjust MySQL’s configuration parameters during development. However, today I want to emphasize the importance of delving deeper into understanding them.

We know that databases are primarily used for storing and accessing data, which involves disk I/O read and write operations. Therefore, the main performance bottleneck of a database system lies in its I/O operations. To reduce disk I/O, MySQL database applies extensive memory management to optimize database operations, including memory-optimized queries, sorting, and write operations.

You may think that it is better to set a larger amount of memory and quickly flush data to disk. Isn’t that right? Actually, setting the memory too large can also bring new problems. For example, in InnoDB, if the data and index caches are set too large, it can cause SWAP page swapping. Additionally, faster data writing to disk is not always better. What we expect is for the data to be evenly written to the disk during high concurrency, thus avoiding I/O performance bottlenecks.

SWAP page swapping: When the physical memory in a system is not enough, the SWAP partition will release a portion of the physical memory space for the currently running program to use. The data from the released space, which may come from programs that have been idle for a long time, is temporarily stored in the SWAP partition. When those programs need to run again, the data saved in the SWAP partition will be restored to the memory.

Therefore, the settings of these parameters are highly dependent on our application’s service characteristics and server hardware. MySQL is a highly customizable database, and we can adjust its parameters according to our needs to achieve optimal performance.

However, in order to understand the specific functions of these parameters, we need to first understand the structure of the database and the working principles of different storage engines.

MySQL Architecture #

Generally, the structure of MySQL can be divided into four layers. The top layer is the client connector, which includes database connections, authorization authentication, and security management. This layer utilizes the thread pool to increase the efficiency of handling connection requests.

The second layer is the Server layer, which implements basic SQL functionalities, including SQL parsing, optimization, execution, and caching. The caching aspect is the main focus of this lecture.

The third layer consists of various storage engines, which are responsible for data storage and retrieval. The Buffer cache, which is closely related to this lecture, is involved in this layer.

The bottom layer is the data storage layer, which is responsible for storing data in the file system and interacting with the storage engine.

img

Next, let’s understand how a SQL statement is processed when the data receives it.

1. Query Statements #

An application service needs to go through the first layer, which includes connection and authorization authentication, and then sends the SQL request to the SQL interface. After receiving the request, the SQL interface first checks if the query SQL matches the data in the cache. If it matches, the cached result is directly returned. Otherwise, it needs to go through the parser.

The parser mainly performs syntax and lexical analysis on the SQL. After that, it enters the optimizer, which generates multiple execution plan options and selects the optimal plan for execution.

Once the optimal execution plan is determined, the executor checks if the connected user has the execution permission for the table. If so, it checks if the data is available in the Buffer cache. If it is, it acquires the lock and queries the table data. Otherwise, it reopens the table file and processes it through the interface by calling the corresponding storage engine. At this point, the storage engine accesses the storage file system to retrieve the corresponding data and returns the result set.

2. Update Statements #

The execution process of update SQL for the database is similar to that of query SQL, except that additional steps are taken for logging. When executing an update operation, MySQL logs the operation to the binlog (archive log). This step is present in all storage engines. However, InnoDB, in addition to logging to the binlog, also logs to the redo log.

The redo log is mainly introduced to solve the crash-safe problem. When the database experiences an abnormal restart during data storage, we need to ensure that the stored data is either successfully stored or not stored at all, without any data loss. This is what crash-safe means.

When executing an update operation, the related data is first queried. Then, the executor executes the update operation and writes the execution result to memory. At the same time, the update operation is recorded in the redo log cache. At this point, the redo log record in the cache is in the “prepare” state, and the executor is notified that the update is completed and the transaction can be committed. Upon receiving the notification, the executor performs binlog writing. The binlog is also cached and, after a successful write, the transaction interface of the engine is called, and the record status is updated to “commit”. Then, the redo log and binlog in memory are flushed to disk files.

Memory Optimization #

Based on the two SQL execution processes mentioned above, we can see that there are two caches involved when executing query SQL statements. The first cache is the Query Cache when we first enter, which caches SQL statements and their corresponding result sets. This cache uses the Hash value of the query SQL as the key and the result set as the value of the key-value pair. To determine whether a SQL statement hits the cache, we match the Hash value of the query SQL.

It is obvious that the Query Cache can optimize query SQL statements and reduce a lot of work, especially reducing I/O read operations. We can optimize query operations through the following main configuration parameters:

img

We can reduce fragmentation by setting an appropriate value for query_cache_min_res_unit. The optimal value of this parameter is directly related to the average size of the query result of the application program. It can be calculated using the following formula:

(query_cache_size - Qcache_free_memory) / Qcache_queries_in_cache

The values of Qcache_free_memory and Qcache_queries_in_cache can be obtained by using the following command:

show status like 'Qcache%'

Although the Query Cache can optimize query operations, it is only effective for data that is not frequently modified. If a table is frequently inserted, updated, or deleted, the Query Cache will be highly inefficient, resulting in frequent clearing of data in the cache, thereby adding extra performance overhead to the system.

This will also result in a very low cache hit rate. We can use the above command to check the Qcache_hits, which represents the cache hit rate. If the cache hit rate is very low, we can also disable the query cache by setting query_cache_size = 0 or query_cache_type.

After being cached by the Query Cache, the Buffer cache in the storage engine is also used. Different storage engines use different Buffer caches. Here we mainly explain two commonly used storage engines.

1. MyISAM Storage Engine Parameter Tuning #

The MyISAM storage engine uses the key buffer to cache index blocks, and the data blocks of MyISAM tables are not cached and are directly stored in disk files.

We can set the size of the key buffer by configuring the key_buffer_size. However, a larger size is not always better. As I mentioned earlier, if the key buffer is set too large, it may lead to memory wastage and cause the system to perform SWAP page swapping. Generally, I recommend allocating 1/4 of the available memory in the server to the key buffer.

To more accurately evaluate whether the setting of the key buffer is reasonable, we can calculate the cache utilization rate using the following formula:

1-((key_blocks_unused*key_cache_block_size)/key_buffer_size)

key_blocks_unused represents the number of unused cache blocks key_cache_block_size represents the size of the region into which key_buffer_size is divided key_blocks_unused*key_cache_block_size represents the remaining available cache space (generally, a cache utilization rate of around 80% is appropriate).

2. InnoDB Storage Engine Parameter Tuning #

The InnoDB Buffer Pool (IBP) is a buffer pool in the InnoDB storage engine. Unlike the MyISAM storage engine, which uses the key buffer to cache, the IBP not only caches index blocks but also caches table data. When querying data, the IBP allows quick access to frequently accessed data without accessing disk files. The more InnoDB tablespace is cached, the lower the frequency of physical disk access by MySQL, which means faster query response time and improved overall system performance.

We can generally optimize the performance of InnoDB tables by adjusting several configuration parameters.

  • innodb_buffer_pool_size

The default size of the IBP is 128M. We can set the size of the IBP using the innodb_buffer_pool_size parameter. The larger the IBP size, the better the performance of InnoDB tables. However, setting the IBP size too large is not recommended as it may cause system SWAP page swapping. Therefore, we need to strike a balance between the IBP size and the memory required by other system services. MySQL recommends configuring the IBP size as 80% of the server’s physical memory.

We can also adjust the size of the IBP by calculating the hit rate of the InnoDB buffer pool:

(1-innodb_buffer_pool_reads/innodb_buffer_pool_read_request)*100

However, if we set the IBP size to 80% of the physical memory and find that the hit rate is still very low, then we should consider increasing the memory to increase the IBP size.

  • innodb_buffer_pool_instances

The IBP cache pool in InnoDB is divided into multiple instances. For systems with gigabytes of buffer pool, dividing the cache pool into separate instances can reduce contention when different threads read from and write to cache pages, thereby improving system concurrency. This parameter only takes effect when the innodb_buffer_pool_size is set to 1GB or larger. In a Windows 32-bit operating system, if the size of innodb_buffer_pool_size exceeds 1.3GB, the default value of innodb_buffer_pool_instances is set to innodb_buffer_pool_size/128MB; otherwise, it defaults to 1.

In other operating systems, if the size of innodb_buffer_pool_size exceeds 1GB, the default value of innodb_buffer_pool_instances is set to 8; otherwise, it defaults to 1.

To achieve optimal efficiency, it is recommended to specify the size of innodb_buffer_pool_instances and ensure that each buffer pool instance has at least 1GB of memory. Generally, the size of innodb_buffer_pool_instances is advised not to exceed the sum of innodb_read_io_threads and innodb_write_io_threads, and it is recommended to have a ratio of 1 instance to 1 thread.

  • innodb_read_io_threads / innodb_write_io_threads

By default, MySQL background threads include the main thread, IO threads, lock thread, monitoring thread, etc. The read and write threads belong to the IO threads and are mainly responsible for reading and writing operations in the database. These threads read from and write to the various memory pages created by innodb_buffer_pool_instances. MySQL supports configuring multiple read and write threads by setting the values of innodb_read_io_threads and innodb_write_io_threads.

The default value for the number of read and write threads is 4, which means there are a total of 8 threads running in the background. The values set for innodb_read_io_threads and innodb_write_io_threads in terms of the number of threads depend on the size of innodb_buffer_pool_instances, and the synergy between the two is a key factor in improving system performance.

On some database servers with large amounts of memory and CPU cores, we can increase the number of cache instances and the number of read and write threads in coordination while ensuring a sufficiently large InnoDB buffer pool memory. Here is the formula to determine the number of cache instances and read/write threads:

( innodb_read_io_threads + innodb_write_io_threads ) = innodb_buffer_pool_instances

It is not reasonable to evenly split the number of read and write threads according to the number of cache instances. For example, if our application service reads data from the database more than it writes data, increasing the number of write threads would not optimize performance. Generally, we can determine the read-to-write ratio of the system through the global statistics information saved by the MySQL server.

We can determine the read-to-write ratio through the following queries:

SHOW GLOBAL STATUS LIKE 'Com_select'; // Number of reads
SHOW GLOBAL STATUS WHERE Variable_name IN ('Com_insert', 'Com_update', 'Com_replace', 'Com_delete'); // Number of writes

If reads are greater than writes, we should consider setting the number of read threads larger and the number of write threads smaller; otherwise, vice versa.

  • innodb_log_file_size

In addition to the InnoDB cache and other factors mentioned above, the log cache size, log file size, and the strategy of persisting log files to disk also affect the performance of InnoDB. In InnoDB, there is a redo log file which is used to store the redo activities of every write request processed by the server. Each write query executed obtains a redo entry in the log file so that changes can be recovered in the event of a crash.

When the log file size exceeds the size set by our parameter configuration, InnoDB will automatically switch to another log file. As the redo log is a circularly used ring, when switching, the dirty page cache data of the new log file needs to be flushed to disk (triggering a checkpoint).

In theory, the larger the innodb_log_file_size is set, the fewer checkpoint flush activities are required in the buffer pool, thereby saving disk I/O. So, is it better to set this log file size larger? If the log file size is set too large, the recovery time will be longer, which is not convenient for DBA management. In most cases, setting the log file size to 1GB is sufficient.

  • innodb_log_buffer_size

This parameter determines the size of the InnoDB redo log buffer pool, with a default value of 8MB. If there are a large number of transactions in high concurrency, a small value will increase the I/O operations for writing to disk. We can increase this parameter to reduce writing to disk and improve transaction performance during concurrency.

  • innodb_flush_log_at_trx_commit

This parameter controls the strategy of flushing the redo log from the cache to the file and then to disk. The default value is 1.

When this parameter is set to 0, InnoDB triggers a cache log write and disk flush operation every second. This can potentially result in the loss of 1 second of data after a database crash.

When set to 1, each transaction’s redo log is directly persisted to disk, ensuring that data is not lost after a MySQL restart.

When set to 2, the redo log of each transaction is directly written to the log file before being flushed to disk.

In scenarios where data integrity is highly valued, it is clearly necessary to set this value to 1. In scenarios where it is acceptable to lose 1 second of data after a database crash, we can set this value to 0 or 2, which significantly reduces the I/O operations for logging to disk.

Summary #

There are many parameter settings for MySQL databases. Today, we only covered the parameter settings related to memory optimization. In addition to these parameter settings, there are also some commonly used parameters to improve MySQL concurrency, summarized as follows:

img

Thought Question #

We know that the memory size of InnoDB’s InnoDB Buffer Pool (IBP) is limited. Do you know how InnoDB keeps hot data in memory and evicts non-hot data?