08 the Art of Index Sorting

08 The Art of Index Sorting #

In Module 1, we learned how to create a table with the appropriate types correctly. However, the created table cannot be used immediately in a real business system. This is because table structure design is only the initial step in database design, and we still lack the most important step in database design - index design. Only with correct index design can the business meet the preliminary standards for going live.

Therefore, in Module 2, I will talk about index design, business applications, and optimization cases. Today, let’s first learn about the most fundamental concept of relational databases - indexes, and have a preliminary understanding of indexes in databases, and effectively use B+ tree indexes.

What is an index? #

I believe that in interviews, you are often asked “What is an index?” and you must be able to answer: An index is a data structure that improves query speed.

The reason why an index can improve query speed is that it sorts the data when inserting (obviously, its drawback is that it affects the performance of insertion or update).

Therefore, the index is an art of sorting, and designing and creating indexes effectively will improve the overall performance of the database system. In the current version of MySQL 8.0, InnoDB storage engine supports B+ tree indexes, full-text indexes, R-tree indexes. In this lecture, we will focus on the most widely used B+ tree indexes.

B+ Tree Index Structure #

B+ tree index is the most common index data structure in database systems. Almost all relational databases support it.

So why are relational databases enthusiastic about supporting B+ tree indexes? Because it is the most efficient data structure for sorting so far. Binary trees, hash indexes, red-black trees, SkipLists are far less efficient than B+ tree indexes in terms of sorting and storing massive data based on disk storage efficiency.

Therefore, the above data structures are generally only used for in-memory objects, and for sorting and storing disk-based data, B+ tree indexes are still the most efficient.

The characteristics of B+ tree indexes are: disk-based balanced tree, but the tree is very short, usually 3-4 levels, capable of storing tens of millions to hundreds of millions of sorted data. The short tree means high access efficiency. When querying one piece of data from tens of millions or hundreds of millions of data, only 3-4 I/O operations are needed.

Since modern solid-state drives can perform at least 10,000 I/O operations per second, even if the data is all on the disk, it only takes 0.003~0.004 seconds to query one piece of data. In addition, because the B+ tree is short, for sorting, it only needs to compare 3-4 times to locate the position where the data needs to be inserted. The sorting efficiency is very good.

A B+ tree index consists of a root node, non-leaf nodes, and leaf nodes. The leaf nodes store all the sorted data. Of course, there is also a special case, such as a B+ tree index with a height of 1:

Drawing 1.png

In the above diagram, the first column is the column for the B+ tree index sorting. You can understand it as the column “id” in the table “User”, which is of type BIGINT with 8 bytes. Therefore, the column “userId” is the index key, similar to the following table:

CREATE TABLE User (

  id BIGINT AUTO_INCREMENT PRIMARY KEY,

  name VARCHAR(128) NOT NULL,

  sex CHAR(6) NOT NULL,

  registerDate DATETIME NOT NULL,

  ...

)

All B+ trees start with a height of 1, and then the tree grows taller as more data is inserted. You must bear in mind that indexes sort the records. In a B+ tree index with a height of 1, the records stored have already been sorted. If you want to query within a leaf node, you only need to perform a binary search to quickly locate the data.

As more records are inserted into the B+ tree index, one page (16K) cannot hold so many data, so the B+ tree splits, and the height of the B+ tree becomes 2. When the height of the B+ tree is greater than or equal to 2, the root node and the non-leaf nodes store index key-value pairs, consisting of (index key, pointer).

The index key is the sorting column, and the pointer points to the next lower-level address, occupying 6 bytes in the MySQL InnoDB storage engine. The following diagram shows what a B+ tree index looks like when the height is 2:

Drawing 3.png

As shown in the B+ tree index above, if you want to query a record with an index key value of 5, first look up the root node and find the key-value pair (20, address). This indicates that the records less than 20 are stored in the leaf node pointed to by the address. Then, based on the next-level address, you can find the leftmost leaf node, and then use binary search within the leaf node to find the record with an index key value of 5.

So how many rows can a B+ tree index with a height of 2 theoretically store?

In the MySQL InnoDB storage engine, the size of one page is 16K. In the User table mentioned above, the key userId is of type BIGINT, so:

The maximum number of key-value pairs that can be stored in the root node = 16K / size of key-value pair (8+6) ≈ 1100

Assuming that the size of each record in the User table is 500 bytes, then:

The maximum number of records that can be stored in the leaf node = 16K / size of each record ≈ 32

To summarize, a B+ tree index with a height of 2 can store a maximum of:

    Total number of records = 1100 * 32 = 35,200
    
    In other words, after sorting 35,200 records, the B+ tree index has a height of 2. When searching for a record based on the index key among these 35,200 records, only 2 pages need to be queried - one root leaf and one leaf node - to locate the page where the record is located.
    
    A B+ tree index with a height of 3 is essentially the same as an index with a height of 2, as shown in the following diagram:
    
    ![Drawing 5.png](../images/CioPOWCk3RKAPxMHAAF2YtY7XzI340.png)
    
    Similarly, the maximum number of records that a B+ tree index with a height of 3 can store is:
    
    Total number of records = 1100 (root node) * 1100 (intermediate nodes) * 32 = 38,720,000
    
    Up to this point, you may notice that a B+ tree index with a height of 3 can actually store 38 million records. When locating a record among these 38 million records, only 3 pages need to be queried. **So, are the advantages of the B+ tree index gradually being demonstrated**?
    
    However, in a real environment, the utilization rate of each page is not so high, and there may be some fragmentation. Let's assume that the utilization rate of each page is 60%, then:
    
    ![Drawing 6.png](../images/Cgp9HWCk3RqAWic4AADLJpJ9heo761.png)
    
    The table demonstrates the power of the B+ tree, which is that when searching for records based on the index key value among over 5 billion data, only 4 I/O operations are required, taking approximately 0.004 seconds. If these queried pages are already cached in the memory buffer pool, the query performance will be even faster.
    
    In the database, the above index query request corresponds to the SQL statement:
    
    SELECT * FROM User WHERE id = ?
    
    Users can use the EXPLAIN command to see whether the index is used:
    
    mysql> EXPLAIN SELECT * FROM  User WHERE id = 1\G

    ********************** 1. row **********************
    
               id: 1
    
      select_type: SIMPLE
    
            table: User
    
       partitions: NULL
    
             type: const
    
    possible_keys: PRIMARY
    
              key: PRIMARY
    
          key_len: 8
    
              ref: const
    
             rows: 1
    
         filtered: 100.00
    
            Extra: NULL
    
    In the EXPLAIN results, the key column shows PRIMARY, which means the query is based on the primary key index. If the query is not based on an index, such as searching by gender, it would display something like:
    
    mysql> EXPLAIN SELECT * FROM User WHERE sex = 'male'\G
    
    ********************** 1. row **********************
    
               id: 1
    
      select_type: SIMPLE
    
            table: User
partitions: NULL

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 986400

filtered: 50.00

Extra: Using where

At this point, you should understand the organization of the B+ tree index and why it can quickly locate query records in billions of data. However, the efficiency of B+ tree query comes at a cost, which is the insertion performance issue we mentioned earlier. Let’s discuss this next.

Optimizing the Insertion Performance of B+ Tree Index #

When inserting into a B+ tree, the data needs to be sorted, but the cost of sorting is not as high as you might think because sorting is a CPU operation (a CPU can process billions of instructions in one clock cycle).

The real cost lies in maintaining the B+ tree index to ensure data sorting. There are two different cases of data insertion.

  • Inserting in sequential (or reverse) order: The maintenance cost of the B+ tree index is very low. The leaf nodes are inserted from left to right. A typical example is the insertion of auto-incremented ID or time (if an index is created on the auto-incremented ID or the time column, the B+ tree insertion is usually fast).
  • Inserting in random order: To maintain the sorting of the B+ tree, significant operations such as page splitting and rotation are required. In addition, even for solid state drives, random writes do not perform as well as sequential writes, so disk performance is greatly affected. A typical example is a user nickname. When each user registers, the nickname is randomly chosen. If an index is created on the nickname, the insertion is unordered, and the maintenance cost of the index will be relatively high.

You cannot expect all inserted data to be sorted because indices are used for data sorting. If the inserted data is already sorted, you don’t need to use the B+ tree index for data querying.

Therefore, in the design of the B+ tree index in MySQL database, only the index for the primary key is required to be ordered, such as using auto-increment or UUID sorted by the UUID_TO_BIN function, instead of using unordered values as the primary key.

Let’s review the performance comparison of auto-increment, UUID, and sorted UUID insertion discussed in Lesson 05:

Drawing 8.png

As you can see, UUID performs noticeably worse than auto-increment and sorted UUID in terms of insertion performance due to its unordered nature.

Therefore, I want to emphasize again: In table structure design, it is crucial to use sequential values as much as possible for the primary key design in order to ensure performance in scenarios of massive concurrent business operations.

The above is the knowledge about index querying and insertion. Next, let’s analyze how to view B+ tree indexes in the MySQL database.

Design and Management of B+ Tree Indexes in MySQL #

In the MySQL database, you can use the query SELECT on the table mysql.innodb_index_stats to view the approximate information of each index:

SELECT 
    table_name,index_name,stat_name,
    stat_value,stat_description 
FROM innodb_index_stats 
WHERE table_name = 'orders' and index_name = 'PRIMARY';

+----------+------------+-----------+------------+------------------+
|table_name| index_name | stat_name | stat_value |stat_description  |
+----------+-------------------+------------+------------+----------+
| orders | PRIMARY|n_diff_pfx01|5778522     | O_ORDERKEY            |
| orders | PRIMARY|n_leaf_pages|48867 | Number of leaf pages        |
| orders | PRIMARY|size        |49024 | Number of pages in the index|
+--------+--------+------------+------+-----------------------------+

3 rows in set (0.00 sec)

From the above result, it can be seen that the primary key index in the orders table has approximately 5,778,522 records, with a total of 48,867 leaf nodes and 49,024 index pages. Based on the previous information, we can deduce that the number of non-leaf nodes is between 48,867 and 49,024, which is equal to 157 pages.

Furthermore, I’ve seen some so-called “rules” online about MySQL that say “a table should not have more than 5 indexes”. There is absolutely no such statement, it is completely unfounded.

In my opinion, if the business indeed requires queries based on multiple dimensions, then corresponding multiple indexes should be created. There is no room for discussion on this matter.

The real problem encountered in practice is: due to the unfamiliarity of business developers with databases, they create numerous indexes that have never been used since their creation! Because the optimizer does not choose these inefficient indexes, they occupy space and impact insert performance.

So how do you know which B+ tree indexes have not been used? In MySQL, you can query the table sys.schema_unused_indexes to see which indexes have never been used and can be discontinued:

SELECT * FROM schema_unused_indexes

WHERE object_schema != 'performance_schema';

+---------------+-------------+--------------+

| object_schema | object_name | index_name   |

+---------------+-------------+--------------+

| sbtest        | sbtest1     | k_1          |

| sbtest        | sbtest2     | k_2          |

| sbtest        | sbtest3     | k_3          |

| sbtest        | sbtest4     | k_4          |

| tpch          | customer    | CUSTOMER_FK1 |

| tpch          | lineitem    | LINEITEM_FK2 |

| tpch          | nation      | NATION_FK1   |

| tpch          | orders      | ORDERS_FK1   |

| tpch          | partsupp    | PARTSUPP_FK1 |

| tpch          | supplier    | SUPPLIER_FK1 |

+---------------+-------------+--------------+

If the database has been running for a long time and the indexes have been created for a long time but still appear in the result above, the DBA can consider deleting these unused indexes.

MySQL 8.0 introduces the “invisible” feature for indexes. Before deleting deprecated indexes, users can make indexes invisible to the optimizer and observe if there are any business impacts. If there are none, the DBA can safely delete these indexes:

ALTER TABLE t1 

ALTER INDEX idx_name INVISIBLE/VISIBLE;

Summary #

In this lecture, I have provided a preliminary overview of indexes. After studying this lecture, I believe you will have a clear understanding of the following:

  • An index is a data structure that speeds up queries, based on the principle of sorting data during insertion. The drawback is that it impacts the performance of insertions.
  • MySQL currently supports B+ tree indexes, full-text indexes, and R-tree indexes.
  • The height of a B+ tree index is usually 3-4 layers, and a B+ tree with a height of 4 can hold around 5 billion data records.
  • Due to the low height of B+ trees, query efficiency is extremely high. Even for 5 billion data records, only 4 I/O operations are needed for a query.
  • MySQL does not have a specific limit on the number of indexes per table. If a business query requires it, indexes should be created. Do not blindly believe in the limitation of the number of indexes.
  • Unused indexes can be deleted by using the sys.schema_unused_indexes table and the invisible index feature.

In conclusion, even though indexes are a well-known topic, they are the core of all relational databases. I hope you will reread this article multiple times to truly understand the implementation of B+ tree indexes.