
44 Answering Questions Part Three - Discussing These Good Questions #

This is the last Q&A article in our column, and today we’re going to talk about some good questions.

In my opinion, a good question is one that pushes at the boundary of a piece of knowledge. By working through such a question, we can deepen our understanding of the topic or connect it to other knowledge points, which helps us build our own knowledge network.

Being able to ask good questions is an important skill in our work.

After this period of learning, I can feel from the questions in the comment section that students who follow the course now have a deeper understanding of how SQL statements perform, and the questions they raise are becoming more detailed and precise.

Next, let’s take a look at some of the good questions mentioned by students in the comment section. When analyzing these questions with you, I will point out which article they appeared in. At the same time, when answering these questions, I will assume that you have mastered the knowledge covered in these articles. Of course, if your memory is fuzzy, you can go back to the article and review it again.

Writing join statements #

In the 35th article [“How to Optimize JOIN Statements”], I used straight_join when discussing the execution order of joins. Student @郭健 asked two questions at the end of the article:

  1. If we use a left join, is the left table always the driving table?
  2. If a join between two tables involves multiple conditions for equality matching, should all the conditions be written in the on clause, or should only one condition be written in the on clause and the rest be written in the where clause?

To answer these two questions at the same time, let me create two tables, a and b:

```sql
create table a(f1 int, f2 int, index(f1))engine=innodb;
create table b(f1 int, f2 int)engine=innodb;
insert into a values(1,1),(2,2),(3,3),(4,4),(5,5),(6,6);
insert into b values(3,3),(4,4),(5,5),(6,6),(7,7),(8,8);
```

Tables a and b each have two columns, f1 and f2; the difference is that table a has an index on column f1. I then inserted 6 rows into each table, 4 of which (the rows with values 3 through 6) exist in both tables.

The second question mentioned by student @郭健 is actually the difference between the following two ways of writing:

```sql
select * from a left join b on(a.f1=b.f1) and (a.f2=b.f2); /*Q1*/
select * from a left join b on(a.f1=b.f1) where (a.f2=b.f2); /*Q2*/
```

I will refer to these two statements as Q1 and Q2, respectively.

First, it should be noted that these two left join statements have different semantics. Let's take a look at their execution results.

Figure 1: Query results for the two joins

As you can see:

  • Statement Q1 returns a dataset of 6 rows. Even rows in table a that have no match in table b are still returned, with all the fields from table b filled in with NULL values.
  • Statement Q2 returns 4 rows. From a logical perspective, we can understand it like this: for the two rows of table a that have no matching row in table b ((1,1) and (2,2)), b.f2 would be NULL in the join result; that fails the where condition, so these rows cannot appear in the result set.
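If you don't want to run the two statements, both result sets can be worked out by hand from the rows we inserted above:

```sql
select * from a left join b on(a.f1=b.f1) and (a.f2=b.f2); /*Q1: 6 rows*/
-- (1,1,NULL,NULL) (2,2,NULL,NULL) (3,3,3,3) (4,4,4,4) (5,5,5,5) (6,6,6,6)

select * from a left join b on(a.f1=b.f1) where (a.f2=b.f2); /*Q2: 4 rows*/
-- (3,3,3,3) (4,4,4,4) (5,5,5,5) (6,6,6,6)
```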

Next, let’s see how MySQL actually executes these two statements.

Let’s first look at the explain result for statement Q1:

Figure 2: Explain result for Q1

As you can see, this result meets our expectations:

  • Table a is the driving table, and table b is the driven table.
  • Since the f1 field in table b does not have an index, the Block Nested Loop Join (BNL) algorithm is used.

When you see the BNL algorithm, you should know that the execution flow of this statement is actually as follows:

  1. Read the contents of table a into the join_buffer. Since it’s a select *, both fields f1 and f2 are put into the join_buffer.
  2. Sequentially scan table b. For each row of data, check if the join condition (a.f1=b.f1 and a.f2=b.f2) is satisfied. If the condition is satisfied, return the matching record as one row in the result set. If the statement has a where clause, the where condition needs to be checked first before returning.
  3. After scanning table b, for the unmatched rows of table a (in this example, it’s the two rows (1,1) and (2,2)), fill the remaining fields with NULL and put them into the result set. The corresponding flowchart is as follows:

Figure 3: Left join - BNL algorithm

As you can see, table a is indeed the driving table in this statement, and in terms of execution it behaves the same as if straight_join had been used.

You might wonder: since the query result of statement Q2 is just missing the two NULL-extended rows, is it simply executed with step 3 of the above process removed? Let's first take a look at the explain result of statement Q2.

Figure 4: Explain result of Q2

Here’s a side note: the column will be ending soon, and I have already “mentally simulated” the execution process of a statement many times with you based on the explain result, so I hope you have developed this skill. Today, let’s analyze the explain result of an SQL statement together again.

As you can see, in this query table b is the driving table. And when the Extra field of a join's explain result is empty, it means the Index Nested-Loop Join (NLJ) algorithm is used.

Therefore, the execution process of statement Q2 is: sequentially scan table b, and for each row, use b.f1 to look up matching rows in table a's index; for each matched record, check whether a.f2=b.f2 holds, and if it does, return the row as part of the result set.

So, why is there such a big difference in the execution process between query Q1 and Q2? Actually, this is because the optimizer has optimized it based on the semantics of query Q2.

To understand this issue, I need to explain one more piece of background knowledge: in MySQL, comparing NULL with any value, whether for equality or inequality, yields NULL. Even the result of select NULL = NULL is NULL.
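You can verify this behavior directly in any MySQL client:

```sql
select NULL = NULL;    -- NULL, not 1
select NULL <> NULL;   -- also NULL
select 1 = NULL;       -- NULL: comparing any value with NULL yields NULL
select NULL <=> NULL;  -- 1: only the NULL-safe operator <=> treats two NULLs as equal
```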

Therefore, the where a.f2=b.f2 in statement Q2 means that the result will never contain rows where b.f2 is NULL. So the semantics of this left join become: "find the rows in the two tables whose f1 and f2 values match each other; for rows in table a that have no match in table b, discard them".

In this way, although the statement uses left join, the semantics are consistent with join.

Therefore, the optimizer rewrote this statement as join, and because there is an index on f1 of table a, it treated table b as the driving table, so it could use the NLJ algorithm. After executing explain, if you execute show warnings, you can see the result of this rewrite, as shown in Figure 5.

Figure 5: Rewrite result of Q2
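If you run this check yourself, it looks like the following; the exact Message text varies by MySQL version, so treat the comment as a sketch of the typical output:

```sql
explain select * from a left join b on(a.f1=b.f1) where (a.f2=b.f2);
show warnings;
-- The Message column contains the rewritten statement, along the lines of:
--   /* select#1 */ select ... from `a` join `b`
--   where ((`a`.`f2` = `b`.`f2`) and (`a`.`f1` = `b`.`f1`))
```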

This example illustrates that even when we write left join in the SQL statement, the execution process may not necessarily be a left-to-right join. That is to say, when using left join, the left table is not necessarily the driving table.

In this case, if you need the semantics of left join, do not put equality or inequality conditions on fields of the driven table in the where clause; write them all in the on clause instead. But what about join statements?

Now, let’s look at these two statements again:

```sql
select * from a join b on(a.f1=b.f1) and (a.f2=b.f2); /*Q3*/
select * from a join b on(a.f1=b.f1) where (a.f2=b.f2); /*Q4*/
```

Let’s use the explain and show warnings method again to see what the optimizer does.

Figure 6: Join statement rewrite

As you can see, both of these statements are rewritten as:

```sql
select * from a join b where (a.f1=b.f1) and (a.f2=b.f2);
```

The execution plan is naturally the same.

That is to say, in this case, for join statements it makes no difference whether the conditions are written in the on clause or in the where clause.

Performance Issues of Simple Nested Loop Join #

We know that join statements can have a significant impact on performance depending on the algorithm used. In the comments section of the 34th article, “Can we use join or not?”, @书策稠浊 and @朝夕心 asked a very good question.

In the article, we mentioned that although the BNL algorithm and the Simple Nested Loop Join algorithm both require M*N comparisons (where M and N are the numbers of rows in the two tables being joined), the Simple Nested Loop Join algorithm does a full table scan of the driven table for every row of the driving table, so the BNL algorithm executes much faster.

To facilitate the explanation, let me briefly describe these two algorithms for you.

The execution logic of the BNL algorithm is:

  1. First, read all the data from the driving table into the join_buffer in memory. Here, the join_buffer is an unordered array;
  2. Then, sequentially traverse all rows of the driven table, and match each row with the data in the join_buffer. If a match is found, it is returned as part of the result set.

The execution logic of the Simple Nested Loop Join algorithm is as follows: sequentially retrieve each row of data from the driving table and perform a full table scan match on the driven table. If a match is found, it is returned as part of the result set.

The question these two students raised is: the Simple Nested Loop Join algorithm also reads data into memory, and also matches rows against the join conditions there, so why is the performance difference so large?

To explain this issue, we need to refer to the index structure and Buffer Pool related knowledge points in MySQL:

  1. When performing a full table scan on the driven table, if the data is not in the Buffer Pool, we have to wait for it to be read from disk. Reading data from disk into memory lowers the Buffer Pool hit rate for normal business queries, and this algorithm naturally accesses the driven table's data multiple times, which makes it easier for those data pages to be moved to the head of the Buffer Pool (see the related content in the 35th article);
  2. Even if all the data of the driven table is already in memory, "fetch the next record" is still a pointer-chasing operation, whereas the join_buffer is an array, so the cost of traversing it is lower.

Therefore, the performance of the BNL algorithm will be better.
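If you want to observe the gap yourself, you can disable the BNL optimization and re-run an unindexed join. This is a sketch assuming MySQL 5.6/5.7, where block_nested_loop is an optimizer_switch flag (in MySQL 8.0.20 and later, BNL has been replaced by hash join):

```sql
-- Assumption: MySQL 5.6/5.7. With BNL off, a join with no usable index
-- on the driven table degenerates into a simple nested loop:
-- one full scan of the driven table per row of the driving table.
set optimizer_switch='block_nested_loop=off';
explain select * from a join b on a.f2=b.f2;  -- f2 is unindexed on both tables
set optimizer_switch='block_nested_loop=on';  -- restore the default
```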

Performance of distinct and group by #

In the 37th article, Comrade Lao Yang raised a good question: If we only need to remove duplicates and do not need to perform aggregate functions, which one is more efficient, distinct or group by?

Let me elaborate on his question: If there is no index on field a of table t, then will the following two statements have the same performance?

```sql
select a from t group by a order by null;
select distinct a from t;
```

First, it needs to be clarified that this group by usage is not standard SQL syntax. A standard group by statement requires an aggregate function in the select list, such as:

```sql
select a, count(*) from t group by a order by null;
```

The logic of this statement is: group the rows by the value of field a and count the occurrences of each value. Since this is an aggregate calculation, each distinct value of a appears only once in the result.

Without count(*), that is, once the first statement no longer needs to "count the occurrences", its logic becomes: group by field a and return only one row per distinct value of a. This is exactly the semantics of distinct. So when no aggregate function is needed, distinct and group by have the same semantics and the same execution process, and therefore the same performance.

The execution process of these two statements is as follows:

  1. Create a temporary table with a field a and create a unique index on this field a.
  2. Traverse table t and insert the data into the temporary table one by one:
    • If a unique key conflict is found, skip it.
    • Otherwise, the insertion is successful.
  3. After traversal is completed, return the temporary table as the result set to the client.
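You can see this shared execution path in the plans themselves. Assuming table t from the question, with no index on field a, both explain results typically report Using temporary:

```sql
explain select a from t group by a order by null;
explain select distinct a from t;
-- Both plans typically show Extra: Using temporary, reflecting the same
-- "insert into a temp table with a unique index on a" process described above.
```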

Replication and auto-increment primary keys #

In addition to performance issues, everyone's detailed questions are also very on point. In the comments section of the 39th article, Comrade Hat Dropped asked: with binlog_format=statement, suppose statement A first gets id=1 and then statement B gets id=2, but statement B commits first and is written to the binlog before statement A. When the binlog is replayed, won't statement B get id=1 and statement A get id=2, causing an inconsistency?

First, this question assumes that “the order of generating auto-increment ids may be different from the order of writing binlogs”, and this understanding is correct.

Second, limiting the question to statement format is also right. With row-format binlogs there is no such problem, because the Write_rows event directly records the value of every field of each row.

As for why inconsistency does not occur, let’s take a look at the example below.

```sql
create table t(id int auto_increment primary key);
insert into t values(null);
```

Figure 7: binlog of the insert statement
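If you want to reproduce Figure 7 yourself, you can list the binlog events directly; the binlog file name below is a placeholder, so check yours with show binary logs first:

```sql
-- Assumes binlog_format=statement; 'binlog.000001' is a placeholder name.
show binlog events in 'binlog.000001';
-- Among the events for this insert you should see an Intvar event whose Info
-- column reads "INSERT_ID=1", immediately before the Query event containing
-- "insert into t values(null)".
```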

As you can see, in the binlog there is a SET INSERT_ID=1 command right before the insert statement. It means that the next time this thread needs an auto-increment value, it must use 1, regardless of the table's current auto-increment counter.

This SET INSERT_ID statement is always placed before the insert statement. For example, in the scenario mentioned by Comrade Hat Dropped, the id of statement A on the master is 1, and the id of statement B is 2, but the binlog is written in the order of B first and then A. In that case, the binlog becomes: SET INSERT_ID=2; Statement B; SET INSERT_ID=1; Statement A;

You see, when statement B is executed on the standby database, the INSERT_ID it uses is still 2, the same as on the master.

Therefore, even if the two insert statements execute in different orders on the master and the standby, the values of the auto-increment primary key field will not become inconsistent.

Summary #

In today's Q&A article, I have selected 4 good questions to share with you and provided analysis. In my opinion, being able to ask good questions first means that these students understand the content of our article and then think deeply about it. It is an encouragement and motivation for me to see that you are reading and thinking seriously.

To be honest, these three short Q&A articles are far from enough to cover all the high-quality questions left in the comment section. Some of you will read the column a second time, and new readers will keep joining. If you have new questions, please leave me a comment. I will continue to watch the comment section and communicate with you there.

As usual, Q&A articles also need post-reading questions.

In the comment section of the [8th article], @XD mentioned a question: he checked innodb_trx and found that the trx_id of his transaction was a very large number (281479535353408); within the same session, the trx_id observed after starting a transaction stayed the same, and only after a statement that takes write locks was executed did trx_id become a very small number (118378).

You can verify this through experiments, then analyze the rules by which transaction ids are allocated and think about why MySQL designed it this way.
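A minimal way to reproduce his observation with two sessions; the table name t here is just whatever InnoDB table you have at hand:

```sql
-- Session A: start a transaction with only reads.
begin;
select * from t limit 1;

-- Session B: observe session A's transaction.
select trx_id, trx_state, trx_mysql_thread_id
  from information_schema.innodb_trx;
-- At this point trx_id is a very large number. After session A executes a
-- statement that takes write locks (for example an update), rerun the query
-- in session B and trx_id becomes a much smaller number.
```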

You can write your conclusions and analysis in the comment section, and I will discuss this question with you in the next article. Thank you for listening, and please feel free to share this article with more friends to read together.

Previous Question Time #

The previous question was about how to create an auto-increment primary key for partitioned table t. Since MySQL requires the primary key to include all partition fields, it is certain that a composite primary key needs to be created.

There are two options: one is (ftime, id), and the other is (id, ftime).

From the perspective of index utilization, (ftime, id) should be used. Using ftime as the partition key suggests that most statements filter on ftime; with this column order, the leftmost-prefix rule lets those statements use the primary key directly, saving one extra index.
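For example, with the primary key (ftime, id), a typical time-range query can use the leftmost prefix of the primary key directly, whereas with (id, ftime) it would need an extra index on ftime:

```sql
-- Served by the (ftime, id) primary key via the leftmost-prefix rule.
select * from t where ftime >= '2018-01-01' and ftime < '2018-02-01';
```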

The table creation statement is as follows:

```sql
CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `ftime` datetime NOT NULL,
  `c` int(11) DEFAULT NULL,
  PRIMARY KEY (`ftime`,`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
PARTITION BY RANGE (YEAR(ftime))
(
  PARTITION p_2017 VALUES LESS THAN (2017) ENGINE=MyISAM,
  PARTITION p_2018 VALUES LESS THAN (2018) ENGINE=MyISAM,
  PARTITION p_2019 VALUES LESS THAN (2019) ENGINE=MyISAM,
  PARTITION p_others VALUES LESS THAN MAXVALUE ENGINE=MyISAM
);
```

Of course, my suggestion is to use the InnoDB engine whenever possible. InnoDB requires that an auto-increment field be the first field of at least one index, so a separate index on id needs to be added.

```sql
CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `ftime` datetime NOT NULL,
  `c` int(11) DEFAULT NULL,
  PRIMARY KEY (`ftime`,`id`),
  KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY RANGE (YEAR(ftime))
(
  PARTITION p_2017 VALUES LESS THAN (2017) ENGINE=InnoDB,
  PARTITION p_2018 VALUES LESS THAN (2018) ENGINE=InnoDB,
  PARTITION p_2019 VALUES LESS THAN (2019) ENGINE=InnoDB,
  PARTITION p_others VALUES LESS THAN MAXVALUE ENGINE=InnoDB
);
```

Of course, reversing the fields and creating them as follows:

```sql
PRIMARY KEY (`id`,`ftime`),
KEY `ftime` (`ftime`)
```

is also possible.