My MySQL Journey #

After the launch of the column on November 21st, I did a live broadcast on Geek Time with the theme “My Journey with MySQL”. Today, I have purposely placed the recap of this live broadcast below the column, hoping that you can find some helpful insights from my years of experience with MySQL.

Firstly, let me share with you the main points I covered in this live broadcast:

My experience with MySQL.
Why you should understand database principles.
The learning path I recommend for MySQL.
The path to becoming a DBA.

My Experience #

Entering Baidu with Enriching Experiences #

I graduated from Fuzhou University, and as far as I know, it was difficult for our school’s graduates to directly enter Baidu. We had to pass the entrance exam and pursue a master’s degree at Zhejiang University. However, unexpectedly, I managed to secure an interview after submitting my resume.

After joining the company, I went to ask the interviewer why my resume was selected. They said, “Because your resume is very impressive.” During my time in school, I worked on numerous projects and interned at several companies, so my resume showcased a wealth of experience.

During the interview, there was one question that left a deep impression on me. The interviewer asked, “With so many internship experiences, have you encountered anything particularly interesting?” After thinking for a while, I answered, “Let me tell you about a big data-related experience. When I was doing log analysis with a mobile company, I dealt with tens of millions of lines of data.” Hearing this, he laughed.

Later, after joining Baidu, I realized that dealing with tens of millions of lines of data was considered small-scale.

Trying to Solve Problems by Reading Source Code #

After joining Baidu, I worked on the back-end programming of Tieba, such as the permission system and so on. It was actually quite simple. I just needed to write a C language program to handle client requests and return results.

At that time, I was just an ordinary user of MySQL. After using it for some time, I encountered a problem: occasionally, a request that used to run very fast would become very slow. My boss asked me what the reason was, and I was embarrassed to say that I didn’t know, so I started searching for information online.

However, back in 2008, there was very little online information available. I spent a long time searching but couldn’t find a clear explanation. Finally, I had no choice but to look at the source code. When I started reading the source code, I found it quite fascinating. Moreover, the source code really helped me solve some problems.

From then on, I couldn’t stop myself. I fell into the “pit” of reading source code.

In 2010, Alibaba was recruiting database developers. Although I only understood source code at that time and had no development experience, I still submitted my resume with a try-it-out attitude. Then I successfully passed the interview and joined Alibaba. After that, I worked with Chu Ba for more than 7 years before leaving Alibaba.

During my time at Baidu, I hardly participated in any community activities. Because at that time, Baidu probably emphasized internal sharing, and the experience of solving problems was mainly shared within the intranet. So, after joining Alibaba, I established a blog and opened a Weibo account. My nickname at Alibaba is Ding Qi, and I use this name for my blog, Weibo, and community as well.

Why Understand Database Principles? #

Here, I will share a few personal experiences and discuss why it is important to understand database principles.

Understanding the principle can help you better pinpoint problems #

During a classmate gathering, we discussed technical issues. One classmate who works in the government mentioned that their system was strange - they had to restart the application every morning, otherwise it would prompt a database connection failure, and they had no idea what to do.

I analyzed the situation and suggested that based on this error message, it was likely that the connection time was too long and got disconnected. The default timeout for the database was 8 hours, and since they usually finish work at 6 o’clock and no one uses the system after work, it would be until the next morning at 9 or even 10 o’clock when they start working again. That’s already more than 10 hours, so the database connection would definitely be disconnected.

At that moment, I guessed that the system program was poorly written and couldn’t reconnect after a connection failure, so it still used the broken connection, resulting in an error. Then, I suggested that they should go back and increase the timeout period. Later, he told me that the problem was solved using this method.

From this experience, I deeply realized that as developers, even if we only understand the meaning of each parameter, we may be able to provide correct solutions to some problems.

Understanding Principles Can Help You Solve Problems More Ingeniously #

When I was working on a Tieba system, I had to request permissions every time a page was accessed. However, the probability of this permission request being accessed was very high, and it was not possible to query the database every time. So what should I do?

I came up with a simple solution: I created a large memory space within the application and loaded the entire table into memory at startup. This way, when there is a permission request, I can directly retrieve it from memory.

When the database restarts, my process also restarts. Then, the process will scan the entire table and store all the user-related information in memory.

However, later I encountered a frustrating situation. Sometimes when MySQL crashed, my program would reload the permissions into memory, and this select statement would take about 30 minutes to execute. Normally, restarting MySQL is fast, and the process restarts quickly. The normal process of loading permissions only takes two minutes to complete. But why does it take 30 minutes when there is an abnormal restart?

I had no choice but to look at the source code. Then, I discovered that MySQL has a mechanism that tries to flush dirty pages when it considers the system idle.

In our specific example, after MySQL restarts, my process performs a full table scan. However, at this time, the permission data has not finished initializing, so my server layer cannot provide services. As a result, there is only one select request for a full table scan in MySQL, and MySQL thinks it’s idle, so it starts desperately flushing dirty pages, consuming a large amount of disk resources, which slows down my full table scan.

After understanding this mechanism, I wrote a script that sends a request every 0.5 seconds to execute a simple SQL query, telling the database that I am actually busy and to slow down the flushing of dirty pages.

Once the script was deployed, the flushing of dirty pages did slow down, and the scan for loading permissions became faster. It is said that even after I left the company for more than two years, this script is still in use.

You see, if we understand some parameters and can comprehend these parameters, we can make the correct settings. And if we further understand some principles, we can solve problems more ingeniously.

Understanding the source code gives you more options #

In 2012, there was a lot of pressure on Alibaba’s Double Eleven business. At that time, we didn’t have as many SSDs as we do now; it was the era of mechanical hard disks.

To cope with the pressure, we started using SSDs, but we didn’t dare to use them directly as storage. Instead, we used them as secondary caches. We used an open-source system called Flashcache (which is now considered outdated; I don’t know if you’ve heard of it).

The implementation of Flashcache treats the SSD as a secondary cache for physical disks, which can improve performance. However, when we deployed it ourselves, we found that the performance improvement was not as good as we expected, and it was even worse than using pure mechanical disks.

So, my colleague and I started researching. My colleague analyzed the source code of Flashcache, while I analyzed the source code of MySQL. We discovered that Flashcache had a dirty page ratio, and when the dirty page ratio reached 80%, it would stop and forcefully flush the dirty pages to the disk.

Initially, we thought that the dirty page ratio was 20% of the total pages. But after studying the source code, we found out that there were multiple buckets, for example, each bucket represented 20MB. When a bucket reached 80% usage, Flashcache considered it to be full of dirty pages and started flushing. This meant that if you were performing sequential writes, it was easy to fill up a bucket.

After understanding this principle, I moved all the sequentially written data to mechanical hard disks and placed the randomly written data on Flashcache. After making this modification, the performance improved.

You see, if you can understand the source code, your actions will be different.

Learning Path for MySQL #

Speaking of the learning path for MySQL, the content I shared above can all be summarized as part of the learning path.

First of all, you need to know how to use it and understand the meaning of each parameter. This will affect your operational behavior. Don’t just take advice from the internet and blindly follow how others use it without understanding why. After that, you need to understand the implementation principle of each parameter. Once you understand these principles, your actions will be different. Further, if you can understand the source code, your understanding of the database will also be different.

Let me also talk about how I guide new graduates. Practice is a good way to learn, so I let new hires set up a primary-secondary database configuration. Through this process, I discovered that each person’s ability to self-learn is different. For example, when encountering latency issues or intentionally creating scenarios where primary and secondary databases are inconsistent, I let new hires understand how to analyze and resolve issues.

If I have to summarize a learning path, it would be to first know how to use it, and then identify problems.

In my columns, I always present a common question at the end of each article for readers to contemplate. These questions are not too difficult, they are related to the column article and are encountered frequently, but the answers cannot be directly obtained from the article.

My suggestion is that you try to think about the question first without looking at the answer, or search within the database. This will be a good process.

The next step is practice. After you have some concepts, you can then refer to the MySQL official manual. Some people have asked me whether they should directly read the manual?

My suggestion is not to rush to read the manual at the beginning. It contains over 100,000 English words, and even if you’re talented, you’ll forget what you read earlier. Therefore, you must first have a framework and a knowledge network, and then use the manual to fill in the gaps and reinforce your understanding.

That’s how I went through the journey myself.

In addition, many users in the comments of my column hope that I can recommend a book to accompany the column for studying. If I had to recommend only one book, I suggest you read “High Performance MySQL”. It is a classic book in the field of MySQL and is already in its third edition. You can imagine how popular it is.

Two of the translators of this book, Peng Lixun and Zhai Weixiang, were my former teammates and have extensive experience in MySQL source code development. Their deep understanding of MySQL has allowed this book to maintain the same level of quality as the original English version.

The Journey of a DBA #

What are the similarities between DBAs and developers? #

I have led both development teams and DBA teams, so I can share some insights on the intersections between these two roles.

In fact, DBAs should have some development background, such as experience in developing operational systems. Additionally, the higher the level of automation, the less daily maintenance work there is for DBAs. DBAs need to understand the business logic of development and move towards the role of a business architect.

Similarly, developers cannot rely on DBAs to solve all problems. In every company, there are only a few DBAs. Therefore, developers also need to have a certain understanding of database principles in order to consult DBAs more professionally and efficiently when facing issues.

Therefore, these two roles should be integrated to some extent. Developers should understand database principles, while DBAs should understand the business and development processes.

Does DBA have a future? #

What I want to emphasize here is that every position has a future, you just need to adjust the direction a little according to the changing times.

For example, jokingly, it used to be said that DBAs need to have great physical strength because they have to move servers. Later, the core skills of DBAs became about database setup and high availability switching. However, now even these skills are not enough because there are automation systems available.

Therefore, the future of DBAs is twofold. On one hand, they need to understand the business and become business architects. On the other hand, they need to be forward-thinking and proactively diagnose the systems, identifying issues in each business and creating monthly reports. The business development team can then optimize based on these reports and come to you for consultations when needed. After helping them optimize, you can present the optimized indicators. This will effectively showcase your value to the company.

What are some good habits and methods to improve SQL efficiency? #

The key to improving SQL efficiency is to write SQL statements frequently and develop a sense of the execution efficiency of these statements. When writing or creating indexes in the future, you will have a general idea of the time complexity of the statement, whether it requires a full table scan or index scan, and whether it needs to perform additional lookups, etc.

By doing so, each SQL statement you write will be faster and less likely to contain basic errors. This is also the goal of creating this column.

What skills are needed to read source code? #

To read source code, you need to have a good understanding of C and C++ programming languages. Additionally, you should be familiar with some debugging tools. As code is static, but running it is dynamic, understanding the code is single-threaded while running it is multi-threaded, so you should know how to debug.

Furthermore, I do not recommend using visual tools. Although visual tools are convenient, you won’t know what actually happens when you click a certain operation, so I suggest you manually write code and SQL statements. This way, you will have a better understanding of some underlying principles.

How to learn C and C++? #

During my graduate studies, I made the most progress in learning the C and C++ languages.

At that time, I went to teach C and C++ to undergraduates. I thought I already knew it well enough and could teach it easily. However, when I got there, I realized that knowing how to use something and being able to teach it are completely different things. When preparing for the class, you can’t just focus on the practical aspects; you also need to explain the underlying principles clearly. This forces you to engage in more in-depth and comprehensive learning.

Some people, after reading technical blogs and columns, will outline the article, write down their own questions, and summarize their understanding of the article. This process is very beneficial for learning. Listening and understanding something is one thing, but explaining it to others is a whole different story.

What mindset should you maintain when learning databases? #

Not only databases, but also all multi-threaded services require a tedious debugging and troubleshooting process. It can be quite troublesome when encountering problems. However, the feeling of finding the root cause of a problem can be very satisfying.

I think you need to find this feeling, as it can support you through the long and tedious period ahead, allowing you to persist and continue.