
32 Q&A: Difference and Relationship Between Blocking/Non-blocking I/O and Synchronous/Asynchronous I/O #

Hello, I’m Ni Pengfei.

Since the last update, we have completed the third of the four fundamental modules: the section on the file system and disk I/O. I’m glad you haven’t fallen behind and are still actively learning, thinking, practicing, and leaving comments for discussion.

Today is the fourth installment of the performance optimization Q&A. As usual, I have selected some typical questions from the comments on the I/O module to answer in today’s session. To make them easier to learn from and understand, they are not necessarily arranged in the order of the original articles.

For each question, I have included a screenshot of the question asked in the comments section. If you need to review the original content, you can scan the QR code at the bottom right of each question.

Question 1: The Difference and Connection Between Blocking/Non-blocking I/O and Synchronous/Asynchronous I/O #

In the article on how the file system works, I introduced the meanings of blocking/non-blocking I/O and synchronous/asynchronous I/O. Let’s briefly review them here.

First, let’s take a look at blocking and non-blocking I/O. Depending on whether the application blocks its own execution, I/O can be classified as blocking I/O or non-blocking I/O.

  • Blocking I/O means that after the application issues an I/O operation, it blocks the current thread until it receives a response, and cannot perform other tasks in the meantime.

  • Non-blocking I/O means that after issuing an I/O operation, the application does not block the current thread and can continue executing other tasks, as the sketch after this list shows.
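To make the difference concrete, here is a minimal C sketch of my own (not from the original article) using an empty pipe: in the default blocking mode, read() would sleep until data arrives, while with the O_NONBLOCK flag set it returns immediately with EAGAIN, leaving the thread free to do other work:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    char buf[128];

    if (pipe(fds) < 0) { perror("pipe"); return 1; }  /* an empty pipe: nothing to read yet */

    /* In the default blocking mode, this call would sleep until the
     * other end writes data, so the thread could do nothing else:
     *     read(fds[0], buf, sizeof(buf));
     */

    /* Switch the read end to non-blocking mode. */
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK);

    ssize_t n = read(fds[0], buf, sizeof(buf));
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        printf("no data yet; free to do other work and retry later\n");

    close(fds[0]);
    close(fds[1]);
    return 0;
}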

Now let’s talk about synchronous I/O and asynchronous I/O. Based on the different notification methods for I/O responses, file I/O can be classified as synchronous I/O or asynchronous I/O.

  • Synchronous I/O means that after receiving an I/O request, the system doesn’t respond to the application immediately; it waits until processing is complete, and only then reports the I/O result to the application through the system call.

  • Asynchronous I/O means that after receiving an I/O request, the system first tells the application that the request has been received, and then processes it asynchronously; after processing is complete, the system reports the result to the application through an event notification.

As you can see, blocking/non-blocking and synchronous/asynchronous are two ways of classifying I/O from different perspectives, and they describe different subjects: blocking/non-blocking describes the I/O caller (i.e., the application), while synchronous/asynchronous describes the I/O executor (i.e., the system).

Let me use Linux I/O calls as an example to explain further:

  • The read system call is a synchronous read: it doesn’t return to the application until the disk data has been received.

  • aio_read, on the other hand, is an asynchronous read: the system returns immediately after receiving the AIO read request, and the actual read result is delivered to the application later through an asynchronous notification such as a callback, as sketched below.
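For reference, here is a minimal sketch of an aio_read() call (my own simplified illustration; the notification can be delivered via a signal or a thread callback configured in aio_sigevent, which I omit here in favor of simple polling, and depending on your glibc version you may need to link with -lrt):

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    static char buf[4096];
    struct aiocb cb;

    int fd = open("testfile", O_RDONLY);  /* "testfile" is a placeholder */
    if (fd < 0) { perror("open"); return 1; }

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    /* aio_read() only queues the request and returns immediately;
     * the thread is free to do other work while the read proceeds. */
    if (aio_read(&cb) < 0) { perror("aio_read"); return 1; }

    /* ... do other work here ... */

    /* Wait for completion (polling, for simplicity). */
    while (aio_error(&cb) == EINPROGRESS)
        usleep(1000);

    printf("read %zd bytes\n", aio_return(&cb));
    close(fd);
    return 0;
}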

Another example is the network socket interface:

  • When you call send() directly on a socket that doesn’t have the O_NONBLOCK flag set, the send() operation blocks until it completes, and the current thread can’t do anything else in the meantime.

  • If you use epoll instead, the system notifies you of the socket’s status, so you can take a non-blocking approach: while the socket isn’t writable, you can do other things, such as reading from or writing to other sockets. A sketch follows this list.
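Here is a minimal sketch of that second approach (my own illustration, with a socketpair standing in for a real TCP connection): the socket is registered with epoll, and send() is only called once the kernel reports it writable, so the thread never blocks on the send:

#include <fcntl.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send data without blocking the thread: register the socket with
 * epoll and call send() only when the kernel says it is writable. */
static ssize_t send_when_writable(int sock, const char *data, size_t len)
{
    int epfd = epoll_create1(0);
    if (epfd < 0) { perror("epoll_create1"); return -1; }

    /* Put the socket in non-blocking mode so send() never stalls. */
    fcntl(sock, F_SETFL, fcntl(sock, F_GETFL) | O_NONBLOCK);

    struct epoll_event ev = { .events = EPOLLOUT, .data.fd = sock };
    epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev);

    ssize_t sent = -1;
    struct epoll_event ready;
    /* A timeout of 0 would let us do other work between polls;
     * here we simply wait until the send buffer has room. */
    if (epoll_wait(epfd, &ready, 1, -1) > 0 && (ready.events & EPOLLOUT))
        sent = send(sock, data, len, 0);

    close(epfd);
    return sent;
}

int main(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) { perror("socketpair"); return 1; }

    printf("sent %zd bytes\n", send_when_writable(sv[0], "hello", 5));
    close(sv[0]);
    close(sv[1]);
    return 0;
}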

Question 2: After-class Reflection on “How the File System Works” #

In the article on how the file system works, I left you with a question to think about: when you run the find command, does it increase the system caches? If so, what type of cache grows?

Regarding this question, the answers from Baihua and coyang are already quite accurate. From studying how the Linux file system works, we know that file names, and the directory relationships between files, are stored in the directory entry cache (dentry cache), a memory-based data structure that is built dynamically on demand. So while searching for files, Linux dynamically builds the directory entry structures that aren’t yet in the cache, and the dentry cache grows.
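If you want to reproduce this yourself, here is a small C sketch of what find effectively does (my own illustration; “/usr” is just an example path): walking a directory tree forces the kernel to look up every path component, and each lookup adds any missing directory entries to the dentry cache:

#define _XOPEN_SOURCE 500  /* for nftw() */
#include <ftw.h>
#include <stdio.h>

static long entries;

/* Called once per file or directory visited; by the time we get here,
 * the kernel has looked the path up and cached its directory entry. */
static int count_entry(const char *path, const struct stat *sb,
                       int typeflag, struct FTW *ftwbuf)
{
    (void)path; (void)sb; (void)typeflag; (void)ftwbuf;
    entries++;
    return 0;  /* 0 = keep walking */
}

int main(void)
{
    if (nftw("/usr", count_entry, 20, FTW_PHYS) == -1) {
        perror("nftw");
        return 1;
    }
    printf("visited %ld entries\n", entries);
    return 0;
}

Running it while watching vmstat should show the same buff and cache growth as the find command.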

In fact, besides the growth of the directory entry cache, buffer usage also increases. If you observe with vmstat, you will find that both buffer and cache are growing:

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1      0 7563744   6024 225944    0    0  3736     0  574 3249  3  5 89  3  0
 1  0      0 7542792  14736 236856    0    0  8708     0 13494 32335  8 19 66  7  0
 0  1      0 7494452  27280 272284    0    0 12544     0 4550 17084  5 15 68 13  0
 0  1      0 7475084  42380 276320    0    0 15096     0 2541 14253  2  6 78 13  0
 0  1      0 7455728  57600 280436    0    0 15220     0 2025 14518  2  6 70 22  0

Here, the buffer grows because the metadata needed to build the directory entry cache (such as file names and inodes) has to be read from the file system.

Question 3: After-class Reflection on “Disk I/O Latency” #

In the case study on disk I/O latency, I left you with a reflection question at the end.

Using iostat, we confirmed that there was a disk I/O performance bottleneck, and using pidstat we identified the processes doing heavy disk I/O. However, when we traced these processes with strace, we couldn’t find any write system calls. Why is that?

Many students answered this question accurately in the comments. For example, “划时代” and “jeff” both pointed out that in this scenario we need strace’s -f option, which traces the system calls of child processes and threads as well.

As you can see, even an inappropriate choice of options can make a performance tool “go wrong” and produce seemingly illogical results. I’m delighted to see that many students have already grasped the core idea of using performance tools: understanding the principles of the tools themselves and the problems they can have.

Question 4: After-class Reflection on “MySQL Case Study” #

In the MySQL Case Study, I left you with a question at the end.

Why is it that even without indexes, the query speed of MySQL improves significantly and the disk I/O bottleneck disappears after the DataService application stops?


Ninuxer’s comment partially explains this question, but it’s not comprehensive enough.

In fact, when you see that DataService is modifying /proc/sys/vm/drop_caches, you should think back to what we learned earlier about the role of caches.

We know that the data table accessed by the case application uses the MyISAM engine, and a defining characteristic of MyISAM is that it caches only indexes in memory, not data. So when a query statement can’t use an index, the data has to be read from the database file on disk into memory before it can be processed.

So, if you use the vmstat tool to observe the trend of cache and I/O changes, you will find the following result:

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st

# Note: DataService is running
 0  1      0 7293416    132 366704    0    0 32516    12   36  546  1  3 49 48  0
 0  1      0 7260772    132 399256    0    0 32640     0   37  463  1  1 49 48  0
 0  1      0 7228088    132 432088    0    0 32640     0   30  477  0  1 49 49  0
 0  0      0 7306560    132 353084    0    0 20572     4   90  574  1  4 69 27  0
 0  2      0 7282300    132 368536    0    0 15468     0   32  304  0  0 79 20  0

# Note: DataService is stopped here
 0  0      0 7241852   1360 424164    0    0   864   320  133 1266  1  1 94  5  0
 0  1      0 7228956   1368 437400    0    0 13328     0   45  366  0  0 83 17  0
 0  1      0 7196320   1368 470148    0    0 32640     0   33  413  1  1 50 49  0
 ...
 0  0      0 6747540   1368 918576    0    0 29056     0   42  568  0  0 56 44  0
 0  0      0 6747540   1368 918576    0    0     0     0   40  141  1  0 100  0  0

Before DataService stops, the cache grows for three consecutive samples and then drops back down, because DataService clears the page cache every 3 seconds. After DataService stops, the cache keeps growing until it reaches 918576 KB and then levels off.

At this point, disk reads (bi) drop to 0, and iowait (wa) drops to 0 as well, which indicates that all the data is now in the system cache. The cache is part of memory, and accessing it is much faster than accessing the disk, which explains why MySQL’s query speed improves so significantly.

From this case study, you can see that MySQL’s MyISAM engine doesn’t cache data itself; it relies on the system cache to speed up disk I/O access. Once other applications run on the same machine, it becomes difficult for the MyISAM engine to make full use of the system cache, because the system cache may be occupied by those applications or even cleared outright.

Therefore, in general, I don’t recommend relying solely on the system cache to optimize an application’s performance. It’s better to allocate memory inside the application and build a cache that’s fully under your own control, as MySQL’s InnoDB engine does (it caches both indexes and data); alternatively, you can use a third-party caching service such as Memcached or Redis.
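To illustrate the idea of an application-managed cache, here is a toy C sketch (names and sizes are hypothetical; a real cache like InnoDB’s buffer pool is far more sophisticated): hot file blocks are kept in memory the application owns, so they stay cached even if the system page cache is dropped:

#include <stdio.h>

#define BLOCK_SIZE  4096
#define CACHE_SLOTS 1024               /* ~4 MB of app-owned cache */

struct cache_slot {
    long block_no;                     /* which block is cached; -1 = empty */
    char data[BLOCK_SIZE];
};

static struct cache_slot cache[CACHE_SLOTS];

/* Direct-mapped lookup: each block maps to exactly one slot. */
static char *cache_get(FILE *fp, long block_no)
{
    struct cache_slot *slot = &cache[block_no % CACHE_SLOTS];

    if (slot->block_no != block_no) {  /* miss: read the block from the file */
        fseek(fp, block_no * BLOCK_SIZE, SEEK_SET);
        if (fread(slot->data, 1, BLOCK_SIZE, fp) == 0)
            return NULL;
        slot->block_no = block_no;
    }
    return slot->data;                 /* hit: served from app-owned memory */
}

int main(void)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        cache[i].block_no = -1;

    FILE *fp = fopen("data.db", "rb"); /* "data.db" is a placeholder */
    if (!fp) { perror("fopen"); return 1; }

    cache_get(fp, 42);                 /* first access reads from disk */
    cache_get(fp, 42);                 /* second access hits app memory */
    fclose(fp);
    return 0;
}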

Today, I mainly addressed these questions. As always, feel free to keep writing your questions and thoughts in the comments, and I will continue to answer them. I hope that with every Q&A session you can internalize the knowledge from the articles into your own abilities: not only practicing in real projects, but also making progress through our exchanges.