
25 Fundamentals How Linux Disk IO Works Part Two #

Hello, I’m Ni Pengfei.

In the previous section, we learned about the working principle of Linux disk I/O and understood the Linux storage system I/O stack, which consists of the file system layer, the generic block layer, and the device layer.

Among them, the generic block layer is the core of Linux disk I/O. Upwards, it provides a standard interface for file systems and applications to access block devices. Downwards, it abstracts various heterogeneous disk devices into a unified block device. It also reorders and merges I/O requests received from file systems and applications to improve disk access efficiency.

Now that you have mastered the working principle of disk I/O, you are probably eager to know how to measure the I/O performance of a disk.

Next, let’s take a look at the performance indicators of disks and the methods to observe these indicators.

Disk Performance Metrics #

When it comes to measuring disk performance, we must mention the five metrics most commonly used: utilization, saturation, IOPS, throughput, and response time. These five metrics are the basic indicators for evaluating disk performance.

  • Utilization refers to the percentage of time the disk is processing I/O. High utilization (such as over 80%) usually indicates a performance bottleneck in disk I/O.

  • Saturation refers to the level of busyness in processing I/O on the disk. High saturation indicates a severe performance bottleneck in the disk. When saturation reaches 100%, the disk cannot accept new I/O requests.

  • IOPS (Input/Output Operations Per Second) refers to the number of I/O requests the disk serves per second.

  • Throughput refers to the amount of data transferred per second, that is, the total size of I/O requests completed per second.

  • Response time refers to the interval between sending an I/O request and receiving a response.

It is important to note that utilization only considers whether I/O is in progress, not the size or parallelism of the I/O. In other words, when utilization is 100%, the disk may still be able to accept new I/O requests; this is especially true for SSDs and RAID arrays, which can serve many requests in parallel.

These metrics are likely to be frequently mentioned when discussing disk performance. However, I want to emphasize that it is important not to compare a single metric in isolation, but to analyze them in the context of read/write ratio, I/O type (random or sequential), and the size of I/O.

For example, in scenarios with more random read/writes, such as databases or large numbers of small files, IOPS can better reflect the overall system performance. On the other hand, in scenarios with more sequential read/writes, such as multimedia, throughput is a better reflection of the overall system performance.
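The contrast above follows from a simple identity: throughput = IOPS × average request size. Here is a back-of-the-envelope sketch with illustrative, made-up numbers (not measurements):

```shell
# Throughput, IOPS, and average I/O size are tied together:
#   throughput = IOPS x average request size
# Illustrative figures only, not measurements of a real device.
awk 'BEGIN {
  printf "random 4 KiB:     %5d IOPS -> %4.0f MiB/s\n", 10000, 10000 * 4 / 1024
  printf "sequential 1 MiB: %5d IOPS -> %4.0f MiB/s\n", 200, 200 * 1
}'
```

This is why a small-file or database workload can saturate a disk's IOPS capacity long before its bandwidth, while a multimedia workload does the opposite.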

Generally, when selecting servers for an application, it is important to first benchmark the I/O performance of the disks in order to accurately evaluate whether the disk performance can meet the requirements of the application.

In this regard, I recommend using the performance testing tool fio to test core metrics such as IOPS, throughput, and response time of the disk. But as I mentioned before, it is important to be flexible and adapt to the specific situation. During benchmark testing, it is necessary to evaluate the metrics based on the characteristics of the application’s I/O.

Of course, this requires you to test the performance of different I/O sizes (usually several values between 512B and 1MB) in various scenarios such as random reads, sequential reads, random writes, and sequential writes.
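As one sketch of such a test matrix, a fio job file for a 4 KiB random-read run might look like the following. The filename, size, and runtime are placeholder values to adapt to your environment, and `ioengine=libaio` assumes Linux with the libaio engine available:

```ini
; Sketch of a fio job: 4 KiB random reads against a scratch file.
; Repeat with rw=read, write, or randwrite and other bs values
; (512B-1MB) to cover the scenarios described above.
[randread-4k]
ioengine=libaio
direct=1
rw=randread
bs=4k
size=1g
runtime=60
time_based
filename=/tmp/fio-testfile
```

Run it with `fio <jobfile>`; fio then reports IOPS, bandwidth, and latency statistics for each job.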

The metrics obtained from these benchmarks can serve as a baseline for analyzing application performance later. Once performance issues occur, you can treat the benchmark results as the disk's performance ceiling and evaluate its actual I/O usage against them.

Understanding disk performance metrics is just the first step in our I/O performance testing. So, what methods should we use to observe them? Here, I will introduce several commonly used methods for observing I/O performance.

Disk I/O Monitoring #

The first thing to observe is the usage of each disk.

iostat is the most commonly used tool for monitoring disk I/O performance. It reports a range of performance indicators, such as utilization, IOPS, and throughput, for each disk. These indicators are derived from /proc/diskstats.

Here is an example of iostat output.

# -d reports device statistics; -x adds the extended metrics
$ iostat -d -x 1
Device       r/s   w/s   rkB/s   wkB/s rrqm/s wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0        0.00  0.00    0.00    0.00   0.00   0.00   0.00   0.00    0.00    0.00  0.00     0.00     0.00   0.00   0.00
loop1        0.00  0.00    0.00    0.00   0.00   0.00   0.00   0.00    0.00    0.00  0.00     0.00     0.00   0.00   0.00
sda          0.00  0.00    0.00    0.00   0.00   0.00   0.00   0.00    0.00    0.00  0.00     0.00     0.00   0.00   0.00
sdb          0.00  0.00    0.00    0.00   0.00   0.00   0.00   0.00    0.00    0.00  0.00     0.00     0.00   0.00   0.00

From here, you can see that iostat provides a rich set of performance indicators. The first column, Device, is the name of the disk device; the meaning of each remaining column is summarized below (per the iostat man page):

  • r/s and w/s: read and write requests completed per second (after merging);

  • rkB/s and wkB/s: kilobytes read and written per second;

  • rrqm/s and wrqm/s: read and write requests merged per second;

  • %rrqm and %wrqm: percentage of read and write requests merged before being issued to the device;

  • r_await and w_await: average time, in milliseconds, for read and write requests to be served, including queueing and device service time;

  • aqu-sz: average length of the request queue;

  • rareq-sz and wareq-sz: average size, in KB, of read and write requests;

  • svctm: average device service time, in milliseconds; this estimate is unreliable and has been removed in newer versions of sysstat;

  • %util: percentage of elapsed time during which the device had I/O in progress, i.e. the utilization discussed earlier.

Among these indicators, you should pay attention to:

  • %util, which is the disk I/O utilization we mentioned earlier;
  • r/s + w/s, which represents IOPS;
  • rkB/s + wkB/s, which represents throughput;
  • r_await + w_await, which represents response time.

When observing the indicators, don’t forget to analyze them in conjunction with the request size (rareq-sz and wareq-sz).

You may have noticed that iostat does not directly provide disk saturation. In fact, there is usually no simple method to observe saturation. However, you can compare the observed average request queue length or the average wait time for completing read/write requests with benchmark test results (e.g., using fio) to comprehensively evaluate the disk saturation.
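Since iostat's counters come from /proc/diskstats, you can sketch the same IOPS and throughput computation yourself by diffing two samples. The field positions below follow the kernel's Documentation/admin-guide/iostats description; the device name sda is an assumption to adjust for your system:

```shell
# Sketch: per-second IOPS and throughput for one device, computed by
# diffing two /proc/diskstats samples taken one second apart.
# Fields: $4 = reads completed, $6 = sectors read,
#         $8 = writes completed, $10 = sectors written (1 sector = 512 B).
dev=sda                       # assumed device name; adjust as needed
s1=$(awk -v d="$dev" '$3 == d {print $4, $6, $8, $10}' /proc/diskstats)
sleep 1
s2=$(awk -v d="$dev" '$3 == d {print $4, $6, $8, $10}' /proc/diskstats)
echo "$s1 $s2" | awk '{
  iops = ($5 - $1) + ($7 - $3)                    # read + write requests/s
  tput = (($6 - $2) + ($8 - $4)) * 512 / 1024     # KB/s
  printf "IOPS: %d  throughput: %.1f KB/s\n", iops, tput
}'
```

This is essentially what iostat does internally, just with more fields and nicer formatting.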

Process I/O Monitoring #

In addition to the I/O status of each disk, we also need to pay attention to the I/O status of each process.

The aforementioned iostat only provides overall I/O performance data for disks, and it cannot show which specific processes are performing disk read and write operations. To observe the I/O status of processes, you can use two tools: pidstat and iotop.

We are already familiar with pidstat, so I won't go over its features again here. By adding the -d parameter, you can see the I/O status of each process, as shown below:

$ pidstat -d 1
13:39:51     UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay Command
13:39:52     102       916      0.00      4.00      0.00       0 rsyslogd

From the output of pidstat, you can see that it can show the real-time I/O status of each process, including the following information:

  • User ID (UID) and Process ID (PID).

  • Size of data read per second (kB_rd/s), in KB.

  • Size of data written per second (kB_wr/s), in KB.

  • Size of writes cancelled per second (kB_ccwr/s), in KB; this can happen when a task truncates dirty pages before they are flushed to disk.

  • Block I/O delay (iodelay), which includes the time spent waiting for synchronous block I/O and for swap-in block I/O to complete, in clock ticks.
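As a quick sketch of working with this output, you can rank processes by write rate with a small filter. The two sample lines below are made-up illustrations; with real data you would pipe `pidstat -d 1 1` through the same filter, after checking that the field positions match your sysstat version (here kB_wr/s is field 5 and the command name is the last field, as in the layout above):

```shell
# Sketch: rank processes by write rate from one pidstat -d sample.
# The sample lines are fabricated for illustration only.
sample='13:39:52   102    916   0.00   4.00   0.00   0  rsyslogd
13:39:52     0   1234   0.00  12.00   0.00   2  dockerd'
echo "$sample" | awk '{print $5, $NF}' | sort -rn | head -5
```

For interactive use, though, iotop (introduced next in the text) does this sorting for you.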

In addition to real-time monitoring with pidstat, it is also common to sort processes by I/O size for performance analysis. For this purpose, I recommend another tool called iotop. It is a tool similar to top, which allows you to sort processes by I/O size and find the processes with larger I/O.

The output of iotop is shown below:

$ iotop
Total DISK READ :       0.00 B/s | Total DISK WRITE :       7.85 K/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
15055 be/3 root        0.00 B/s    7.85 K/s  0.00 %  0.00 % systemd-journald

From this output, you can see that the first two lines represent the total disk read and write sizes of processes, as well as the actual disk read and write sizes. Due to the influence of cache, buffer, and I/O merging, these values may not be equal.

The remaining rows show each thread's I/O from several angles: thread ID, I/O priority, disk read rate, disk write rate, and the percentage of time spent swapping in (SWAPIN) and waiting for I/O (IO>).

These two tools are the most commonly used when analyzing disk I/O performance. Now that you have an understanding of their functions and metric meanings, we will learn about their specific usage in the upcoming practical case studies.

Summary #

Today, we have reviewed the performance indicators and tools for Linux disk I/O. The performance of disk I/O is typically evaluated using several indicators including IOPS, throughput, utilization, saturation, and response time.

You can use iostat to obtain information about disk I/O, and you can also use pidstat, iotop, and other tools to monitor the I/O activities of processes. However, when analyzing these performance indicators, you should consider factors such as the read-write ratio, I/O type, and I/O size to perform comprehensive analysis.

Reflection #

Lastly, I’d like to chat with you about any disk I/O issues you have encountered. When dealing with disk I/O performance issues, how do you analyze and diagnose them? You can use the disk I/O metrics and tools learned today, as well as the disk I/O principles covered in the previous section, to summarize your approach.

Feel free to discuss with me in the comments section, and feel free to share this article with your colleagues and friends. Let’s practice and improve together through hands-on experience and communication.