
13 Q&A: How to Deal with the Issue of Not Being Able to Simulate RES Interrupts #

Hello, I’m Ni Pengfei.

Since the column started publishing, we have completed the CPU performance section, one of the four basic modules. I'm happy to see that more than half of the students are still keeping up, actively practicing, and enthusiastically leaving comments.

Among these comments, I'm very pleased to see that many students have been able to apply what they have learned, using the case-study approach to analyze performance bottlenecks in online applications and solve real-world performance problems. Some students have also thought things over carefully and critically, pointing out imprecise or inaccurate statements in the articles. I'm very grateful for that and would be happy to discuss these points further with you.

In addition, many of the questions raised in the comments are valuable. I have already replied to most of them in the app. For typical or particularly valuable questions that are hard to answer well on a phone, I have collected and organized them into today's Q&A so that I can answer them in one place, and also so that no one misses any important points.

Today is the first Q&A session on performance optimization. To make the answers easier to follow, the questions are not strictly arranged in the order of the articles. For each question, I have attached a screenshot of the original comment; if you need to review the related article, you can scan the QR code in the lower-right corner of each screenshot.

Question 1: An outdated performance tool version leads to missing metrics #

This is a problem commonly encountered by CentOS users. In the article, my pidstat output includes a metric called %wait, which represents the percentage of time a process spends waiting for the CPU. This metric was introduced in sysstat 11.5.5; older versions do not have it. Unfortunately, the sysstat version in the CentOS software repository happens to be older than this, so the metric is not available there.

However, there is no need to worry. As I mentioned before, tools are only a means of finding and analyzing metrics; the metrics themselves are what we actually care about. If %wait does not appear in your pidstat output, there are other ways to obtain the same metric.

For example, when explaining system principles and performance tools, I usually also introduce the proc file system and the various metrics it provides. The reason is twofold: on one hand, it helps you build an intuitive understanding of how the system works; on the other hand, it shows you where the raw data behind the metrics displayed by performance tools actually comes from.

In this way, even if your production environment runs an older operating system and you have no permission to install new packages, you can still read the proc file system and obtain the metrics you want.
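
As a concrete example, the run-queue wait time behind a metric like %wait is exposed by the kernel itself. The following is a minimal sketch, not from the original case: assuming your kernel provides /proc/<pid>/schedstat (1234 below is a placeholder PID), its second field is the cumulative time the task has spent waiting on a run queue, in nanoseconds, so sampling it twice gives an approximate %wait.

# Sample the run-queue wait time (second field, in nanoseconds) one second apart
$ PID=1234
$ read run1 wait1 _ < /proc/$PID/schedstat; sleep 1
$ read run2 wait2 _ < /proc/$PID/schedstat
# Convert the delta over 1 second into a percentage (1e7 ns = 1% of a second)
$ echo "scale=2; ($wait2 - $wait1) / 10000000" | bc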

That said, for learning performance analysis I still recommend using the latest performance tools. Newer tools report more complete metrics, which makes it easier to get started: you can see the data you need at a glance, and you are less likely to get stuck and give up.

Of course, when you are just starting out, it is best to understand the principles behind the performance tools while learning to use them, or to get familiar with their usage first and then come back to the principles. That way, even in an environment where new tools cannot be installed, you can still obtain the same metrics from the proc file system or elsewhere and carry out an effective analysis.

Question 2: Unable to simulate high iowait using the stress command #

Some students found that stress cannot drive iowait up; instead, sys goes up. That is because, in this scenario, stress is run with the -i option, which simulates I/O pressure through the sync() system call, and this approach is not reliable.

The purpose of sync() is to flush data from the memory buffers to disk to keep them in sync. If there is not much data in the buffers to begin with, then not much data will be written to disk, so it cannot generate meaningful I/O pressure.

This is particularly evident on SSDs: iowait may stay at 0 the whole time, while sys CPU usage is pushed up purely by the large number of system calls.

I have mentioned this in a comment reply as well and recommended using stress-ng as a replacement for stress. In case you missed it, I want to emphasize it again here.

You can run the following command to simulate the iowait issue:

# -i still calls sync(), while --hdd indicates reading/writing temporary files
$ stress-ng -i 1 --hdd 1 --timeout 600
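
After starting stress-ng, you can verify that iowait really does go up. For example, mpstat reports the per-CPU %iowait (the 5-second interval below is only an example):

# Report CPU usage, including %iowait, for every CPU once every 5 seconds
$ mpstat -P ALL 5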

Question 3: Unable to simulate the RES interrupt problem #

This question is about not being able to reproduce the increase in RES interrupts even after running a large number of threads.

Actually, I mentioned in the CPU context switching case that RES, the rescheduling interrupt, is the mechanism the scheduler uses to distribute tasks across CPUs, that is, to wake up idle CPUs so they can schedule newly runnable tasks. It is usually implemented with inter-processor interrupts (IPI).
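
By the way, on a machine with more than one logical CPU you can watch this interrupt directly while sysbench is running. One simple way (just an example, not the only one) is to highlight the changes of the RES line in /proc/interrupts:

# Refresh every 2 seconds (the watch default) and highlight the fields that changed
$ watch -d "cat /proc/interrupts | grep RES"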

Therefore, this interrupt is meaningless on a single-core machine (with only one logical CPU): there is no other CPU to wake up, so no rescheduling interrupts are generated in the first place.

However, as mentioned in the comment, the context switching problem still exists. You will see cs (context switches per second) jump from a few hundred to tens of thousands, and both the voluntary and involuntary context switches of the sysbench threads increase sharply, especially the involuntary ones, which can reach tens of thousands. From the meaning of involuntary context switches, we know this is caused by too many threads contending for the CPU.

In fact, the same conclusion can be reached from another angle. For example, you can add the -u and -t options to pidstat to report per-thread CPU usage, and you will see output like the following:

$ pidstat -u -t 1

14:24:03      UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
14:24:04        0         -      2472    0.99    8.91    0.00   77.23    9.90     0  |__sysbench
14:24:04        0         -      2473    0.99    8.91    0.00   68.32    9.90     0  |__sysbench
14:24:04        0         -      2474    0.99    7.92    0.00   75.25    8.91     0  |__sysbench
14:24:04        0         -      2475    2.97    6.93    0.00   70.30    9.90     0  |__sysbench
14:24:04        0         -      2476    2.97    6.93    0.00   68.32    9.90     0  |__sysbench
...

From the pidstat output, you can see that the %wait of each sysbench thread is as high as 70%, while its CPU usage is below 10%. In other words, the sysbench threads spend most of their time waiting for the CPU, which again shows that too many threads are contending for it.

By the way, I would like to point out a common mistake seen in the comments. Some students compare %wait in pidstat with iowait (abbreviated wa) in top, which is meaningless: they are two completely unrelated metrics.

  • In pidstat, “%wait” represents the percentage of time the process waits for the CPU.

  • In top, wa (iowait) represents the percentage of CPU time spent waiting for I/O.

Recalling the process states we learned earlier, you should remember that a process waiting for the CPU is already in the CPU's run queue and is in the Running (R) state, while a process waiting for I/O is in the uninterruptible sleep (D) state.
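
If you want to see these states for yourself, the STAT column reported by ps is one quick way to do it (the filter below is just an illustration):

# R = running or runnable (waiting for CPU), D = uninterruptible sleep (usually waiting for I/O)
$ ps -eo pid,stat,comm | awk '$2 ~ /^R|^D/'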

Also, the sysbench execution parameters may vary depending on the version. For example, in the Ubuntu 18.04 case, the command to run sysbench is:

$ sysbench --threads=10 --max-time=300 threads run

While in Ubuntu 16.04, the format is slightly different (thanks to Haku for sharing the command):

$ sysbench --num-threads=10 --max-time=300 --test=threads run
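
If you are not sure which syntax your installation expects, checking the sysbench version first saves some trial and error:

# 1.0 and later use --threads and a positional test name; 0.x uses --num-threads and --test=
$ sysbench --version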

Question 4: Unable to simulate an I/O performance bottleneck, or the I/O pressure is too high #

This problem can be seen as an extension of the previous one, except that instead of the stress command, the case uses an application (app) running in a container.

In fact, for the I/O bottleneck case, besides the failed simulations mentioned above, there were also many comments reporting the opposite problem: the I/O pressure in the case was too high, causing all kinds of trouble on their machines and, in some cases, even making the system unresponsive.

The reason is simply that everyone's machine configuration is different: not only the CPU and memory, but also the disks, which can differ enormously. For example, the performance gap between mechanical disks (HDD), low-end solid-state disks (SSD), and high-end SSDs can be several times to dozens of times.

In fact, the machine I used for the case has a low-end SSD: somewhat faster than a mechanical disk, but far slower than a high-end SSD. So with the same operations, my machine hits an I/O bottleneck; on a machine with a mechanical disk, the disk may be completely overwhelmed (utilization stuck at 100% for a long time), while on a better SSD the case may not generate enough I/O pressure at all.

In addition, in the case I only looked for disks whose names start with /dev/xvd or /dev/sd and did not account for disks with other prefixes (such as /dev/nvme). If you happen to use such a disk, you may run into the same problem as Vicky: the app container starts and then quickly exits into the "exited" state.

Here, Berryfl offered a good suggestion: you can add a parameter in the case to specify the block device, so that students who need it don’t have to compile and package the case application themselves.

Therefore, in the latest version of the case, I added three options to the app application.

  • -d sets the disk to read from; by default the app looks for disks with the prefix /dev/sd or /dev/xvd.
  • -s sets the size of each read, in bytes; the default is 67108864 (64 MB).
  • -c sets the number of reads each child process performs; the default is 20, i.e. a child process exits after reading 20 * 64 MB of data.

You can click here to view the source code on GitHub; I have also documented the usage there. Here is how to run it:

$ docker run --privileged --name=app -itd feisky/app:iowait /app -d /dev/sdb -s 67108864 -c 20

After running the case, you can execute docker logs to view its logs. Under normal circumstances, you can see the following output:

$ docker logs app
Reading data from disk /dev/sdb with buffer size 67108864 and count 20
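
While the case is running, you can also check how much pressure it actually puts on your disk. For example, iostat shows per-device utilization (the 1-second interval is just an example):

# -d reports device statistics, -x adds extended metrics such as %util
$ iostat -d -x 1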

Question 5: The first line of a performance tool's output (such as vmstat) differs significantly from the other lines #

This question is mainly about the significant difference between the first line of data and the other lines when running vmstat. I believe many students have noticed this phenomenon, and here I will explain it briefly.

First of all, remember the sentence I always emphasize: when encountering phenomena that cannot be intuitively explained, go to the command manual first.

For example, when you run the command man vmstat, you can find the following sentence in the manual:

The first report produced gives averages since the last reboot. Additional reports give information on a sampling period of length delay. The process and memory reports are instantaneous in either case.

This means that the first line of data is the average value since the system was last rebooted, while the other lines are the average values during the interval you set when running the vmstat command. In addition, the process and memory reports are instantaneous values in either case.
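
In practice this just means skipping the first sample when you want per-interval values; for example (the interval and count here are arbitrary):

# The first line is the average since boot; the following two lines are 1-second samples
$ vmstat 1 3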

You see, this is no big deal, but if we are not aware of it, it can get in the way of our thinking and stall further analysis. This also brings me to the importance of documentation.

"Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime." The core of this column is to teach you the principles and approaches of performance analysis; performance tools are just the paths and means we use along the way. That is why, when introducing the various performance tools, I did not explain every command-line option in detail: partly because this information is easy to find in the documentation, and partly because the meaning of individual options can vary across versions and systems.

So, regardless of the reason, consulting the manual is always the fastest and most accurate way. Especially when you find that the output of certain tools does not make sense, always remember to check the documentation first. If you can’t understand the documentation, then search online or ask me in the column.

Learning is a process of going from thin to thick and then back from thick to thin. We start with detailed knowledge and accumulate it to a certain degree; then we need to organize it into a system we can remember, while continually refining the details of that system. It is through this kind of questioning and reflection that we achieve the best learning results.

Finally, feel free to keep writing down your questions in the comment section, and I will continue to answer them. My goal remains the same: I hope the knowledge in these articles can become your own ability, so that we not only practice in real scenarios but also improve by exchanging ideas.