05 Fundamentals: An Application’s CPU Usage Reached 100%, What Should I Do? #

Hello, I’m Ni Pengfei.

After the previous two sections on average load and CPU context switching, I believe you already have a preliminary understanding of CPU performance. Still, let me ask: before studying this column, which metric did you most commonly use to describe CPU performance in your work? I suspect your answer is not average load or CPU context switches, but a more intuitive metric: CPU usage.

As we mentioned before, CPU usage measures how busy the CPU is within a unit of time, expressed as a percentage. So, for this most commonly used and most familiar CPU metric, can you explain how it is calculated? And do you understand the differences between the metrics reported by performance tools like top and ps, such as %user, %nice, %system, %iowait, and %steal?

Today, I will guide you to understand the concept of CPU usage. At the same time, I will use our most commonly used reverse proxy server, Nginx, as an example to guide you through step-by-step operations and analysis, deepening your understanding.

CPU Usage #

In the previous article, I mentioned that Linux, as a multitasking operating system, divides the time of each CPU into short time slices and allocates them to different tasks through a scheduler, creating the illusion of multitasking.

To keep track of CPU time, Linux triggers timer interrupts at a predefined rate (represented as HZ in the kernel) and uses the global variable jiffies to record the number of ticks since boot. Each time a timer interrupt fires, jiffies is incremented by 1.

The tick rate HZ is a kernel configuration option and can be set to 100, 250, 1000, and so on. Different systems may use different values, which you can check by querying the kernel config file in /boot. For example, on my system the tick rate is set to 250, meaning 250 timer interrupts are triggered per second.

$ grep 'CONFIG_HZ=' /boot/config-$(uname -r)
CONFIG_HZ=250

Additionally, because the tick rate HZ is a kernel option, user-space programs cannot access it directly. For their convenience, the kernel exposes a user-space tick rate, USER_HZ, which is always fixed at 100; that is, each user-space tick represents 1/100 of a second. User-space programs therefore do not need to care about the actual HZ value configured in the kernel, because what they see is always the fixed USER_HZ.
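You can confirm this with getconf, which reports the user-space tick rate; it prints 100 no matter what CONFIG_HZ is set to:

$ getconf CLK_TCK
100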

Linux provides system internals information to user-space through the /proc virtual file system, and /proc/stat provides CPU and task statistics. For example, if you are only interested in CPU data, you can execute the following command:

# Display data only for each CPU
$ cat /proc/stat | grep ^cpu
cpu  280580 7407 286084 172900810 83602 0 583 0 0 0
cpu0 144745 4181 176701 86423902 52076 0 301 0 0 0
cpu1 135834 3226 109383 86476907 31525 0 282 0 0 0

The output here is a table. The first column is the CPU number, such as cpu0 and cpu1, and the first row, which has no number, is the cumulative value across all CPUs. The remaining columns are accumulated tick counts for different scenarios, measured in units of USER_HZ, i.e., 10 milliseconds (1/100 of a second) per tick. In other words, these figures are CPU times in different scenarios, expressed in ticks.

Of course, you don’t need to memorize the order of the columns; just remember that you can look them up in the man page for proc when needed. However, you should be clear about what each column means. They are important CPU-related metrics, and you will run into them in many other performance tools as well. Let me explain them one by one:

  • user (usually abbreviated as “us”) represents user CPU time. Note that it does not include the nice time but includes guest time.
  • nice (usually abbreviated as “ni”) represents low-priority user CPU time, which is the CPU time when the process’s nice value is adjusted between 1 and 19. It’s worth noting that the nice value ranges from -20 to 19, and the larger the value, the lower the priority.
  • system (usually abbreviated as “sys”) represents kernel CPU time.
  • idle (usually abbreviated as “id”) represents idle time. Note that it does not include the time spent waiting for I/O (iowait).
  • iowait (usually abbreviated as “wa”) represents CPU time spent waiting for I/O.
  • irq (usually abbreviated as “hi”) represents CPU time spent handling hardware interrupts.
  • softirq (usually abbreviated as “si”) represents CPU time spent handling software interrupts.
  • steal (usually abbreviated as “st”) represents CPU time stolen by other virtual machines when the system is running as a guest in a virtualized environment.
  • guest (usually abbreviated as “guest”) represents CPU time spent running other operating systems through virtualization, i.e., running a virtual machine.
  • guest_nice (usually abbreviated as “gnice”) represents CPU time spent running a virtual machine with low priority.

The CPU usage we typically refer to is the percentage of time spent on anything other than idle, relative to total CPU time:

CPU Usage = 1 - idle time / total CPU time

With this formula, we can easily calculate CPU usage from the data in /proc/stat. Alternatively, you can calculate the usage for each scenario by dividing that scenario’s CPU time by the total CPU time.

But don’t rush to calculate just yet. Can you tell what time period the CPU usage calculated from the data in /proc/stat refers to?

Recall that /proc/stat records cumulative ticks since boot. So a direct calculation gives the average CPU usage since boot, which generally has little value as a reference.

In practice, to calculate CPU usage, performance analysis tools usually take two samples at a certain interval (say, 3 seconds), compute the differences, and derive the average CPU usage over that period:

Average CPU Usage = 1 - (idle_2 - idle_1) / (total_2 - total_1)

where the subscripts 1 and 2 denote the first and second samples. This is the formula actually behind the CPU usage figures you see in various performance analysis tools.
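To make this concrete, here is a minimal bash sketch (assuming awk is available; real tools are more careful about edge cases) that samples the aggregate cpu line of /proc/stat twice, three seconds apart, and applies the formula above:

# cpu_usage.sh: a rough sketch, not a replacement for real tools
sample() {
  # Print "<idle> <total>" from the aggregate "cpu" line of /proc/stat.
  # Field 5 is idle; fields 2 onward sum to the total (simplified: guest
  # time is already part of user/nice, so it gets counted twice here).
  awk '/^cpu /{ idle = $5; for (i = 2; i <= NF; i++) total += $i;
                print idle, total; exit }' /proc/stat
}

read idle1 total1 < <(sample)
sleep 3
read idle2 total2 < <(sample)

awk -v i1="$idle1" -v t1="$total1" -v i2="$idle2" -v t2="$total2" \
    'BEGIN { printf "Average CPU usage: %.1f%%\n", (1 - (i2 - i1) / (t2 - t1)) * 100 }'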

Now that we know how to calculate the system’s CPU usage, what about processes? Similar to system metrics, Linux also provides statistical information about the runtime of each process through the /proc/[pid]/stat file. However, this file contains more extensive data, with a total of 52 columns.

Of course, there’s no need to worry, because you don’t need to memorize the meaning of each column. As I mentioned before, you can simply refer to the man page for proc when necessary.
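If you want a quick peek anyway, the proc man page documents field 14 of /proc/[pid]/stat as utime (user-mode ticks) and field 15 as stime (kernel-mode ticks). For example, to print them for the current shell:

# Naive field splitting: this breaks if the process name contains spaces
$ awk '{print "utime:", $14, "stime:", $15}' /proc/$$/stat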

Looking back, does that mean we have to read both /proc/stat and /proc/[pid]/stat to check CPU usage and then calculate it using the formula mentioned above?

Certainly not. Various performance analysis tools have already done the calculation for us. However, be aware that the CPU usage reported by these tools is typically an average over a certain interval of time. Therefore, you need to pay attention to the interval setting, especially when comparing analysis results from multiple tools, ensure that they use the same interval.

For example, if you compare the CPU usage reported by top and ps, the default results may differ because top uses a default 3-second interval, while ps reports the CPU usage over the entire process lifetime.
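You can observe the difference yourself; PID 1 below is just an example:

# ps: %CPU is cumulative CPU time divided by the process's elapsed time
$ ps -o pid,%cpu,time,etime -p 1

# top in batch mode: the second frame shows usage over the 3-second window
$ top -b -d 3 -n 2 -p 1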

How to View CPU Usage #

Now that we know the meaning of CPU usage, let’s see how to view CPU usage. When it comes to tools for viewing CPU usage, I guess your first thought is probably top and ps. Indeed, top and ps are the most commonly used performance analysis tools:

  • top shows the overall CPU and memory usage of the system, as well as the resource usage of each process.

  • ps only shows the resource usage of each process.

For example, the output format of top is:

# Refresh every 3 seconds by default
$ top
top - 11:58:59 up 9 days, 22:47,  1 user,  load average: 0.03, 0.02, 0.00
Tasks: 123 total,   1 running,  72 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8169348 total,  5606884 free,   334640 used,  2227824 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  7497908 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20   0   78088   9288   6696 S   0.0  0.1   0:16.83 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.05 kthreadd
    4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/0:0H
...

In this output, the third line, %Cpu(s), shows the system’s CPU usage. The meaning of each column was explained in the previous section, except that CPU time has now been converted into CPU usage percentages, so I won’t repeat them here. Note, however, that top shows the average across all CPUs by default; to switch to per-CPU usage, just press the number 1.
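By the way, if you prefer a non-interactive view of the same per-CPU breakdown, mpstat from the sysstat package works well:

# Report the usage of every CPU every 3 seconds
$ mpstat -P ALL 3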

Continuing down, after the blank line comes real-time information about each process, including a %CPU column that shows the process’s CPU usage. It is the sum of user-space and kernel-space CPU usage, covering CPU used in user space, kernel space entered through system calls, and time spent waiting to run in the ready queue. In a virtualized environment, it also includes the CPU used to run virtual machines.

So, up to this point, we can see that top does not differentiate between user space CPU and kernel space CPU for each process. How can we view the detailed information of each process? You should remember the pidstat tool mentioned in the previous section, which is specifically used to analyze the CPU usage of each process.

For example, the following pidstat command displays the CPU usage of processes at an interval of 1 second, including:

  • User CPU usage (%usr);

  • Kernel CPU usage (%system);

  • CPU usage of running virtual machines (%guest);

  • CPU usage while waiting (%wait);

  • And the overall CPU usage (%CPU).

The Average section at the end also calculates the average of the 5 sets of data.

# Output a set of data every 1 second, a total of 5 sets
$ pidstat 1 5
15:56:02      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
15:56:03        0     15006    0.00    0.99    0.00    0.00    0.99     1  dockerd

...

Average:      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
Average:        0     15006    0.00    0.99    0.00    0.00    0.99     -  dockerd
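If you already know which process you care about, you can restrict pidstat to it with the -p option; for example, for the dockerd process shown above:

# -u reports CPU usage (the default); -p limits output to one PID
$ pidstat -u -p 15006 1 5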

What to do if CPU usage is too high? #

By using tools such as top, ps, and pidstat, you can easily find processes with high CPU usage (e.g. 100%). Next, you may want to know which function in the code is causing the CPU to be occupied. Finding it will allow you to optimize more efficiently and specifically.

I guess the first thing that comes to mind is GDB (the GNU Project Debugger), a powerful program debugging tool. GDB is indeed powerful for debugging program errors. However, let me be picky for a moment: please remember that GDB is not suitable for the early stages of performance analysis.

Why is that? Because the process of debugging a program with GDB interrupts the program execution, which is often not allowed in a live environment. Therefore, GDB is only suitable for use in the later stages of performance analysis, when you have identified the problematic function and can use it offline to further debug issues within the function.

So which tool is suitable for analyzing CPU issues on a live system? My recommendation is perf. perf is a performance analysis tool built into the Linux kernel since 2.6.31. Based on performance event sampling, it can analyze not only various kernel and system events but also performance problems in specific applications.

When using perf to analyze CPU performance issues, let me introduce the two usages I find most common and most useful.

The first common use case is perf top, similar to top. It can display in real time the function or instruction that consumes the most CPU cycles, so it can be used to find hot functions. The usage is as follows:

$ perf top
Samples: 833  of event 'cpu-clock', Event count (approx.): 97742399
Overhead  Shared Object       Symbol
   7.28%  perf                [.] 0x00000000001f78a4
   4.72%  [kernel]            [k] vsnprintf
   4.32%  [kernel]            [k] module_get_kallsym
   3.65%  [kernel]            [k] _raw_spin_unlock_irqrestore
...

In the output, the first line contains three pieces of information: the number of samples, the event type, and the total number of events. For example, in this example, perf has collected a total of 833 CPU clock events, and the total number of events is 97742399.

Furthermore, pay special attention to the number of samples. If the number of samples is too small (e.g. only a few), the subsequent sorting and percentages will not have much practical reference value.

Looking further down, there is a table-like data, with four columns in each row:

  • The first column is Overhead, which represents the proportion of the performance events of that symbol in all the samples, expressed in percentages.

  • The second column is Shared, which is the dynamic shared object (DSO) where the function or instruction is located, such as the kernel, process name, dynamic link library name, kernel module name, etc.

  • The third column is Object, which is the type of dynamic shared object. For example, [.] represents a user-space executable program or dynamic link library, while [k] represents the kernel space.

  • The last column is Symbol, the symbol name, i.e., the function name. When the function name is unknown, it is represented by a hexadecimal address.

Using the above output as an example, we can see that the program “perf” itself consumes the most CPU clock cycles, but its proportion is only 7.28%, indicating that there is no CPU performance problem in the system. You should now have a clear understanding of how to use perf top.

Next, let’s look at the second common use case, perf record and perf report. Although perf top displays real-time performance information of the system, its disadvantage is that it does not save data, so it cannot be used for offline or subsequent analysis. On the other hand, perf record provides the functionality to save data, and the saved data needs to be parsed and displayed using perf report.

$ perf record # Stop sampling with Ctrl+C
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.452 MB perf.data (6093 samples) ]

$ perf report # Displays a report similar to perf top

In actual use, we often add the -g option to perf top and perf record to enable call-graph sampling, which makes it easier to analyze performance along the call chain.
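For example, a typical workflow looks like this (the PID and duration here are placeholders):

# Sample call graphs of process 21515 for 30 seconds, then view the report
$ perf record -g -p 21515 -- sleep 30
$ perf report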

Case Study #

In this case, let’s take the Nginx + PHP web service as an example to see how to use tools like top and perf to identify abnormal processes causing high CPU usage and find the functions that are causing performance issues.

Your Setup #

The following case is based on Ubuntu 18.04 and is also applicable to other Linux systems. The environment I used for this case is as follows:

  • Machine configuration: 2 CPUs, 8GB memory

  • Required tools installed beforehand: docker, sysstat, perf, ab, etc. You can install them with the command apt install docker.io sysstat linux-tools-common apache2-utils.

First, let me briefly introduce the tool we’ll use in this case: ab (apache bench), a commonly used HTTP service benchmarking tool, which here acts as a client load generator for Nginx. Since configuring Nginx and PHP can be cumbersome, I’ve packaged them into two Docker images; by running two containers, we get our simulation environment.

Note that this case requires two virtual machines: one acts as the web server, where we will simulate the performance problem, and the other acts as the client, putting pressure on the web service. Using two virtual machines keeps them isolated from each other and avoids “cross-contamination”.

Next, let’s open two terminals and SSH into each machine separately and install the tools mentioned above.

Same as before, all the commands below are assumed to be run as the root user. If you are logged in as a regular user, run sudo su root first to switch to the root user. With that, the preparation work is done.

However, before we proceed with the operations, let me mention one more thing. The core logic of the PHP application in this case is quite simple, and most people can spot the problem at a glance. But keep in mind that real-world source code in production is much more complex.

So, I suggest that before following the steps, you first analyze it as a black box without looking at the source code (to avoid preconceptions). That way you can better understand the whole problem-solving process: starting from the system’s resource usage, how to analyze the application’s bottleneck and roughly locate it within the application.

Operations and Analysis #

Next, let’s proceed to the operational part.

First, in the first terminal, run the following commands to start Nginx and PHP application:

$ docker run --name nginx -p 10000:80 -itd feisky/nginx
$ docker run --name phpfpm -itd --network container:nginx feisky/php-fpm

Then, in the second terminal, use curl to access http://[VM1’s IP]:10000 to confirm that Nginx has started correctly. You should see the response “It works!”.

# 192.168.0.10 is the IP address of the first virtual machine
$ curl http://192.168.0.10:10000/
It works!

Next, let’s test the performance of this Nginx service. In the second terminal, run the following ab command:

# Test Nginx performance with 10 concurrent requests and a total of 100 requests
$ ab -c 10 -n 100 http://192.168.0.10:10000/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, 
...
Requests per second:    11.63 [#/sec] (mean)
Time per request:       859.942 [ms] (mean)
...

From the output of ab, we can see that Nginx can only handle an average of 11.63 requests per second. You must be thinking that this is quite poor. So where exactly is the problem? Let’s observe with top and pidstat.

This time, in the second terminal, increase the total number of requests to 10,000. This way, when you use performance analysis tools in the first terminal, Nginx will still be under pressure.

Continue in the second terminal and run the ab command:

$ ab -c 10 -n 10000 http://192.168.0.10:10000/

Then, go back to the first terminal and run the top command. Press the number 1 to switch to the CPU usage of each individual CPU:

$ top
...
%Cpu0  : 98.7 us,  1.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 99.3 us,  0.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
...
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
21514 daemon    20   0  336696  16384   8712 R  41.9  0.2   0:06.00 php-fpm
21513 daemon    20   0  336696  13244   5572 R  40.2  0.2   0:06.08 php-fpm
21515 daemon    20   0  336696  16384   8712 R  40.2  0.2   0:05.67 php-fpm
21512 daemon    20   0  336696  13244   5572 R  39.9  0.2   0:05.87 php-fpm
21516 daemon    20   0  336696  16384   8712 R  35.9  0.2   0:05.61 php-fpm

Here we can see that the CPU usage of several php-fpm processes in the system adds up to nearly 200%, and the user space CPU usage (us) of each CPU has also exceeded 98%, close to saturation. So, we can confirm that it is the php-fpm processes in the user space that have caused the CPU usage to spike.

So, how can we find out which function of php-fpm is causing the high CPU usage? Let’s use perf to analyze it. Run the following perf command in the first terminal:

# -g enables call-graph sampling; -p specifies a process ID, here 21515 (one of the php-fpm processes)
$ perf top -g -p 21515

Use the arrow keys to move to php-fpm, then press Enter to expand its call graph. You will find that the call chain eventually reaches sqrt and add_function. It looks like these two functions are where we should start.

Let’s copy out the application’s source code and check whether these two functions are actually called:

# Copy the PHP source code from the phpfpm container
$ docker cp phpfpm:/app .

# Use grep to search for function calls
$ grep sqrt -r app/ # sqrt is found to be called
app/index.php:  $x += sqrt($x);
$ grep add_function -r app/ # add_function is not found, it is actually a built-in PHP function

OK, it turns out that only the sqrt function is called in the app/index.php file. So, let’s take a look at the source code of this file:

$ cat app/index.php
<?php
// test only.
$x = 0.0001;
for ($i = 0; $i <= 1000000; $i++) {
  $x += sqrt($x);
}

echo "It works!"

Can you spot the issue? I guess you’ll laugh at me: I made the silly mistake of not deleting the test code before deploying the application. To make it easy for you to verify the effect of the optimization, I’ve also packaged the fixed application into a Docker image. Run the following commands in the first terminal:

# Stop the original application
$ docker rm -f nginx phpfpm
# Run the optimized application
$ docker run --name nginx -p 10000:80 -itd feisky/nginx:cpu-fix
$ docker run --name phpfpm -itd --network container:nginx feisky/php-fpm:cpu-fix

Next, let’s verify the effect of the fix in the second terminal. First, after stopping the previous ab command with Ctrl+C, run the following command:

$ ab -c 10 -n 10000 http://192.168.0.10:10000/
...
Complete requests:        10000
Failed requests:        0
Total transferred:        1720000 bytes
HTML transferred:        90000 bytes
Requests per second:        2237.04 [#/sec] (mean)
Time per request:        4.470 [ms] (mean)
Time per request:        0.447 [ms] (mean, across all concurrent requests)
Transfer rate:        375.75 [Kbytes/sec] received
...

Here you can see that the average number of requests per second has increased from 11 to 2237.

As you can see, even such a silly little mistake can have a huge impact on performance, and it isn’t easy to find, either. Of course, once the problem is found, the fix is simple: just delete the test code.
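For reference, after the test loop is removed, index.php ends up as simple as this (a sketch; the exact contents of the cpu-fix image may differ):

$ cat app/index.php
<?php
echo "It works!";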

Conclusion #

CPU usage is the most intuitive and most commonly used system performance metric, and it is usually the first one we inspect when troubleshooting performance issues. So you need to be familiar with its meaning, and in particular understand the usage of user CPU (%user), nice CPU (%nice), system CPU (%system), I/O wait (%iowait), hard interrupt (%irq), and soft interrupt (%softirq).

For example:

  • If the user CPU and nice CPU are high, it means that user processes are using a lot of CPU resources, so we should focus on troubleshooting process performance issues.

  • If the system CPU is high, it means that the kernel is using a lot of CPU resources, so we should focus on troubleshooting kernel threads or system call performance issues.

  • If the I/O wait CPU is high, it means that there is a long wait time for I/O, so we should focus on troubleshooting possible I/O issues with the system storage.

  • If the soft interrupt and hard interrupt CPU are high, it means that the CPU is being occupied by soft or hard interrupt handlers, so we should focus on troubleshooting interrupt service routines in the kernel.

When encountering high CPU utilization issues, you can use tools such as top and pidstat to identify the source of the CPU performance problem, and then use tools like perf to pinpoint the specific functions causing the performance problem.

Reflection #

Finally, I would like to invite you to discuss your understanding of CPU usage and how you analyze it when you find that CPU usage is increasing. You can summarize your thoughts based on today’s content and your own operational records.

Feel free to discuss with me in the comments section, and also feel free to share this article with your colleagues and friends. Let’s practice in real scenarios and make progress through communication.