07 Case Study Dealing With Uninterruptible Sleep and Zombie Processes in the System Part One


Hello, I am Ni Pengfei.

In the previous section, I used an Nginx+PHP case to walk through how to analyze and respond to high CPU usage on a server. As you will remember, when you run into CPU usage that you cannot explain, you should first check whether a short-lived application is causing trouble.

Short-lived applications have a short running time, making it difficult to detect them in tools like top or ps that display system summaries and process snapshots. You need to use tools that can record events to assist in diagnosis, such as execsnoop or perf top.

You don’t need to memorize these ideas deliberately. Practice them several times and try to think about them in your operations, then you will be able to apply them flexibly.

We also talked about the types of CPU usage. Besides the user CPU covered in the previous section, there is system CPU (for example, context switches), CPU time spent waiting for I/O (for example, waiting for a disk to respond), and interrupt CPU (both soft interrupts and hard interrupts).

In the article on context switches, we have already analyzed the issue of high system CPU usage. The remaining issue is when the CPU waiting for I/O (referred to as iowait below) increases, which is also a common server performance problem. Today, let’s take a look at a case of multiple processes with I/O operations and analyze this situation.

Process States #

When iowait increases, processes are likely to stay in an uninterruptible state for a long time because they are not receiving a response from hardware. In the output of the ps or top command, such processes appear in the D state, which stands for uninterruptible sleep. Speaking of process states, do you remember the various states a process can be in? Let’s review them.

top and ps are the most commonly used tools to view process states, and we’ll start with the output of the top command. The S column (Status column) shows the process state. From the example below, you can see several states such as R, D, Z, S, I, etc., but what do they mean?

$ top
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28961 root      20   0   43816   3148   4040 R   3.2  0.0   0:00.01 top
  620 root      20   0   37280  33676    908 D   0.3  0.4   0:00.01 app
    1 root      20   0  160072   9416   6752 S   0.0  0.1   0:37.64 systemd
 1896 root      20   0       0      0      0 Z   0.0  0.0   0:00.00 devapp
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.10 kthreadd
    4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/0:0H
    6 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 mm_percpu_wq
    7 root      20   0       0      0      0 S   0.0  0.0   0:06.37 ksoftirqd/0

Let’s take a look at each of them:

R stands for Running or Runnable. It indicates that the process is in the CPU’s run queue and is either running or waiting to run.
D stands for Disk Sleep, which represents uninterruptible sleep. It usually means that the process is interacting with hardware and the interaction cannot be interrupted by other processes or interrupts.
Z stands for Zombie. If you have played the game “Plants vs. Zombies,” you might already know what it means. It indicates a zombie process, which means the process has actually ended, but the parent process has not yet reclaimed its resources (such as process descriptors, PIDs, etc.).
S stands for Interruptible Sleep. It represents that the process is suspended by the system while waiting for a certain event. When the event the process is waiting for occurs, it will be awakened and enter the R state.
I stands for Idle. It is used for idle kernel threads that are in uninterruptible sleep. As mentioned earlier, processes in D state can increase the system load, while processes in I state will not.

Of course, the above example does not include all possible process states. Apart from the above five states, processes also include the following two states.

The first one is T or t, which stands for Stopped or Traced. It indicates that the process is in a paused or traced state.

When a process receives the SIGSTOP signal, it will be paused (Stopped) in response. To resume its execution, you can send it the SIGCONT signal (if the process was started directly in a terminal, you need to use the fg command to bring it to the foreground).
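The stop and resume cycle described above can be observed from a small script. The following is a minimal sketch, assuming a Linux system with /proc mounted; the helper names read_state, wait_for_state, and stop_and_resume are made up for this illustration.

```python
import os
import signal
import subprocess
import sys
import time


def read_state(pid):
    """Return the one-letter state of a process from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state is the first field after the command name. The name is wrapped
    # in parentheses and may itself contain spaces, so search from the right.
    return data[data.rindex(")") + 2]


def wait_for_state(pid, wanted, timeout=2.0):
    """Poll until the process is in one of the wanted states or the timeout expires."""
    deadline = time.monotonic() + timeout
    state = read_state(pid)
    while state not in wanted and time.monotonic() < deadline:
        time.sleep(0.01)
        state = read_state(pid)
    return state


def stop_and_resume():
    """Pause a child with SIGSTOP, resume it with SIGCONT, and return both states."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        os.kill(child.pid, signal.SIGSTOP)
        stopped = wait_for_state(child.pid, "T")
        os.kill(child.pid, signal.SIGCONT)
        resumed = wait_for_state(child.pid, "RS")
        return stopped, resumed
    finally:
        child.kill()  # clean up the helper process
        child.wait()  # and reap it so it does not linger as a zombie


if __name__ == "__main__":
    print(stop_and_resume())
```

The polling is there because a signal is delivered asynchronously, so the state change may not be visible in /proc the instant os.kill() returns.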

When you use a debugger (like gdb) to debug a process and interrupt it using breakpoints, the process enters a traced state. This is also a special type of paused state, but you can use the debugger to track and control the process’s execution as needed.

The other one is X, which stands for Dead. It indicates that the process has terminated, so you won’t see it in the top or ps command anymore.
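As a compact recap of the seven states, here is a sketch that maps each state letter to its meaning and counts how many processes are currently in each state. It assumes a Linux /proc filesystem; STATE_NAMES, parse_state, and count_states are illustrative names, not part of any tool mentioned in this article.

```python
import os
from collections import Counter

# State letters as they appear in the S column of top and in /proc/<pid>/stat.
STATE_NAMES = {
    "R": "running or runnable",
    "D": "uninterruptible sleep",
    "Z": "zombie",
    "S": "interruptible sleep",
    "I": "idle kernel thread",
    "T": "stopped",
    "t": "traced",
    "X": "dead",
}


def parse_state(stat_text):
    """Extract the state letter from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may contain spaces or ")",
    # so the state is located relative to the LAST closing parenthesis.
    return stat_text[stat_text.rindex(")") + 2]


def count_states(proc_root="/proc"):
    """Count processes per state by scanning the numeric entries under /proc."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                counts[parse_state(f.read())] += 1
        except OSError:
            continue  # the process exited while we were scanning
    return counts


if __name__ == "__main__" and os.path.isdir("/proc"):
    for state, number in sorted(count_states().items()):
        print(f"{state} ({STATE_NAMES.get(state, 'unknown')}): {number}")
```

A persistently nonzero count for D or Z in this output is the kind of signal the rest of this case study looks into.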

Now let’s go back to our main topic for today. Let’s first look at the uninterruptible state. This is actually to ensure the consistency of process data with hardware status, and normally, a process in uninterruptible state will end within a short time. So, we generally can ignore processes in short-term uninterruptible state.

However, if a system or hardware failure occurs, processes can remain in uninterruptible state for a long time, even causing a large number of uninterruptible processes in the system. At this point, you need to pay attention to whether there are any I/O or performance issues with the system.

Now, let’s talk about zombie processes, which is a problem that multi-process applications often encounter. Normally, when a process creates a child process, it should wait for the child process to end and reclaim its resources through system calls like wait() or waitpid(). When a child process exits, it sends the SIGCHLD signal to its parent process, so the parent process can also register a signal handler for SIGCHLD to asynchronously reclaim resources.
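The asynchronous approach can be sketched as follows. This is only an illustration in Python rather than the C of the case application, and the names reaped, reap_children, and spawn_and_reap are invented for the example. The handler loops over waitpid() with WNOHANG because several SIGCHLD signals that arrive close together may be delivered as one.

```python
import os
import signal
import time

reaped = []  # (pid, exit_code) pairs collected by the handler


def reap_children(signum=None, frame=None):
    """Collect every child that has already exited, without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # this process has no children left
        if pid == 0:
            return  # children exist, but none of them has exited yet
        reaped.append((pid, os.WEXITSTATUS(status)))


def spawn_and_reap(count=3, timeout=5.0):
    """Fork short-lived children and let the SIGCHLD handler reclaim them."""
    reaped.clear()
    previous = signal.signal(signal.SIGCHLD, reap_children)
    try:
        for code in range(count):
            if os.fork() == 0:
                os._exit(code)  # child: exit at once with a distinct code
        deadline = time.monotonic() + timeout
        while len(reaped) < count and time.monotonic() < deadline:
            time.sleep(0.01)  # the handler runs whenever SIGCHLD arrives
        return sorted(code for _, code in reaped)
    finally:
        signal.signal(signal.SIGCHLD, previous)  # restore the old handler


if __name__ == "__main__":
    print(spawn_and_reap())
```

A parent that only needs to wait for one specific child can simply call os.waitpid(pid, 0) instead and block until that child exits.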

If the parent process fails to do this or the child process executes too quickly, causing the parent process to not have enough time to handle the child process state, the child process will become a zombie process. In other words, a father should always take responsibility for his son, from beginning to end. If the father does not act or cannot keep up, it will lead to the appearance of “delinquent teenagers.”

Zombie processes are usually short-lived. They disappear as soon as the parent process reclaims their resources, or, if the parent process exits first, once the init process takes them over and reclaims them.

If the parent process keeps running but never handles the termination of its child processes, those child processes remain in the zombie state. A large number of zombie processes can exhaust the available PIDs, preventing new processes from being created, so this situation must be avoided.
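To make the zombie state concrete, here is a minimal sketch that creates one on purpose, assuming Linux with /proc mounted. The child exits immediately, the parent deliberately delays waitpid(), and the child shows up as Z until it is reaped. The function names read_state and observe_zombie are hypothetical.

```python
import os
import time


def read_state(pid):
    """Return the one-letter state of a process from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    return data[data.rindex(")") + 2]  # first field after "(command name)"


def observe_zombie(timeout=2.0):
    """Fork a child that exits at once; report its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: terminate immediately

    # Parent: do NOT call waitpid() yet, so the exited child stays a zombie.
    deadline = time.monotonic() + timeout
    state = read_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = read_state(pid)

    os.waitpid(pid, 0)  # reap the child; its PID and process entry are released
    still_listed = os.path.exists(f"/proc/{pid}")
    return state, still_listed


if __name__ == "__main__":
    print(observe_zombie())
```

A parent that never reaches the waitpid() line while it keeps forking is exactly the pattern that accumulates zombies.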

Case Study #

Next, I will analyze the problems of a large number of uninterruptible and zombie processes using a case study of a multi-process application. This application is developed in C, and because the compilation and execution steps are more complicated, I have packaged it into a Docker image. This way, you only need to run a Docker container to get the simulation environment.

Your Preparation #

The following case study is still based on Ubuntu 18.04, which is also applicable to other Linux systems. The environment I used for the case study is as follows:

Machine Configuration: 2 CPUs, 8GB RAM
Pre-install tools such as docker, sysstat, and dstat, for example, apt install docker.io dstat sysstat

Here, dstat is a new performance tool that incorporates the advantages of several tools such as vmstat, iostat, and ifstat. It can simultaneously observe the CPU, disk I/O, network, and memory usage of the system.

Next, we open a terminal, SSH into the machine, and install the aforementioned tools.

Note that all the following commands are assumed to be run as the root user. If you log in to the system as a regular user, please run the command “sudo su root” to switch to the root user.

If there are any problems during the installation process, you can search online for solutions. If you can’t solve them, remember to ask me in the comments section.

Helpful tip: The core code logic of the case application is relatively simple, and you may be able to see the problem at a glance. However, the source code in actual production environments is much more complicated. Therefore, I still recommend not reading the source code before performing the analysis to avoid preconceived notions. Instead, treat it as a black box to analyze the problem based on observations. You can consider it as a practice session in your work, which will yield better results.

Operations and Analysis #

After the installation is complete, we first execute the following command to run the case application:

$ docker run --privileged --name=app -itd feisky/app:iowait

Then, enter the ps command to confirm that the case application has started correctly. If everything is fine, you should see the following output:

$ ps aux | grep /app
root      4009  0.0  0.0   4376  1008 pts/0    Ss+  05:51   0:00 /app
root      4287  0.6  0.4  37280 33660 pts/0    D+   05:54   0:00 /app
root      4288  0.6  0.4  37280 33668 pts/0    D+   05:54   0:00 /app

From this output, we can see that multiple app processes have been started, and their states are Ss+ and D+. Here, S represents the interruptible sleep state and D represents the uninterruptible sleep state, both of which we just covered. But what do the trailing s and + mean? If you don’t know, that’s fine: the details are in man ps. For now, remember that s indicates that the process is a session leader, and + indicates that it belongs to the foreground process group.

Here, two new concepts have emerged, process group and session. They are used to manage a group of related processes, which is actually quite easy to understand.

A process group represents a group of related processes, for example, each child process is a member of the group of the parent process;
A session refers to one or more process groups that share the same control terminal. For example, when we log in to a server via SSH, we will open a control terminal (TTY), which corresponds to a session. The commands we run in the terminal and their subprocesses form individual process groups. Among them, commands running in the background form background process groups, while commands running in the foreground form foreground process groups.
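To tie the STAT column back to these two concepts, the sketch below decodes values such as Ss+ and D+ and prints the process group and session IDs of the current process. The flag meanings follow the ps manual page; decode_stat and current_ids are illustrative names only.

```python
import os

STATE_NAMES = {
    "R": "running or runnable",
    "D": "uninterruptible sleep",
    "Z": "zombie",
    "S": "interruptible sleep",
    "I": "idle kernel thread",
    "T": "stopped",
    "t": "traced",
    "X": "dead",
}

# Extra characters that ps appends after the state letter.
FLAG_NAMES = {
    "<": "high priority",
    "N": "low priority",
    "L": "has pages locked into memory",
    "s": "session leader",
    "l": "multi-threaded",
    "+": "in the foreground process group",
}


def decode_stat(stat):
    """Turn a ps STAT value such as 'Ss+' into a list of descriptions."""
    state, flags = stat[0], stat[1:]
    parts = [STATE_NAMES.get(state, "unknown state")]
    parts += [FLAG_NAMES.get(flag, f"unknown flag {flag!r}") for flag in flags]
    return parts


def current_ids():
    """Return the PID, process group ID, and session ID of this process."""
    return {"pid": os.getpid(), "pgid": os.getpgid(0), "sid": os.getsid(0)}


if __name__ == "__main__":
    print(decode_stat("Ss+"))
    print(decode_stat("D+"))
    print(current_ids())
```

When the script is started from an interactive shell, the sid it prints is normally the PID of that shell, which is the session leader.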

Now that we understand these, let’s use the top command to check the system’s resource usage:

# Press the number 1 to switch to the usage of all CPUs, observe for a while and press Ctrl+C to exit
$ top
top - 05:56:23 up 17 days, 16:45,  2 users,  load average: 2.00, 1.68, 1.39
Tasks: 247 total,   1 running,  79 sleeping,   0 stopped, 115 zombie
%Cpu0  :  0.0 us,  0.7 sy,  0.0 ni, 38.9 id, 60.5 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.7 sy,  0.0 ni,  4.7 id, 94.6 wa,  0.0 hi,  0.0 si,  0.0 st
...

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 4340 root      20   0   44676   4048   3432 R   0.3  0.0   0:00.05 top
 4345 root      20   0   37280  33624    860 D   0.3  0.0   0:00.01 app
 4344 root      20   0   37280  33624    860 D   0.3  0.4   0:00.01 app
    1 root      20   0  160072   9416   6752 S   0.0  0.1   0:38.59 systemd
...

Can you see any problems from here? Be careful and observe line by line, don’t miss anything. If you forget the meaning of any parameter, go back and review it in time.

Okay, if you already have an answer, let’s continue and see if it matches the problem I found. Here, I found four suspicious points.

First, let’s look at the load averages on the first line. The 1-minute, 5-minute, and 15-minute values are 2.00, 1.68, and 1.39. Because the most recent window has the highest value, the load is rising. The 1-minute average has also already reached the number of CPUs in the system (2), suggesting that the system may have reached a performance bottleneck.
Next, let’s look at the Tasks on the second line. There is 1 running process, but there are quite a few zombie processes, and they are increasing continuously, indicating that some child processes are not being cleaned up when they exit.
Then let’s look at the usage of the two CPUs. The user CPU and system CPU are not high, but the iowait percentages are 60.5% and 94.6% respectively, which seems a bit abnormal.
Finally, let’s look at the information of each process. The process with the highest CPU usage is only 0.3%, which doesn’t seem high. But there are two processes in the D state, which may be waiting for I/O, but based on this information alone, we cannot determine if they are causing the high iowait.
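The reasoning about the load averages in the first point can be written down as a small check. This is a sketch under simple assumptions: the three values are compared only with each other and with the CPU count, and the function names are made up for illustration. The sample string mirrors the format of /proc/loadavg, using the values from the top output above.

```python
import os


def parse_loadavg(text):
    """Parse the first three fields of /proc/loadavg into floats."""
    one, five, fifteen = (float(value) for value in text.split()[:3])
    return one, five, fifteen


def load_trend(one, five, fifteen):
    """Compare the 1, 5, and 15 minute averages to judge the direction of the load."""
    if one > five > fifteen:
        return "rising"   # the most recent window is the busiest
    if one < five < fifteen:
        return "falling"  # the most recent window is the quietest
    return "mixed or steady"


def is_saturated(one_minute, cpu_count):
    """True when the 1 minute load has reached the number of CPUs."""
    return one_minute >= cpu_count


if __name__ == "__main__":
    one, five, fifteen = parse_loadavg("2.00 1.68 1.39 1/247 4340")
    print(load_trend(one, five, fifteen), is_saturated(one, 2))
    # The same check against the live system:
    live = os.getloadavg()
    print(load_trend(*live), is_saturated(live[0], os.cpu_count() or 1))
```

Keep in mind that on Linux the load average counts processes in the D state as well as runnable ones, so a high value alone does not tell you whether the CPU or I/O is the bottleneck.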

Let’s summarize these four problems, and we can get two clear points:

First, the iowait is too high, causing the average load of the system to increase, and even reaching the number of CPUs in the system.
Second, the number of zombie processes keeps increasing, indicating that some programs failed to properly clean up the resources of child processes.
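The first point rests on the wa figure that top reports. As a rough sketch of where such a number comes from, the code below computes the iowait share from two samples of a cpu line in /proc/stat. The field order (user, nice, system, idle, iowait, irq, softirq, steal) is the one documented for /proc/stat; the function names and the sample lines are invented for the example.

```python
def parse_cpu_line(line):
    """Return the first eight counters of a 'cpu' line from /proc/stat."""
    fields = line.split()
    # fields[0] is the label (cpu, cpu0, ...). The counters that follow are
    # user, nice, system, idle, iowait, irq, softirq, steal.
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total == 0:
        return 0.0  # no time elapsed between the samples
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


if __name__ == "__main__":
    # Two hypothetical samples of the same CPU taken a moment apart.
    first = "cpu0 100 0 50 1000 200 0 0 0 0 0"
    second = "cpu0 100 0 51 1039 260 0 0 0 0 0"
    print(f"{iowait_percent(first, second):.1f}% iowait")
```

On a live system, the two samples would be the same cpu line read from /proc/stat a second or so apart.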

So, what should we do when we encounter these two problems? Combine the problem-solving approach we discussed earlier, think for yourself, try it out, and I’ll continue to “break it down” in the next lesson.

Summary #

Today, through simple operations, we have familiarized ourselves with several essential process states. Using the familiar commands ps or top, we can view the status of processes. These statuses include running (R), idle (I), uninterruptible sleep (D), interruptible sleep (S), zombie (Z), and stopped (T).

Among them, the uninterruptible sleep and zombie states are the focus of our study today.

The uninterruptible sleep state indicates that the process is interacting with the hardware. In order to protect the consistency of process data and hardware, the system does not allow other processes or interrupts to interrupt this process. If a process remains in the uninterruptible sleep state for a long time, it usually indicates I/O performance issues in the system.
Zombie processes represent processes that have exited, but their parent process has not yet reclaimed the resources occupied by the child process. We usually do not need to pay attention to transient zombie states, but if a process remains in the zombie state for a long time, we should be cautious as there may be an application that did not properly handle the exit of the child process.

Reflection #

Finally, I would like you to reflect on today’s homework questions and the two problems identified in the case. How would you analyze them? And how should they be resolved? You can summarize your own ideas based on the case studies we have done before and raise your own questions.

Feel free to discuss it with me in the comments, and feel free to share this article with your colleagues and friends. Let’s practice in real scenarios and improve through communication.