04 Fundamentals: The Meaning of CPU Context Switch (Part Two) #

Hello, I’m Ni Pengfei.

In the previous section, I explained the working principles of CPU context switching. Let’s quickly review: CPU context switching is a core mechanism that keeps a Linux system running normally. Depending on the scenario, it can be divided into process context switches, thread context switches, and interrupt context switches. Take a moment to recall the specific concepts and the differences between them; if you have forgotten, refer back to the previous article.

Today, let’s continue to see how to analyze the problem of CPU context switch.

How to View the System’s Context Switching Situation #

As we have learned before, excessive context switching consumes CPU time in saving and restoring data such as registers, kernel stacks, and virtual memory. This shortens the actual running time of processes and becomes a major cause of significant performance degradation.

Since context switching has such a big impact on system performance, you must be eager to know how to measure it. Here, we can use the tool vmstat to query the system’s overall context switching activity.

vmstat is a commonly used system performance analysis tool, mainly used to analyze the system’s memory usage, as well as the number of CPU context switches and interrupts.

For example, the following is an example of using vmstat:

# Output 1 set of data every 5 seconds
$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 7005360  91564 818900    0    0     0     0   25   33  0  0 100  0  0

Let’s take a look at this result together. You can try to interpret the meaning of each column yourself. Here, I want to emphasize the four columns that need special attention:

  • cs (context switch) is the number of context switches per second.

  • in (interrupt) is the number of interrupts per second.

  • r (Running or Runnable) is the length of the ready queue, which is the number of processes currently running or waiting for the CPU.

  • b (Blocked) is the number of processes in an uninterruptible sleep state.

As you can see, in this example, the number of context switches cs is 33, and the number of interrupts in is 25, while the length of the ready queue r and the number of processes in an uninterruptible sleep state b are both 0.
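If you want to keep watching just these four columns, a small convenience sketch of my own (not part of the original steps) is to filter vmstat’s output with awk. The column positions assume the default layout shown in the example above:

# Take 3 samples at 5-second intervals and keep only the r, b, in and cs columns
# stdbuf -oL keeps vmstat line-buffered so each sample appears as soon as it is printed
$ stdbuf -oL vmstat 5 3 | awk 'NR > 1 { print $1, $2, $11, $12 }'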

vmstat only reports the system-wide context switching totals. To see the details for each process, you need to use pidstat, which we mentioned earlier; adding the -w option lets you view each process’s context switches.

For example:

# Output 1 set of data every 5 seconds
$ pidstat -w 5
Linux 4.15.0 (ubuntu)  09/23/18  _x86_64_  (2 CPU)

08:18:26      UID       PID   cswch/s nvcswch/s  Command
08:18:31        0         1      0.20      0.00  systemd
08:18:31        0         8      5.40      0.00  rcu_sched
...

There are two columns in this result that deserve special attention. One is cswch, the number of voluntary context switches per second; the other is nvcswch, the number of involuntary context switches per second.

Make sure you remember these two concepts, because they point to different performance problems:

  • A voluntary context switch occurs when a process cannot acquire a resource it needs. For example, when I/O, memory, or other system resources are insufficient, voluntary context switches happen.

  • An involuntary context switch occurs when a process is forcibly rescheduled by the system, for instance because its time slice has expired. When a large number of processes compete for the CPU, involuntary context switches become more likely (a quick way to spot such processes follows this list).
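The following is a rough convenience sketch of my own, not part of the original steps: it ranks tasks by involuntary context switches using the "Average:" lines of a single 1-second pidstat sample, where cswch/s is column 4 and nvcswch/s is column 5:

# Rank tasks by involuntary context switches (column 5); use -k4 to rank by voluntary ones
$ pidstat -w 1 1 | grep '^Average:' | sort -k5 -nr | head -n 10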

Case Study #

After understanding how to view these metrics, another question arises: how many context switches per second is considered normal? Don’t rush to look for the answer. As before, let’s work through a context switching case first; by analyzing it hands-on, you can work out the benchmark yourself.

Your Preparation #

In today’s case, we will use sysbench to simulate system multitasking context switching.

sysbench is a multi-threaded benchmarking tool that is generally used to evaluate the database load under different system parameters. Of course, in this case, we will treat it as an abnormal process to simulate the problem of excessive context switches.

The following case is based on Ubuntu 18.04, but it is also applicable to other Linux systems. The case environment I used is as follows:

  • Machine configuration: 2 CPUs, 8GB memory

  • Pre-installed packages: sysbench and sysstat, which can be installed with the command apt install sysbench sysstat

Before starting the formal operation, you need to open three terminals, log in to the same Linux machine, and install the two software packages mentioned above. If you encounter any problems during the installation, you can try to solve them by googling. If you still encounter problems, please write down your situation in the comments area.

Also, note that all the commands below are assumed to be run as root. So, if you log in to the system with a normal user, please remember to run the command sudo su root to switch to the root user first.

After the installation is completed, you can use vmstat to check the number of context switches in an idle system:

# Output 1 set of data after a 1-second interval
$ vmstat 1 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 6984064  92668 830896    0    0     2    19   19   35  1  0 99  0  0

Here, you can see that the current number of context switches (cs) is 35 and the number of interrupts (in) is 19, while the r and b values are both 0. Since no other tasks are running right now, these figures represent the context switching activity of an idle system.

Operation and Analysis #

Next, we will start the practical operation.

First, run sysbench in the first terminal to simulate the bottleneck of system multitasking scheduling:

# Run a benchmark test with 10 threads for 5 minutes to simulate the problem of multitasking switching
$ sysbench --threads=10 --max-time=300 threads run
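A hedged note: on newer sysbench releases (1.0 and later) the --max-time option has been deprecated in favor of --time, so if the command above warns about it or rejects it, the equivalent invocation should look roughly like this:

# Same test on sysbench 1.0+, where --time replaces --max-time
$ sysbench --threads=10 --time=300 threads run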

Then, run vmstat in the second terminal to observe the context switch situation:

# Output 1 set of data every 1 second (needs to be terminated with Ctrl+C)
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 6  0      0 6487428 118240 1292772    0    0     0     0 9019 1398830 16 84  0  0  0
 8  0      0 6487428 118240 1292772    0    0     0     0 10191 1392312 16 84  0  0  0

You should notice that the number of context switches in the cs column has suddenly increased from the previous 35 to 1.39 million. At the same time, pay attention to several other indicators:

  • r column: The length of the ready queue has reached 8, far more than the system’s 2 CPUs, so there is bound to be heavy competition for the CPU.

  • us (user) and sy (system) columns: Together these two columns add up to 100% CPU usage, with the system CPU usage (the sy column) at 84%, indicating that the CPU is mainly occupied by the kernel.

  • in column: The number of interrupts has also climbed to about 10,000, indicating that interrupt handling is a potential problem as well.

Putting these metrics together: the system’s ready queue is too long, meaning too many processes are running or waiting for the CPU; this causes a large number of context switches, and the context switches in turn drive up the CPU usage.
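If you want to double-check where that CPU time is going, mpstat (which also ships with the sysstat package we installed) can break the usage down per CPU. This is an optional cross-check rather than a step from the original case:

# Report per-CPU usage once per second (press Ctrl+C to stop);
# a high %sys on every CPU is consistent with kernel-side scheduling overhead
$ mpstat -P ALL 1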

So what processes are causing these problems?

Let’s continue the analysis and use pidstat in the third terminal to observe the CPU usage and process context switch situation:

# Output 1 set of data every 1 second (needs to be terminated with Ctrl+C)
# The -w parameter indicates the output of process switch indicators, while the -u parameter indicates the output of CPU usage indicators
$ pidstat -w -u 1
08:06:33      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
08:06:34        0     10488   30.00  100.00    0.00    0.00  100.00     0  sysbench
08:06:34        0     26326    0.00    1.00    0.00    0.00    1.00     0  kworker/u4:2

08:06:33      UID       PID   cswch/s nvcswch/s  Command
08:06:34        0         8     11.00      0.00  rcu_sched
08:06:34        0        16      1.00      0.00  ksoftirqd/1
08:06:34        0       471      1.00      0.00  hv_balloon
08:06:34        0      1230      1.00      0.00  iscsid
08:06:34        0      4089      1.00      0.00  kworker/1:5
08:06:34        0      4333      1.00      0.00  kworker/0:3
08:06:34        0     10499      1.00    224.00  pidstat
08:06:34        0     26326    236.00      0.00  kworker/u4:2
08:06:34     1000     26784    223.00      0.00  sshd

From the output of pidstat, you can see that the increase in CPU usage is indeed caused by sysbench, whose CPU usage has reached 100%. The context switches, however, come from other processes: pidstat has the highest involuntary context switch rate, while the kernel thread kworker and the sshd process have the highest voluntary context switch rates.

However, if you are observant, you must have noticed something strange: the total number of context switches reported by pidstat is only a few hundred, far smaller than the 1.39 million reported by vmstat. What’s going on? Could it be a problem with the tool?

Don’t worry. Before doubting the tool, let’s think back to the context switching scenarios mentioned earlier. One of them pointed out that the basic unit of Linux scheduling is actually the thread, and the scenario simulated by sysbench is precisely a thread scheduling problem. So, could pidstat be ignoring thread data?

By running man pidstat, you will find that pidstat shows process-level data by default, and it only outputs thread-level data when the -t parameter is added.

So, we can stop the previous pidstat command by pressing Ctrl+C in the third terminal, then try again with the -t parameter:

# Output a group of data every 1 second (need to press Ctrl+C to end)
# -wt parameter means outputting thread context switching indicators
$ pidstat -wt 1
08:14:05      UID      TGID       TID   cswch/s nvcswch/s  Command
...
08:14:05        0     10551         -      6.00      0.00  sysbench
08:14:05        0         -     10551      6.00      0.00  |__sysbench
08:14:05        0         -     10552  18911.00 103740.00  |__sysbench
08:14:05        0         -     10553  18915.00 100955.00  |__sysbench
08:14:05        0         -     10554  18827.00 103954.00  |__sysbench
...

Now you can see that although the sysbench process (that is, the main thread) does not switch contexts very often, its sub-threads switch a great deal. So the culprit behind the context switches is still the large number of sysbench threads. We have found the root cause of the increased context switches; does that mean we can stop here?

Of course not. You may recall that earlier, when observing the system metrics, another metric changed significantly besides the context switch rate: the interrupt count, which also rose to about 10,000. However, it is still unclear which type of interrupt increased, so we need to keep investigating to find the source.

Since interrupts are handled only in kernel mode, and pidstat is merely a process-level performance analysis tool that provides no details about interrupts, how can we determine what type of interrupt has occurred?

That’s right, we can read the read-only file /proc/interrupts. /proc is a virtual filesystem in Linux used for communication between kernel space and user space; /proc/interrupts is part of this mechanism and provides statistics on interrupt usage.

Let’s go back to the third terminal. Stop the previous pidstat command with Ctrl+C, and then run the following command to observe the changes in interrupts:

# The -d option highlights the changing area
$ watch -d cat /proc/interrupts
           CPU0       CPU1
...
RES:    2450431    5279697   Rescheduling interrupts
...

Observe for a while and you should notice that the rescheduling interrupts (RES) change the fastest. This type of interrupt wakes up an idle CPU to schedule a new task; it is the mechanism the scheduler uses on multiprocessor (SMP) systems to distribute tasks across CPUs, and it is commonly referred to as an Inter-Processor Interrupt (IPI).

Therefore, the increase in interrupts here is due to excessive task scheduling, which is consistent with the analysis of the number of context switches earlier.
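If the full interrupt table is too noisy to watch, a small convenience of my own (not part of the original steps) is to track only the rescheduling line:

# Watch only the rescheduling-interrupt counters; -d still highlights the changes
$ watch -d 'grep RES /proc/interrupts'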

This case should also show you the benefit of comparing multiple tools and metrics: if we had only used pidstat to observe from the beginning, we would never have noticed these heavily context-switching threads.

Now let’s go back to the initial question: how many context switches per second are considered normal?

This value actually depends on the CPU performance of the system itself. In my opinion, if the number of context switches is relatively stable, anywhere from a few hundred to around 10,000 can be considered normal. However, if the number of context switches exceeds 10,000 or increases by an order of magnitude, it is likely that there is a performance issue.

At this point, you need to analyze further based on the type of context switch (a combined triage sketch follows this list). For example:

  • If voluntary context switches increase, it indicates that processes are waiting for resources, possibly due to I/O or other issues.

  • If involuntary context switches increase, it indicates that processes are being forcibly scheduled, i.e., competing for the CPU, which suggests that the CPU has become a bottleneck.

  • If the interrupt count increases, it indicates that the CPU is occupied by interrupt handlers, and you need to analyze the specific interrupt types by checking the /proc/interrupts file.
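Putting the three checks together, a rough triage sequence based on the tools used above (intervals and counts chosen arbitrarily) might look like this:

# 1. System-wide view: are cs and in abnormally high? Is the ready queue long?
$ vmstat 1 5

# 2. Per-thread view: which tasks switch the most, voluntarily or involuntarily?
$ pidstat -wt 1 5

# 3. If interrupts are climbing, find out which type is changing fastest
$ watch -d cat /proc/interrupts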

Summary #

Today, I used a sysbench case to walk you through the approach for analyzing context switching problems. When faced with excessive context switches, you can use tools such as vmstat, pidstat, and /proc/interrupts to track down the root cause of the performance issue.

Reflection #

Finally, I would like to hear how you have analyzed and investigated context switching problems in the past. You can summarize your own approach based on the content of these two sections and your practical experience.

Feel free to discuss with me in the comments section, and also feel free to share this article with your colleagues and friends. Let’s practice in real scenarios and learn through communication.