15 Multithreading Tuning: Which Operations Lead to Context Switches? #

Hello, I’m Liu Chao.

We often say "practice is the sole criterion for testing truth" — a saying that applies not only to social development but also to learning technology.

I remember when I first joined my previous company, I happened to experience a spike in concurrent users during a flash sale event. It was the first high-concurrency test after the system had been refactored and redeployed. As expected, there was a flood of timeout alerts, but the outcome was better than I had anticipated: at least the system didn't crash and need to be restarted.

Analyzing with monitoring tools, I found that the context switch rate (the cs column, measured in context switches per second) had reached nearly 600,000, while it normally hovered around 50,000. Digging into the logs, I saw a large number of entries showing threads parked in "wait()", which led me to suspect that delayed processing across many threads was the cause. After narrowing the problem down, I traced it to an unreasonable connection pool size setting. I then reproduced the production configuration in a test environment, adjusted the number of connections, and ran performance tests: reducing the maximum thread count actually improved the system's performance.

From this practical experience, I learned that in concurrent programs, launching more threads does not guarantee maximum program concurrency. Setting the thread count too low will prevent the program from fully utilizing system resources, while setting the thread count too high may lead to excessive resource competition and additional system overhead due to context switching.

You see, many experiences are accumulated bit by bit. So, today I want to share with you some related content about “context switching” in the hope that you’ll gain something from it.

Understanding Context Switching #

First, we need to understand what context switching is.

In the era of single processors, the operating system was able to handle multiple concurrent tasks using multithreading. The processor allocates CPU time slices to each thread, and the threads execute tasks within their allotted time slices.

A CPU time slice is the amount of time allocated to each thread for execution, usually several tens of milliseconds. Within such a short period, thread switches occur so quickly that we don’t even notice, making it appear as if the threads are running simultaneously.

The time slice determines how long a thread can occupy the processor continuously. When a thread’s time slice is used up or it is forced to suspend due to some reason, the operating system selects another thread (which can be the same thread or a thread from another process) to take over the processor. This process of suspending one thread and selecting another to start or continue execution is called context switching.

Specifically, when a thread is suspended and loses the privilege to use the processor, it is “switched out”. When another thread is selected to occupy the processor and starts or continues execution, it is “switched in”. During this process of switching out and switching in, the operating system needs to save and restore the corresponding progress information, which is the “context”.

So, what does the context include? Concretely, it covers the contents of the CPU registers and of the program counter. The registers hold the data the processor is currently working with, while the program counter holds the position of the instruction currently being executed and of the next instruction to be executed.

When there is more than one CPU, the operating system schedules thread tasks across them in a round-robin manner. In such cases, context switching becomes more frequent, and switches may also cross CPUs; a cross-CPU switch is more costly than a switch within a single core.

Causes of Thread Context Switching #

In an operating system, context switching can be categorized as either context switching between processes or context switching between threads. In multi-threaded programming, we mainly deal with the performance issues caused by context switching between threads. Let’s take a closer look at the reasons behind thread context switching. But before we begin, let’s first examine the lifecycle states of a Java thread.

[Figure: Java thread lifecycle state diagram]

Based on the diagram, a thread has five main states: "NEW", "RUNNABLE", "RUNNING", "BLOCKED", and "DEAD". (This is the classic textbook model; note that the states reported by java.lang.Thread.State are a different, finer-grained set.)

During the execution process, the transition of a thread from RUNNABLE to non-RUNNABLE represents a thread context switch.

A thread transitions from the RUNNING state to the BLOCKED state, then from the BLOCKED state to the RUNNABLE state, and is later selected by the scheduler to execute. This is the process of a context switch.

When a thread transitions from the RUNNING state to the BLOCKED state, it is considered a pause for the thread. After the thread is paused and switched out, the operating system saves the corresponding context so that when the thread reenters the RUNNABLE state, it can continue executing from its previous progress.

When a thread transitions from the BLOCKED state to the RUNNABLE state, it is considered a wakeup for the thread. The thread then retrieves the previously saved context and continues execution.
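The pause-and-wakeup cycle described above can be observed directly from Java. The sketch below (class and variable names are ours) watches a worker thread get switched out by `wait()` and switched back in by `notify()`. Note that `java.lang.Thread.State` reports `WAITING` here, which corresponds to the "BLOCKED" state in the simplified diagram above.

```java
// Minimal sketch: observe a thread's state before, during, and after a wait/notify cycle.
public class StateTransitionDemo {
    private static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            synchronized (lock) {
                try {
                    lock.wait();           // pause: the thread is switched out here
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        System.out.println(worker.getState());   // NEW
        worker.start();
        Thread.sleep(100);                       // give the worker time to enter wait()
        System.out.println(worker.getState());   // WAITING

        synchronized (lock) {
            lock.notify();                       // wakeup: the saved context is restored
        }
        worker.join();
        System.out.println(worker.getState());   // TERMINATED
    }
}
```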

By understanding the running states of threads and the transitions between these states, we can conclude that context switching in multithreading is actually caused by the mutual transitions between the two running states of threads.

Now, during thread execution, what triggers the transition of a thread’s state from RUNNING to BLOCKED or from BLOCKED to RUNNABLE?

We can analyze this in two scenarios: one is a context switch triggered by the program itself, which we call spontaneous context switching, and the other is a context switch induced by the system or the virtual machine, which we call non-spontaneous context switching.

Spontaneous context switching refers to thread switching caused by Java program invocations. In multithreaded programming, the following methods or keywords are often associated with spontaneous context switching:

  • sleep()
  • wait()
  • yield()
  • join()
  • park()
  • synchronized
  • lock
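As a toy illustration (the example itself is ours, not from the article), the snippet below exercises a few of the calls listed above; each one voluntarily gives up the processor and so can trigger a context switch:

```java
// Each call below voluntarily yields the CPU, which can trigger a context switch.
import java.util.concurrent.locks.LockSupport;

public class VoluntarySwitchDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread.sleep(10);          // TIMED_WAITING: switched out for ~10 ms
        Thread.yield();            // hint to the scheduler to run another thread

        Thread parked = new Thread(() -> LockSupport.park()); // WAITING until unparked
        parked.start();
        Thread.sleep(50);
        LockSupport.unpark(parked); // wake the parked thread (it gets switched back in)
        parked.join();              // main itself waits here -- another voluntary switch
        System.out.println("done");
    }
}
```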

Non-spontaneous context switching refers to threads being forcibly switched due to scheduling reasons. Common causes include: the assigned time slice for a thread is depleted, garbage collection in the virtual machine, or issues related to execution priorities.

Let’s focus on why “garbage collection in the virtual machine can cause context switching”. In the Java virtual machine, memory for objects is allocated from the heap. During program execution, new objects are continuously created, and if old objects are not recycled after use, the heap memory will be quickly depleted. The Java virtual machine provides a garbage collection mechanism to recycle objects that are no longer in use, ensuring sustainable allocation of heap memory. However, the use of this garbage collection mechanism can lead to stop-the-world events, which essentially represent thread pause behavior.
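You can make these stop-the-world pauses visible yourself. The allocation loop below (our own illustration; the class name and numbers are arbitrary) churns through short-lived objects while keeping a fraction alive; run it with GC logging enabled (`java -verbose:gc AllocationDemo`, or `java -Xlog:gc AllocationDemo` on JDK 9+) and each pause entry in the log marks a point where application threads were forcibly switched out by the collector.

```java
// Allocation churn to provoke garbage collections; run with -verbose:gc to see the pauses.
import java.util.ArrayList;
import java.util.List;

public class AllocationDemo {
    public static void main(String[] args) {
        List<byte[]> survivors = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            byte[] chunk = new byte[1024];  // constant stream of short-lived garbage
            if (i % 100 == 0) {
                survivors.add(chunk);       // keep ~1% alive so some objects get promoted
            }
        }
        System.out.println("allocated, kept " + survivors.size() + " chunks");
    }
}
```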

Discovering Context Switching #

We often talk about the system overhead caused by context switching. Is the performance impact really that bad? How can we monitor context switching? Where exactly does the overhead of context switching occur? Next, I will provide a code snippet to compare the speed of serial execution and concurrent execution, and then answer these questions one by one.

public class DemoApplication {
       public static void main(String[] args) {
              // Run multi-threading
              MultiThreadTester test1 = new MultiThreadTester();
              test1.Start();
              // Run single-threading
              SerialTester test2 = new SerialTester();
              test2.Start();
       }


       static class MultiThreadTester extends ThreadContextSwitchTester {
              @Override
              public void Start() {
                     long start = System.currentTimeMillis();
                     MyRunnable myRunnable1 = new MyRunnable();
                     Thread[] threads = new Thread[4];
                     // Create multiple threads
                     for (int i = 0; i < 4; i++) {
                           threads[i] = new Thread(myRunnable1);
                           threads[i].start();
                     }
                     for (int i = 0; i < 4; i++) {
                           try {
                                  // Wait until all threads finish
                                  threads[i].join();
                           } catch (InterruptedException e) {
                                  e.printStackTrace();
                           }
                     }
                     long end = System.currentTimeMillis();
                     System.out.println("multi thread exec time: " + (end - start) + "ms");
                     System.out.println("counter: " + counter);
              }
              // Create a class implementing Runnable
              class MyRunnable implements Runnable {
                     public void run() {
                           while (counter < count) {
                                  synchronized (this) {
                                         if (counter < count) {
                                                increaseCounter();
                                         }

                                  }
                           }
                     }
              }
       }

      // Create a single thread
       static class SerialTester extends ThreadContextSwitchTester{
              @Override
              public void Start() {
                     long start = System.currentTimeMillis();
                     for (long i = 0; i < count; i++) {
                           increaseCounter();
                     }
                     long end = System.currentTimeMillis();
                     System.out.println("serial exec time: " + (end - start) + "ms");
                     System.out.println("counter: " + counter);
              }
       }

       // Parent class
       static abstract class ThreadContextSwitchTester {
              public static final int count = 100000000;
              public volatile int counter = 0;
              public int getCount() {
                     return this.counter;
              }
              public void increaseCounter() {
                     this.counter += 1;
              }
              public abstract void Start();
       }
}

After execution, let’s take a look at the time test results for both:

[Figure: timing output of the multi-threaded and serial runs]

Comparing the numbers, we can see that serial execution is faster than concurrent execution here. The context switching between threads adds extra overhead: the synchronized keyword causes lock contention, which in turn triggers context switches. But even without synchronized, the concurrent version cannot beat the serial one, because multithreading alone already incurs context switching. This is part of why single-threaded, serial designs such as Redis and Node.js perform so well.

On a Linux system, you can use the vmstat command provided by the Linux kernel to monitor the frequency of context switching in the system during the execution of a Java program. The cs field is shown in the following figure:

[Figure: vmstat output; the cs column shows context switches per second]

If you want to monitor the context switching of a specific application, you can use the pidstat command (for example, pidstat -w -p <pid>) to monitor the context switches of the specified process.

[Figure: pidstat output showing per-process context switch counts]

Since Windows does not have tools like vmstat, on Windows, we can use Process Explorer to view the number of context switches between threads during program execution.

As for where exactly the overhead occurs in the context switching process, the summary is as follows:

  • The operating system saves and restores the context.
  • The scheduler performs thread scheduling.
  • The processor cache needs to be reloaded.
  • Context switching may also result in the entire cache being flushed, leading to time overhead.
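To get a feel for the size of this overhead, here is a rough micro-benchmark of our own (not from the article): two threads hand a "token" back and forth with `LockSupport.park()`/`unpark()`, forcing a wait/wakeup pair per hop. Dividing total time by the number of hand-offs gives a per-switch estimate; treat the result as an order of magnitude only, since JIT warmup and scheduler noise apply.

```java
// Ping-pong between two threads to estimate the cost of a switch-out/switch-in pair.
import java.util.concurrent.locks.LockSupport;

public class PingPongBench {
    static final int HOPS = 100_000;

    public static void main(String[] args) throws InterruptedException {
        Thread main = Thread.currentThread();
        Thread other = new Thread(() -> {
            for (int i = 0; i < HOPS; i++) {
                LockSupport.park();        // wait for our turn (switched out)
                LockSupport.unpark(main);  // hand the token back
            }
        });

        other.start();
        long start = System.nanoTime();
        for (int i = 0; i < HOPS; i++) {
            LockSupport.unpark(other);     // give the other thread a turn
            LockSupport.park();            // and go to sleep ourselves
        }
        other.join();
        long elapsed = System.nanoTime() - start;
        System.out.println("approx ns per hand-off: " + elapsed / (2L * HOPS));
    }
}
```

The park/unpark permit semantics guarantee the hand-off never deadlocks: an `unpark` issued before the matching `park` simply makes that `park` return immediately.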

Summary #

Context switching is the process of one thread being paused and another thread taking over the processor to execute tasks. Spontaneous and non-spontaneous operations in the system and Java programs can cause context switching, resulting in system overhead.

Having more threads does not necessarily make the system run faster. So, when should we use single-threading and when should we use multi-threading in situations with a large concurrency?

In general, single-threading can be used when the logic is relatively simple and the processing is very fast. For example, as mentioned earlier, Redis can quickly read and write values in memory without worrying about blocking on I/O bottlenecks. On the other hand, in scenarios with relatively complex logic, long waits, or heavy computation, I recommend using multithreading to improve the overall performance of the system — for example, NIO-based file reads and writes, image processing, and big data analysis.
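When you do opt for multithreading, a commonly cited sizing heuristic (popularized by "Java Concurrency in Practice", not taken from this article) is threads = cores × target utilization × (1 + wait time / compute time). The sketch below encodes it; the method name and sample numbers are illustrative only:

```java
// Hedged sketch of the classic thread-pool sizing heuristic:
// threads = cores * utilization * (1 + waitTime/computeTime)
public class PoolSizing {
    static int suggestedThreads(int cores, double targetUtilization,
                                double waitTime, double computeTime) {
        return (int) Math.max(1,
                Math.round(cores * targetUtilization * (1 + waitTime / computeTime)));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // CPU-bound work (almost no waiting): about one thread per core
        System.out.println("cpu-bound: " + suggestedThreads(cores, 1.0, 0.0, 1.0));
        // I/O-heavy work (waits 9x longer than it computes): many more threads help
        System.out.println("io-heavy:  " + suggestedThreads(cores, 1.0, 9.0, 1.0));
    }
}
```

The intuition matches the lesson of the flash-sale incident above: for CPU-bound work, extra threads only add context-switch overhead, while for I/O-heavy work, more threads keep the cores busy during the waits.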

Thought Questions #

Above, we mainly discussed context switching between threads. Earlier, when classifying context switches, I also mentioned switching between processes. Do you know whether using synchronized in a multithreaded program can cause a context switch between processes? If so, where exactly would it occur?