18 How to set thread pool size #

Hello, I am Liu Chao.

Do you remember in Lesson 16 when I mentioned that setting the thread count of a thread pool too high will lead to intense thread competition? Today, let me add that if the thread count is set too low, the system will also fail to fully utilize computer resources. So how do we set it without affecting system performance?

In fact, there are concrete methods for sizing a thread pool; it is not something you settle by rough guesswork. Today, let’s look at which calculation methods can be reused and how the parameters of a thread pool relate to one another.

Thread Pool Principle #

Before we start optimizing, let’s take a look at the implementation principle of a thread pool, which will help you better understand the content that follows.

In the thread model of the HotSpot VM, Java threads are mapped one-to-one to kernel threads. Whenever Java runs code on a new thread, a kernel thread must be created, and when the Java thread terminates, the corresponding kernel thread is recycled as well. Creating and destroying Java threads therefore consumes computer resources and adds performance overhead to the system.

In addition, creating a large number of threads also brings performance issues to the system, because both memory and CPU resources will be competed for by threads. If not handled properly, problems such as memory overflow and CPU overload may occur.

To solve these two kinds of problems, Java provides the concept of a thread pool. For business scenarios where threads are created frequently, a thread pool keeps a fixed number of threads alive; at the operating-system level, these threads are mapped onto the kernel as lightweight processes.

The thread pool improves thread reuse while capping the maximum number of threads, preventing unbounded thread creation. When a program submits a task, the pool first looks for an idle thread; if one exists, it handles the task directly. If not, the pool checks whether the number of threads created so far has reached the maximum. If it has not, a new thread is created; if it has, the task either waits in the queue or an exception is thrown.

Executor Thread Pool Framework #

Creating and managing threads by hand is cumbersome, so to support user-level thread scheduling and help developers with multi-threaded development, Java provides a framework called Executor.

This framework includes two core thread pools: ScheduledThreadPoolExecutor and ThreadPoolExecutor. The former is used for executing tasks at scheduled times, while the latter is used for executing submitted tasks. Since the core principles of these two thread pools are the same, let’s focus on how the ThreadPoolExecutor class implements the thread pool.

Executors implements four types of ThreadPoolExecutor using the factory pattern:

[Figure: the four thread pools created by the Executors factory methods]

Executors provides these four thread pools through factory methods, and which one fits depends on the actual production scenario. However, I don’t recommend using them: the factory methods hide many of the thread pool’s parameters behind defaults, and once those defaults are baked in, it becomes hard to tune the parameters, which easily leads to performance problems or wasted resources.

Instead, I suggest using ThreadPoolExecutor to build a custom thread pool. If you step into the four factory methods, you will see that all of them except newScheduledThreadPool are implemented with the ThreadPoolExecutor class. Its constructor is shown below:

    public ThreadPoolExecutor(int corePoolSize, // the number of core threads in the thread pool
                              int maximumPoolSize, // the maximum number of threads in the thread pool
                              long keepAliveTime,  // the maximum time that excess idle threads will wait for new tasks before terminating
                              TimeUnit unit,  // the time unit for the keepAliveTime argument
                              BlockingQueue<Runnable> workQueue,  // the queue to hold the waiting tasks
                              ThreadFactory threadFactory, // the factory to create new threads, usually use the default one
                              RejectedExecutionHandler handler) // the handler to use when the task cannot be accepted by the thread pool
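
Below is one hedged sketch of wiring these parameters into a custom pool; the core/maximum sizes, queue capacity, and the CallerRunsPolicy handler are illustrative choices for this example, not values prescribed by the text:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CustomPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4,                                    // corePoolSize
                8,                                    // maximumPoolSize
                60L, TimeUnit.SECONDS,                // keepAliveTime for idle non-core threads
                new ArrayBlockingQueue<>(100),        // bounded queue: a spike cannot exhaust memory
                Thread::new,                          // a plain thread factory
                new ThreadPoolExecutor.CallerRunsPolicy()); // when saturated, run on the caller

        pool.execute(() -> System.out.println("task executed"));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

CallerRunsPolicy is only one of the built-in rejection handlers; AbortPolicy (the default), DiscardPolicy, and DiscardOldestPolicy are the others.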

We can also understand the relationship between the parameters in the thread pool through the following figure:

[Figure: the relationships between the ThreadPoolExecutor parameters]

From the above figure, we can see that the thread pool has two sets of parameters for the number of threads: the core pool size and the maximum pool size. By default, there are no threads in the thread pool after it is created. Threads are created to execute tasks only when there are tasks.

But there is one exception: when the prestartAllCoreThreads() or prestartCoreThread() method is called, the pool creates threads in advance, up to corePoolSize. This is called pre-warming, and it is often used in flash-sale systems.
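
A minimal sketch of pre-warming with prestartAllCoreThreads(); the pool parameters here are arbitrary example values:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PrewarmDemo {
    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 8, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        System.out.println("before: " + pool.getPoolSize());  // no threads exist yet
        int started = pool.prestartAllCoreThreads();           // create all core threads up front
        System.out.println("prestarted: " + started);
        System.out.println("after: " + pool.getPoolSize());    // corePoolSize threads are now ready
        pool.shutdown();
    }
}
```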

When the number of created threads equals corePoolSize, newly submitted tasks are added to the blocking queue instead. Once the queue is full, additional threads are created to execute tasks, until the number of threads in the pool reaches maximumPoolSize.

When the thread count has reached maximumPoolSize and the queue is full, newly submitted tasks can neither wait in the queue nor be handed to a new thread. If we have not configured a rejection policy, the default one (AbortPolicy) applies and the pool throws a RejectedExecutionException, meaning it refuses to accept the task.

When the number of threads in the pool exceeds corePoolSize, a thread that finishes its task and receives no new work within keepAliveTime is recycled. While recycling, the pool does not distinguish between “core threads” and “non-core threads”; it simply stops once the thread count drops back to the configured corePoolSize.

Even the corePoolSize threads can be worth reclaiming: in a non-core business thread pool, threads that stay idle for a long time still tie up resources and may affect the core business pools, so threads without assigned tasks should be recycled there as well.

We can set allowCoreThreadTimeOut(true) to make the pool recycle every thread, “core threads” included, that has received no task within keepAliveTime.
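
A small sketch of that option; the very short keepAliveTime is chosen only so the recycling is observable in a quick run:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreTimeoutDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 100L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true); // core threads are also reclaimed after keepAliveTime

        pool.execute(() -> { });           // spins up one core thread
        Thread.sleep(500);                 // wait well past keepAliveTime
        System.out.println("idle pool size: " + pool.getPoolSize());
        pool.shutdown();
    }
}
```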

We can understand the thread allocation process of the thread pool through the following figure:

[Figure: the thread allocation flow of the thread pool]

Calculating the Number of Threads #

After understanding the implementation principles and frameworks of thread pools, we can start practicing optimizing the settings of thread pools.

We know that the environment is variable, so it is actually impossible to set an absolutely accurate number of threads. However, we can calculate a reasonable number of threads based on practical operational factors to avoid performance issues caused by inappropriate thread pool settings. Let’s take a look at the specific calculation methods.

Generally, multi-threaded tasks can be divided into CPU-bound tasks and I/O-bound tasks, and the methods for calculating the number of threads are different for different types of tasks.

CPU-bound tasks: These tasks mainly consume CPU resources. The number of threads can be set as N (number of CPU cores) + 1. The extra thread beyond the number of CPU cores is used to prevent the impact caused by occasional page faults or pauses in the tasks due to other reasons. Once a task pauses, the CPU becomes idle, and in this case, the extra thread can make full use of the CPU’s idle time.
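
The N in this rule can be read at runtime; a tiny sketch:

```java
public class CpuBoundSizing {
    public static void main(String[] args) {
        // N: logical cores visible to the JVM (may differ from physical cores)
        int n = Runtime.getRuntime().availableProcessors();
        // One spare thread keeps the CPU busy during occasional page faults or stalls
        int threads = n + 1;
        System.out.println("suggested CPU-bound pool size: " + threads);
    }
}
```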

Let’s verify the feasibility of this method with an example: by observing how a CPU-bound task performs under different thread counts, we can draw a conclusion. You can download the test from GitHub and run it locally:

import java.util.List;

public class CPUTypeTest implements Runnable {

	// The overall execution time, including the time spent in the queue waiting
	List<Long> wholeTimeList;
	// The actual execution time
	List<Long> runTimeList;

	private long initStartTime = 0;

	/**
	 * Constructor
	 * @param runTimeList
	 * @param wholeTimeList
	 */
	public CPUTypeTest(List<Long> runTimeList, List<Long> wholeTimeList) {
		initStartTime = System.currentTimeMillis();
		this.runTimeList = runTimeList;
		this.wholeTimeList = wholeTimeList;
	}

	/**
	 * Prime number check
	 * @param number
	 * @return
	 */
	public boolean isPrime(final int number) {
		if (number <= 1)
			return false;


		for (int i = 2; i <= Math.sqrt(number); i++) {
			if (number % i == 0)
				return false;
		}
		return true;
	}

	/**
	 * Count prime numbers in [lower, upper]
	 * @param lower
	 * @param upper
	 * @return the number of primes in the range
	 */
	public int countPrimes(final int lower, final int upper) {
		int total = 0;
		for (int i = lower; i <= upper; i++) {
			if (isPrime(i))
				total++;
		}
		return total;
	}

	public void run() {
		long start = System.currentTimeMillis();
		countPrimes(1, 1000000);
		long end = System.currentTimeMillis();


		long wholeTime = end - initStartTime;
		long runTime = end - start;
		wholeTimeList.add(wholeTime);
		runTimeList.add(runTime);
		System.out.println("Time spent by a single thread: " + (end - start));
	}
}

The running time of the test code on a 4-core Intel i5 CPU machine changes as follows:

[Figure: running time of the CPU-bound test under different thread counts]

From the above, we can conclude: when there are too few threads, many simultaneous requests queue up waiting to execute and the CPU is under-utilized; when there are too many, the threads compete for CPU resources, causing heavy context switching that lengthens execution time and hurts overall efficiency. The tests show that a thread count between 4 and 6 is the most suitable here.

I/O-bound tasks: in this type of task, most of the time is spent on I/O interactions, and a thread does not occupy the CPU while its I/O is in flight, so the CPU is free for other threads. For I/O-bound applications we can therefore configure more threads, and a common starting point is 2N.

Once again, let’s verify this formula with an example:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Vector;

public class IOTypeTest implements Runnable {

    // Overall execution time, including time spent in the queue
    Vector<Long> wholeTimeList;
    // Actual execution time
    Vector<Long> runTimeList;

    private long initStartTime = 0;

    /**
     * Constructor
     * @param runTimeList
     * @param wholeTimeList
     */
    public IOTypeTest(Vector<Long> runTimeList, Vector<Long> wholeTimeList) {
        initStartTime = System.currentTimeMillis();
        this.runTimeList = runTimeList;
        this.wholeTimeList = wholeTimeList;
    }

    /**
     * I/O operation: read a file line by line
     * @throws IOException
     */
    public void readAndWrite() throws IOException {
        File sourceFile = new File("D:/test.txt");
        // Create input stream
        BufferedReader input = new BufferedReader(new FileReader(sourceFile));
        // Read the source file line by line
        String line = null;
        while((line = input.readLine()) != null){
            //System.out.println(line);
        }
        // Close the input stream
        input.close();
    }

    public void run() {
        long start = System.currentTimeMillis();
        try {
            readAndWrite();
        } catch (IOException e) {
            e.printStackTrace();
        }
        long end = System.currentTimeMillis();


        long wholeTime = end - initStartTime;
        long runTime = end - start;
        wholeTimeList.add(wholeTime);
        runTimeList.add(runTime);
        System.out.println("Time spent by a single thread: " + (end - start));
    }
}

Note: the test reads a 2MB file, which puts noticeable pressure on memory, so before running we need to enlarge the JVM heap with -Xms4g -Xmx4g to keep frequent Full GCs from distorting the results.

[Figure: running time of the I/O-bound test under different thread counts]

From the test results we can see the time spent by each thread: when the number of threads is 8, the average execution time per thread is lowest, which is close to the result given by our 2N formula.

After learning about the two methods for calculating threads in the two extreme cases, you may still wonder, in normal application scenarios, we often don’t encounter these two extreme situations. So when encountering some routine business operations, such as implementing scheduled message push to users through a thread pool, how should we set the number of threads in the thread pool?

At this time, we can refer to the following formula to calculate the number of threads:

Number of threads = N (number of CPU cores) * (1 + WT (thread wait time) / ST (thread running time))

We can use the VisualVM tool bundled with the JDK to measure the WT/ST ratio. The following example is based on the pure CPU computation above, where we can see:

WT (thread wait time) = 36788 ms (total thread time) - 36788 ms (ST, thread running time) = 0
Number of threads = N (number of CPU cores) * (1 + 0 / 36788) = N (number of CPU cores)

This is consistent with the N + 1 result from the CPU-bound formula above.

[Figure: VisualVM view of the thread’s total and running time]
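
The sizing rules in this lesson can be folded into one small helper; the method name and the sample WT/ST values below are illustrative, not part of the original text:

```java
public class ThreadCountFormula {

    /** Number of threads = N * (1 + WT / ST), rounded to a whole thread. */
    static int size(int cores, double waitMs, double runMs) {
        return (int) Math.round(cores * (1 + waitMs / runMs));
    }

    public static void main(String[] args) {
        int n = 4; // assume a 4-core machine for the illustration
        // Pure CPU work: WT = 0, so the formula reduces to N
        System.out.println(size(n, 0, 36788));
        // Work that waits as long as it runs: WT = ST gives 2N, the I/O rule of thumb
        System.out.println(size(n, 500, 500));
    }
}
```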

To summarize, we can pick whichever of the “N + 1” and “2N” formulas suits our business scenario to calculate an approximate thread count, then load-test, adjusting the count upward and downward while watching the overall processing time, and finally settle on a specific number.

Summary #

Today we mainly learned how thread pools are implemented. Creating and destroying Java threads incurs performance overhead on the system, so Java provides thread pools to reuse threads and improve a program’s concurrency efficiency.

Java uses a 1:1 threading model that maps user threads to kernel threads, while thread scheduling and management sit in user space, and the Executor framework helps developers use them efficiently. Beyond thread pool management, the Executor framework also provides thread factories, queues, and rejection policies; it can fairly be called a complete architecture for concurrent programming.

The right thread count differs across business scenarios and machine configurations, and it should be neither too large nor too small. Calculate an approximate value first, then determine a reasonable thread count through real performance testing.

To improve the processing capacity of a thread pool, we must first get the thread count right, that is, make the most of the CPU’s ability to run threads. On that premise, we can enlarge the pool’s queue to buffer tasks that cannot be processed in time. When configuring this buffer queue, prefer a bounded queue to prevent memory overflow from an oversized backlog.

Thought Question #

In a program, there are not only parallel sections of code, but also serial sections of code. So when there are both serial and parallel operations in a program, is optimizing the parallel operations the key to optimizing the system?