Q&A Session: Module Three Hot Questions Answered #

Hello, I am Liu Chao.

Without realizing it, we have finished discussing “Multi-thread Performance Optimization”. Today, I will answer two major questions posed by students in this module. The first one is about command troubleshooting tools for monitoring abnormal context switches, and the second one is about the content of BlockingQueue.

I also welcome you to leave comments actively, let me know what you want to know, or express your confusion, so that we can discuss it together. Now, let’s dive right into today’s topic.

Using system commands to view context switches #

In lecture 15, I mentioned context switches and mentioned a few tools used for monitoring them. Due to space limitations, I didn’t provide detailed explanations. Today, I will supplement and summarize several commonly used tools for you.

1. Linux command line tool - vmstat command #

vmstat is a common performance monitoring tool that reports process, memory, swap, I/O, and CPU statistics, and it lets you specify a sampling interval and a sample count. We can use it to monitor system-wide context switching.

[Image: output of `vmstat 1 3`]

The command vmstat 1 3 means collecting performance metrics once per second, for a total of 3 times. The following are explanations for each performance metric in the image:

  • procs: r (number of processes waiting to run), b (number of processes in uninterruptible sleep)
  • memory: swpd (virtual memory used), free (free memory), buff (memory used as buffers), cache (memory used as page cache)
  • swap: si (pages swapped in from disk per second), so (pages swapped out to disk per second)
  • io: bi (blocks received from a block device, i.e. reads), bo (blocks sent to a block device, i.e. writes)
  • system: in (interrupts per second), cs (context switches per second)
  • cpu: us (CPU time in user mode), sy (CPU time in kernel mode), id (idle time), wa (time waiting for I/O), st (time stolen from this virtual machine by the hypervisor)
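To see the `cs` column move, here is a minimal Java sketch (the thread count and duration are my own illustrative choices, not from the lecture) that creates lock contention; running `vmstat 1 3` alongside it should show context switches climb:

```java
// Sketch: generate context switches visible in vmstat's "cs" column.
// The thread count and duration below are illustrative assumptions.
public class ContextSwitchDemo {
    private static final Object LOCK = new Object();

    public static void main(String[] args) throws InterruptedException {
        int threads = 8;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                long deadline = System.currentTimeMillis() + 500;
                while (System.currentTimeMillis() < deadline) {
                    synchronized (LOCK) {
                        // Short critical section: contention makes the scheduler
                        // switch between threads blocked on the monitor.
                        Thread.yield(); // hint for a voluntary context switch
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println("finished " + threads + " contending threads");
    }
}
```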

2. Linux command line tool - pidstat command #

The vmstat command above only reports system-wide context switch totals; it cannot tell us which process, let alone which thread, is switching abnormally. What if we want to check which thread has abnormal context switches?

The pidstat command can help us monitor the context switching of specific threads. pidstat is a component of Sysstat, and it is a powerful performance monitoring tool. We can install this monitoring component by using the command yum install sysstat.

By running the command pidstat --help, we can see several commonly used parameters for monitoring thread performance:

[Image: output of `pidstat --help`]

Common parameters:

  • -u: Default parameter, shows CPU usage for each process
  • -r: Shows memory usage for each process
  • -d: Shows I/O usage for each process
  • -w: Shows context switches for each process
  • -p: Specifies the process ID
  • -t: Shows thread statistics within each process

First, by running the command pidstat -w -p pid, we can see the context switches of the process:

[Image: output of `pidstat -w -p pid`]

  • cswch/s: voluntary context switches per second
  • nvcswch/s: involuntary context switches per second

Then, by running the command pidstat -w -p pid -t, we can see the context switches of specific threads:

[Image: output of `pidstat -w -p pid -t`]

3. JDK tool - jstack command #

To check specifically for abnormal context switches in threads, we can also use the jstack command to view the running status of the thread stack. jstack is a built-in thread stack analysis tool in the JDK. With this command, you can view or export thread stack information from Java applications.

The most commonly used function of jstack is to use the command jstack pid to view thread stack information. It is usually used in combination with pidstat -p pid -t to view the specific status of threads, and it is also frequently used to troubleshoot deadlock exceptions.

[Image: output of `jstack pid`]

In each thread's stack trace, you can see the thread ID, the thread state (WAITING, TIMED_WAITING, RUNNABLE, BLOCKED, etc.), and whether it holds or is waiting for a lock.

You can use the command jstack 16079 > /usr/dump to dump the thread stack information log, and then open the dump file. By examining the thread status changes, you can find the specific reason for the abnormal context switches. For example, if there are a large number of threads in the BLOCKED state, you need to immediately analyze the code to find the cause.
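jstack inspects a process from the outside, but you can perform a similar check programmatically with the JDK's ThreadMXBean. The sketch below (thread names and sleep intervals are my own illustrative choices) deliberately creates a BLOCKED thread and then reports its state, the same signal you would look for in a jstack dump:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BlockedThreadCheck {
    public static void main(String[] args) throws Exception {
        final Object lock = new Object();
        // "holder" grabs the lock and keeps it for a while
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
            }
        }, "holder");
        // "waiter" will block trying to enter the same monitor
        Thread waiter = new Thread(() -> {
            synchronized (lock) { }
        }, "waiter");

        holder.start();
        Thread.sleep(100);   // let holder acquire the lock first
        waiter.start();
        Thread.sleep(100);   // let waiter block on the monitor

        // Dump all thread states, like a programmatic jstack
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if ("waiter".equals(info.getThreadName())) {
                System.out.println("waiter state: " + info.getThreadState());
            }
        }
        holder.interrupt();  // release the lock early so both threads finish
        holder.join();
        waiter.join();
    }
}
```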

Multi-threaded Queue #

With that, the summary of context-switch troubleshooting tools for the first question is complete. Now, let's move on to the second question: the highly requested BlockingQueue content from Lecture 17.

In Java multi-threaded applications, especially in thread pools, the usage of queues is very high. Java provides thread-safe queues, which are divided into blocking queues and non-blocking queues.

1. Blocking Queues #

Let’s first take a look at blocking queues. A blocking queue is well-suited for supporting the mutual waiting between producers and consumers in the producer-consumer pattern. When the queue is empty, the consumer thread will block and wait until the queue is not empty. When the queue is full, the producer thread will block until the queue is not full.
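The producer-consumer waiting described above can be sketched with ArrayBlockingQueue (the capacity and element counts are illustrative): put blocks while the queue is full, and take blocks while it is empty:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumer {
    public static void main(String[] args) throws InterruptedException {
        // Deliberately small capacity so the producer actually blocks
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 5; i++) {
                    queue.put(i); // blocks when the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        final int[] sum = {0};
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    sum[0] += queue.take(); // blocks when the queue is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        System.out.println("consumed sum = " + sum[0]); // 1+2+3+4+5
    }
}
```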

Blocking queues are also used in Java thread pools. When all core threads are busy, newly submitted tasks are placed into the blocking queue. We can choose the type of blocking queue based on business needs. The commonly used blocking queue types are as follows:

  • ArrayBlockingQueue: A bounded blocking queue implemented based on an array structure, sorting elements according to the FIFO (First-In-First-Out) principle. It uses ReentrantLock and Condition to ensure thread safety.
  • LinkedBlockingQueue: A blocking queue implemented based on a linked list structure, also sorting elements according to the FIFO principle. It uses ReentrantLock and Condition to ensure thread safety. It usually has higher throughput compared to ArrayBlockingQueue.
  • PriorityBlockingQueue: An unbounded blocking queue with priorities, implemented as a binary heap backed by an array (which can grow up to Integer.MAX_VALUE - 8, the maximum array size). The queue does not keep all elements sorted; instead, the heap guarantees that the highest-priority (smallest) element is at the top whenever the data changes. This queue also uses ReentrantLock and Condition to ensure thread safety.
  • DelayQueue: An unbounded blocking queue that supports delayed element retrieval. It is internally backed by a priority queue, and its elements must implement the Delayed interface.
  • SynchronousQueue: A blocking queue that does not store elements: each insert must wait for a corresponding take by a consumer before another element can be inserted. It transfers elements in one of two modes, a fair FIFO queue or an unfair LIFO stack, which can be selected through the constructor's fairness parameter.
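As one concrete illustration of the list above, here is a minimal DelayQueue sketch. The Task class is hypothetical, but the constraint it satisfies is real: every DelayQueue element must implement Delayed, and take() only returns an element once its delay has expired:

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

public class DelayQueueDemo {
    // Hypothetical element type for illustration
    static class Task implements Delayed {
        final String name;
        final long readyAt; // absolute time (ms) when the task becomes available

        Task(String name, long delayMs) {
            this.name = name;
            this.readyAt = System.currentTimeMillis() + delayMs;
        }

        @Override public long getDelay(TimeUnit unit) {
            return unit.convert(readyAt - System.currentTimeMillis(),
                                TimeUnit.MILLISECONDS);
        }

        @Override public int compareTo(Delayed other) {
            // Order by remaining delay, so the soonest task sits at the heap top
            return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                                other.getDelay(TimeUnit.MILLISECONDS));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        DelayQueue<Task> queue = new DelayQueue<>();
        queue.put(new Task("late", 200));
        queue.put(new Task("early", 50));
        // take() blocks until the head's delay expires: "early" comes out first
        System.out.println(queue.take().name);
        System.out.println(queue.take().name);
    }
}
```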

The Executors factory class also provides four common types of ThreadPoolExecutor built on the queues above. Details are as follows:

[Image: mapping between the Executors thread pools and the work queues they use]
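You can verify part of this mapping yourself, since newFixedThreadPool and newCachedThreadPool return ThreadPoolExecutor instances whose work queue can be inspected directly (a small sketch; the class name is mine):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class ExecutorsQueueCheck {
    public static void main(String[] args) {
        ExecutorService fixed = Executors.newFixedThreadPool(2);
        ExecutorService cached = Executors.newCachedThreadPool();
        ExecutorService single = Executors.newSingleThreadExecutor();

        // These two factories return ThreadPoolExecutor, so the queue is visible
        System.out.println("fixed: " +
                ((ThreadPoolExecutor) fixed).getQueue().getClass().getSimpleName());
        System.out.println("cached: " +
                ((ThreadPoolExecutor) cached).getQueue().getClass().getSimpleName());
        // newSingleThreadExecutor wraps its executor in a delegate that hides
        // the queue, but internally it also uses a LinkedBlockingQueue.

        fixed.shutdown();
        cached.shutdown();
        single.shutdown();
    }
}
```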

2. Non-blocking Queues #

The commonly used thread-safe non-blocking queue is ConcurrentLinkedQueue. It is an unbounded thread-safe queue (FIFO) implemented based on a linked list structure, using the CAS (Compare-and-Swap) optimistic lock to ensure thread safety.
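Before diving into the source, a minimal usage sketch: unlike a blocking queue, offer and poll never block, and poll on an empty queue simply returns null:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class CLQBasics {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        queue.offer("a"); // enqueue via CAS, never blocks
        queue.offer("b");
        System.out.println(queue.poll()); // FIFO: "a" first
        System.out.println(queue.poll()); // then "b"
        System.out.println(queue.poll()); // empty queue: returns null, no blocking
    }
}
```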

Now, let’s analyze the construction, enqueue, and dequeue implementations of this queue through the source code.

Construction: ConcurrentLinkedQueue consists of head and tail nodes. Each node (Node) consists of an item and a reference (next) to the next node. Nodes are connected through the next reference to form a linked list structure. When the queue is initialized, the head node stores a null element and the tail node is set to the head node.

public ConcurrentLinkedQueue() {
    head = tail = new Node<E>(null);
}

private static class Node<E> {
    volatile E item;
    volatile Node<E> next;
    // ...
}

Enqueue: When a thread enqueues data, it wraps the data into a Node and reads the tail of the queue. After confirming that the tail node's next is null, it uses CAS to set that next reference to the new node. If p != t at that point, the tail reference has fallen behind the real last node, so the thread additionally uses CAS to advance tail to the newly enqueued node; this CAS is allowed to fail, because another thread may have advanced tail already.

public boolean offer(E e) {
    checkNotNull(e);
    // Wrap the element in a new node
    final Node<E> newNode = new Node<E>(e);
    // t and p both start at tail; fail and retry until the enqueue succeeds
    for (Node<E> t = tail, p = t;;) {
        // Read the successor of p
        Node<E> q = p.next;
        // If q is null, p is the real last node
        if (q == null) {
            // Try to link the new node after the current last node
            if (p.casNext(null, newNode)) {
                // tail is only advanced every second node ("hop two nodes
                // at a time"), so update it only when p has drifted from t
                if (p != t)
                    // If this CAS fails, another thread already moved tail
                    casTail(t, newNode);  // Failure is OK.
                return true;
            }
            // Lost the CAS race to another thread; loop and re-read next
        }
        // p == q means p's next points to itself: p was unlinked by a
        // dequeue (self-linked), so we fell off the list. Re-read tail,
        // or start over from head if tail has not changed.
        else if (p == q)
            p = (t != (t = tail)) ? t : head;
        else
            // After two hops, check whether another thread updated tail;
            // otherwise simply advance p to its successor q
            p = (p != t && t != (t = tail)) ? t : q;
    }
}

Dequeue: First, it reads the head node and checks whether its item is null. If it is null, another thread has just dequeued that element, so it advances to the next node (updating head along the way). If it is not null, it uses CAS to set the item to null; if the CAS succeeds, it returns the item directly, otherwise it retries from the updated head.

public E poll() {
    // Label to restart the traversal from the current head
    restartFromHead:
    for (;;) {
        // h and p both start at the head node
        for (Node<E> h = head, p = h, q;;) {
            // Read the item of the current candidate node
            E item = p.item;
            // If the item is not null, try to claim it by CASing it to null
            if (item != null && p.casItem(item, null)) {
                // Successful CAS is the linearization point
                // for item to be removed from this queue.
                if (p != h) // hop two nodes at a time
                    updateHead(h, ((q = p.next) != null) ? q : p);
                return item;
            }
            // If p has no successor, the queue is empty: update head, return null
            else if ((q = p.next) == null) {
                updateHead(h, p);
                return null;
            }
            // p == q means p is self-linked (already removed by another
            // thread), so the traversal fell off the list; restart from head
            else if (p == q)
                continue restartFromHead;
            else
                // Advance to the successor and retry
                p = q;
        }
    }
}

ConcurrentLinkedQueue is implemented with CAS-based optimistic locking, so under contention it generally performs better than the lock-based blocking queues. It is therefore well-suited as a work queue in high-concurrency scenarios where blocking semantics are not required.
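As a closing sketch (the thread and element counts are illustrative), several threads can offer concurrently without any lock, and no element is lost; a failed CAS inside offer simply retries:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class ConcurrentOfferTest {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        int threads = 4, perThread = 1000;
        Thread[] producers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            producers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    queue.offer(j); // lock-free: a failed CAS just retries
                }
            });
            producers[i].start();
        }
        for (Thread t : producers) t.join();
        // size() walks the list in O(n), but is exact once all threads finish
        System.out.println("total = " + queue.size());
    }
}
```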