18 Under What Circumstances Does a Java Program Deadlock, and How Do You Locate and Fix It?

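In Java programs, deadlock is a common multithreading problem. It occurs when two or more threads each hold a resource that another needs, so that none of them can continue executing.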

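In general, a deadlock can arise only when all of the following conditions hold:

  1. Mutual exclusion: at least one resource cannot be shared and can be used by only one thread at a time.
  2. Hold and wait: a thread already holds at least one resource and is waiting for other threads to release resources it needs.
  3. No preemption: a resource acquired by a thread cannot be forcibly taken away by other threads before the thread has finished using it.
  4. Circular wait: there is a chain of threads in which each thread waits for a resource held by the next, and the last waits for the first.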

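To locate and fix deadlocks, you can take the following steps:

  1. Detect with tools: use tools such as jstack, jconsole, or VisualVM to monitor thread states and check whether a deadlock exists.
  2. Analyze stack traces: once you find a deadlock, analyze the thread stacks to determine which code is involved. They usually reveal the code that holds one lock while waiting for another.
  3. Avoid circular waiting: acquire locks in a fixed, consistent order so that a circular wait cannot form.
  4. Use timed waits: in some cases you can call tryLock() to attempt to acquire a lock, waiting only for a limited time; if the lock is not acquired, the thread can do something else instead of waiting forever.
  5. Adjust lock granularity: try to narrow the scope of locking, for example with finer-grained locks, to reduce the likelihood of deadlock.
  6. Use concurrency utilities: Java provides concurrency tools such as Lock and Semaphore that give better control over resource access and reduce the risk of deadlock.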
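Step 3 can be illustrated with a minimal sketch. The `Account` and `OrderedTransfer` classes are hypothetical, and the key assumption is that every account has a unique, stable `id`: each transfer locks the account with the smaller `id` first, regardless of direction, so two opposite transfers can never form a circular wait.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical account type used only for illustration; ids must be unique.
class Account {
    final long id;   // stable key that defines the global lock order
    long balance;    // guarded by this Account's monitor

    Account(long id, long balance) {
        this.id = id;
        this.balance = balance;
    }
}

class OrderedTransfer {
    // Lock the account with the smaller id first, whatever the transfer direction,
    // so two opposite transfers can never wait on each other in a cycle.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    static long balanceOf(Account account) {
        synchronized (account) {   // read under the same lock that guards writes
            return account.balance;
        }
    }

    // Runs many opposite-direction transfers concurrently and returns both final balances.
    static long[] runOpposingTransfers(int rounds) {
        Account a = new Account(1, 1_000);
        Account b = new Account(2, 1_000);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < rounds; i++) {
            pool.submit(() -> transfer(a, b, 1));
            pool.submit(() -> transfer(b, a, 1));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("transfers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {balanceOf(a), balanceOf(b)};
    }

    public static void main(String[] args) {
        long[] balances = runOpposingTransfers(10_000);
        System.out.println(balances[0] + " " + balances[1]); // 1000 1000
    }
}
```

With naive "lock `from`, then lock `to`" ordering, the opposing transfers could deadlock in the same way as two threads taking two locks in opposite orders; the global ordering removes the circular-wait condition while the other three conditions still hold.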

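In short, locating and fixing deadlocks in Java programs requires careful analysis of thread stack traces and appropriate measures to prevent them. By understanding the conditions for deadlock and taking preventive measures, you can deal with this problem effectively.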

Typical Answer

A deadlock is a specific program state in which entities wait for each other because of circular dependencies, so that none of them can move forward. Deadlocks can occur not only between threads but also between processes that hold exclusive resources. Usually, we focus on deadlocks in multithreaded scenarios, where two or more threads are permanently blocked because each holds a lock that another one needs.

You can understand the basic deadlock problem using the example diagram below:

The most common way to locate a deadlock is to use tools such as jstack to obtain thread stacks and then work out the dependencies between threads to find the deadlock. For obvious deadlocks, jstack can often pinpoint them directly, and JConsole can even perform limited deadlock detection through its graphical interface.

If a deadlock occurs while a program is running, in most cases it cannot be resolved online; the only remedies are to restart the program or to fix the code. That is why code review during development, or proactive investigation with tools, is so important.

Analysis of Key Points

Today's question is quite practical, and most deadlocks are not hard to locate. Mastering the basic approach and tools, together with basic threading concepts such as thread states, synchronization, locks, latches, and other concurrency tools, is enough to solve most problems.

Regarding deadlocks, interviewers can delve into:

  • Beyond the textbook concepts, the interviewer may ask you to write a program that can deadlock, which also tests your basic thread programming skills.
  • They may ask about tools for diagnosing deadlocks; in a distributed environment, they may care more about whether detection can be done through APIs.
  • Diagnosing deadlocks after the fact can be painful and often means working overtime. How can typical deadlock scenarios be avoided as much as possible while coding? Are there other tools that help?

Knowledge Extension

Before starting the analysis, let's look at a basic deadlock program. In this example, I simply use two nested synchronized blocks to acquire locks. Here is the code:

```java
public class DeadLockSample extends Thread {
    private String first;   // lock acquired first
    private String second;  // lock acquired while still holding the first

    public DeadLockSample(String name, String first, String second) {
        super(name);
        this.first = first;
        this.second = second;
    }

    @Override
    public void run() {
        synchronized (first) {
            System.out.println(this.getName() + " obtained: " + first);
            try {
                // Give the other thread time to grab its first lock
                Thread.sleep(1000L);
                synchronized (second) {
                    System.out.println(this.getName() + " obtained: " + second);
                }
            } catch (InterruptedException e) {
                // Do nothing
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String lockA = "lockA";
        String lockB = "lockB";
        // The two threads request the same two locks in opposite order
        DeadLockSample t1 = new DeadLockSample("Thread1", lockA, lockB);
        DeadLockSample t2 = new DeadLockSample("Thread2", lockB, lockA);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

After compiling and running this program, it reproduces the deadlock almost every time. See the following excerpt of the output. There is also an interesting point here: why is Thread2 printed first even though I called t1.start() first? Because thread scheduling is up to the (operating system) scheduler. Although you can influence it with priorities, the exact order is not guaranteed.

Now let's simulate locating the problem. I'll use the most common tool, jstack; you can also use graphical tools such as JConsole.

First, determine the process ID using jps, a system command such as ps, or the Windows Task Manager.

Next, call jstack to get the thread stack:

```shell
${JAVA_HOME}/bin/jstack your_pid
```

Then, analyze the output. Here is a specific excerpt:

Finally, analyze the thread stack information together with the code. The output above makes the problem obvious: find the threads in the BLOCKED state, look at the lock ID each one is trying to obtain (the numbers I marked with the same color), and compare it with the locks the other threads hold. This quickly locates the problem. For simple deadlocks, jstack also detects the cycle itself and prints it directly in its output.

In real applications, deadlock-like situations may not produce such clear output. In general, though, you can approach them as follows:

Distinguish thread states -> identify what each thread is waiting for -> compare with which threads hold those monitors

Understanding the basic thread states and concurrency-related concepts is the key to locating the problem. Combined with the program's call stack structure, you can then pinpoint the specific problematic code.
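The three-step routine above (thread state, wait target, monitor holder) can also be applied programmatically. A minimal sketch, assuming only the standard `java.lang.management` API: `ThreadMXBean.dumpAllThreads(true, true)` returns a `ThreadInfo` per thread, and `getThreadState()`, `getLockName()`, and `getLockOwnerName()` supply the three pieces of information. The class `WaitForReport` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class WaitForReport {
    // For each waiting thread: its state, the lock it waits for, and that lock's current owner.
    static List<String> describeWaiting() {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // (true, true): also collect locked monitors and ownable synchronizers for each thread
        for (ThreadInfo info : mbean.dumpAllThreads(true, true)) {
            Thread.State state = info.getThreadState();               // step 1: thread state
            if (state == Thread.State.RUNNABLE || info.getLockName() == null) {
                continue;                                             // not waiting on a lock
            }
            lines.add(info.getThreadName() + " [" + state + "] waits for "
                    + info.getLockName()                              // step 2: wait target
                    + " held by " + info.getLockOwnerName());         // step 3: holder (may be null)
        }
        return lines;
    }

    public static void main(String[] args) {
        describeWaiting().forEach(System.out::println);
    }
}
```

Matching the "waits for" entry of one thread against the "held by" entry of another is the manual version of what jstack's deadlock summary reports for simple cases.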

If we are developing our own management tool and need a more programmatic way to scan service processes and locate deadlocks, we can consider using the standard management API provided by Java, ThreadMXBean. It provides a findDeadlockedThreads() method specifically for this purpose. To illustrate, I modified the DeadLockSample code as follows:

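```java
// Additional imports needed in DeadLockSample.java:
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public static void main(String[] args) throws InterruptedException {
    ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
    Runnable dlCheck = new Runnable() {
        @Override
        public void run() {
            // Returns null when no threads are deadlocked
            long[] threadIds = mbean.findDeadlockedThreads();
            if (threadIds != null) {
                ThreadInfo[] threadInfos = mbean.getThreadInfo(threadIds);
                System.out.println("Detected deadlock threads:");
                for (ThreadInfo threadInfo : threadInfos) {
                    System.out.println(threadInfo.getThreadName());
                }
            }
        }
    };

    ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
    // Wait for 5 seconds, then perform deadlock scanning every 10 seconds
    scheduler.scheduleAtFixedRate(dlCheck, 5L, 10L, TimeUnit.SECONDS);
    // Deadlock sample code...
}
```
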
Recompile and run, and you will see output listing the deadlocked threads. In practice, you can build on this to collect more information and trigger follow-up handling such as alerts. Note, however, that taking a snapshot of thread state is itself a relatively heavyweight operation, so choose the frequency and timing carefully.

**How to Prevent Deadlocks as Much as Possible in Programming?**

First, let's summarize the basic elements of deadlock from the previous examples. Essentially, a deadlock arises from:

- Mutual exclusion, like the exclusive ownership of monitors in Java: a monitor is held by either me or you, never both.
- Long-held exclusivity: a mutually exclusive resource is not released until its holder finishes using it, and other threads cannot preempt it.
- Circular dependency: two or more entities form a chain of locks that loops back on itself.

Based on these elements, we can analyze possible strategies and methods for avoiding deadlocks.

**The First Method**

If possible, avoid using multiple locks, and hold a lock only when it is really needed. Otherwise, even engineers who are highly proficient in concurrent programming can easily fall into traps. Nested synchronized blocks or nested lock() calls are especially prone to problems.
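A minimal sketch of holding a lock only when needed, using a hypothetical `NarrowLockCache`: the possibly slow `loader` (which might take locks of its own) runs outside the monitor, and the lock covers only the map lookup and update. Calling unknown code without holding your own lock (sometimes called an "open call") removes one common way that nested lock acquisition creeps in.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache illustrating short critical sections.
class NarrowLockCache<K, V> {
    private final Map<K, V> map = new HashMap<>();  // guarded by its own monitor
    private final Function<K, V> loader;            // possibly slow, may take other locks

    NarrowLockCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        synchronized (map) {                  // critical section 1: lookup only
            V cached = map.get(key);
            if (cached != null) return cached;
        }
        V value = loader.apply(key);          // called WITHOUT holding our lock
        synchronized (map) {                  // critical section 2: publish the result
            V existing = map.putIfAbsent(key, value);
            return existing != null ? existing : value;
        }
    }

    public static void main(String[] args) {
        NarrowLockCache<Integer, Integer> squares = new NarrowLockCache<>(k -> k * k);
        System.out.println(squares.get(4)); // 16
    }
}
```

The trade-off is that two threads may compute the same value concurrently; `putIfAbsent` keeps the first result, which is acceptable when `loader` is idempotent.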

Here is an [example](https://bugs.openjdk.java.net/browse/JDK-8198928). The Java NIO implementation is known for using many locks, for two reasons. First, its model is inherently complex, so to some extent this is unavoidable. Second, it was designed to support both blocking and non-blocking modes. As a direct result, some basic operations such as connect need more than three locks. During a recent JDK improvement, this led to a deadlock.

I have simplified it into the following pseudocode. The problem surfaced in the HTTP/2 client, a very modern reactive-style API that I highly recommend learning and using.

```java
// Thread HttpClient-6-SelectorManager:
readLock.lock();
writeLock.lock();
// Holding readLock/writeLock, calling close() requires obtaining closeLock
close();

// Thread HttpClient-6-Worker-2, which already holds closeLock:
implCloseSelectableChannel(); // wants to obtain readLock
```

When close() is called, the HttpClient-6-SelectorManager thread holds readLock/writeLock and tries to acquire closeLock. At the same time, the HttpClient-6-Worker-2 thread holds closeLock and tries to acquire readLock, so a deadlock is inevitable.
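To see the pattern in isolation, here is a minimal reproduction with two `ReentrantLock`s. The names mirror the pseudocode but are ordinary local locks, not the JDK's internal fields, and `CloseLockDeadlock` is a hypothetical class. It also shows that `ThreadMXBean.findDeadlockedThreads()` covers `java.util.concurrent` locks (ownable synchronizers), not only `synchronized` monitors. The threads are daemons so the JVM can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class CloseLockDeadlock {
    // Deliberately deadlocks two daemon threads, then returns the names reported by the JVM.
    static Set<String> reproduceAndDetect() {
        ReentrantLock readLock = new ReentrantLock();
        ReentrantLock closeLock = new ReentrantLock();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Like SelectorManager: holds readLock, then close() needs closeLock.
        Thread selector = new Thread(() -> {
            readLock.lock();
            meetAt(bothHoldFirstLock);
            closeLock.lock(); // never succeeds
        }, "SelectorManager");

        // Like Worker-2: holds closeLock, then implCloseSelectableChannel() needs readLock.
        Thread worker = new Thread(() -> {
            closeLock.lock();
            meetAt(bothHoldFirstLock);
            readLock.lock(); // never succeeds
        }, "Worker-2");

        selector.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        worker.setDaemon(true);
        selector.start();
        worker.start();

        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        Set<String> names = new TreeSet<>();
        // Poll for up to about 5 seconds: the cycle exists only once both threads wait on their second lock.
        for (int i = 0; i < 50 && names.isEmpty(); i++) {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            long[] ids = mbean.findDeadlockedThreads(); // null when no deadlock is found
            if (ids != null) {
                for (ThreadInfo info : mbean.getThreadInfo(ids)) {
                    names.add(info.getThreadName());
                }
            }
        }
        return names;
    }

    // Count down, then wait until the other thread has also taken its first lock.
    private static void meetAt(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + reproduceAndDetect());
    }
}
```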

What makes this confusing is that the thread stack **does not show that closeLock is held** (the part I marked in green); see the marked part in the following picture.

![](../images/b7961a84838b5429a8f59826b91ed724-20221031210709-qc1hdu2.png)

To be more specific, please check line 663 of [SocketChannelImpl](http://hg.openjdk.java.net/jdk/jdk/file/ce06058197a4/src/java.base/share/classes/sun/nio/ch/SocketChannelImpl.java) and compare the implementation of the `implCloseSelectableChannel()` method with line 109 of [AbstractInterruptibleChannel.close()](http://hg.openjdk.java.net/jdk/jdk/file/ce06058197a4/src/java.base/share/classes/java/nio/channels/spi/AbstractInterruptibleChannel.java). I will not display the code here.

So, from a program design perspective, if we give a single piece of code too many responsibilities and end up in "needs both this and that" situations, we may need to revisit the design or its goals. Library code, being foundational and widely shared, often faces this dilemma more acutely than application code and requires careful balancing.

**The Second Method**

If multiple locks must be used, design the order in which they are acquired. This may sound simple, but it is not easy to get right in practice. The well-known Banker's algorithm can serve as a reference.
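For reference, the core of the Banker's algorithm is a safety check: given what each thread holds and may still request, is there some order in which every thread can run to completion? Below is a compact sketch; the class name and matrix layout are illustrative choices, and lock-ordering design usually borrows the idea rather than running this check at runtime. Each lock counts as one unit of a resource, and the first call in `main` encodes the `DeadLockSample` state where each thread holds one lock and needs the other.

```java
import java.util.Arrays;

class BankersSafetyCheck {
    /**
     * Returns a safe completion order of thread indices, or null if the state is unsafe.
     * available[r]     : free units of resource r
     * allocation[t][r] : units of r currently held by thread t
     * need[t][r]       : units of r thread t may still request (max - allocation)
     */
    static int[] safeSequence(int[] available, int[][] allocation, int[][] need) {
        int threads = allocation.length;
        int[] work = Arrays.copyOf(available, available.length);
        boolean[] finished = new boolean[threads];
        int[] order = new int[threads];
        int count = 0;
        boolean progressed = true;
        while (count < threads && progressed) {
            progressed = false;
            for (int t = 0; t < threads; t++) {
                if (!finished[t] && fits(need[t], work)) {
                    // Pretend t runs to completion and releases everything it holds.
                    for (int r = 0; r < work.length; r++) work[r] += allocation[t][r];
                    finished[t] = true;
                    order[count++] = t;
                    progressed = true;
                }
            }
        }
        return count == threads ? order : null;
    }

    private static boolean fits(int[] need, int[] work) {
        for (int r = 0; r < need.length; r++) {
            if (need[r] > work[r]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Thread 0 holds lockA and needs lockB; thread 1 holds lockB and needs lockA.
        int[] deadlocked = safeSequence(new int[] {0, 0},
                new int[][] {{1, 0}, {0, 1}}, new int[][] {{0, 1}, {1, 0}});
        System.out.println(Arrays.toString(deadlocked)); // null (unsafe)
        // Thread 1 has not taken lockB yet, so thread 0 can finish first.
        int[] safe = safeSequence(new int[] {0, 1},
                new int[][] {{1, 0}, {0, 0}}, new int[][] {{0, 1}, {1, 1}});
        System.out.println(Arrays.toString(safe)); // [0, 1]
    }
}
```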

In general, I recommend using some simple auxiliary methods, such as:

  • Represent the relationships between objects (methods) and locks as a diagram. Take the deadlock discussed earlier as an example; since the calls involved happen within the same thread, it is relatively simple.

  • Next, combine the object relationships with the call relationships and consider the possible call sequences.

  • Merge according to the possible sequence and identify possible deadlock scenarios.
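The merging step can be mechanized in a simple form: for every call path, record an edge A -> B whenever lock B is acquired while A is held, then search the combined graph for a cycle. Here is a sketch with a hypothetical `LockOrderGraph`; the two paths in `main` encode the NIO example from earlier (readLock, writeLock, then closeLock on one path; closeLock, then readLock on the other).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LockOrderGraph {
    // Edge A -> B means "B is acquired while A is held" on some call path.
    private final Map<String, Set<String>> edges = new HashMap<>();

    // Record one call path; each lock is assumed to be taken while all earlier ones are held.
    void addPath(String... locks) {
        for (int i = 0; i < locks.length; i++) {
            for (int j = i + 1; j < locks.length; j++) {
                edges.computeIfAbsent(locks[i], k -> new HashSet<>()).add(locks[j]);
            }
        }
    }

    // Returns one cycle such as [readLock, closeLock, readLock], or an empty list if none exists.
    List<String> findCycle() {
        Set<String> done = new HashSet<>();
        for (String start : edges.keySet()) {
            List<String> cycle = dfs(start, new ArrayList<>(), new HashSet<>(), done);
            if (!cycle.isEmpty()) return cycle;
        }
        return Collections.emptyList();
    }

    private List<String> dfs(String node, List<String> path, Set<String> onPath, Set<String> done) {
        if (onPath.contains(node)) { // back edge: the path from node to here is a cycle
            List<String> cycle = new ArrayList<>(path.subList(path.indexOf(node), path.size()));
            cycle.add(node);
            return cycle;
        }
        if (done.contains(node)) return Collections.emptyList(); // already fully explored
        path.add(node);
        onPath.add(node);
        for (String next : edges.getOrDefault(node, Collections.emptySet())) {
            List<String> cycle = dfs(next, path, onPath, done);
            if (!cycle.isEmpty()) return cycle;
        }
        path.remove(path.size() - 1);
        onPath.remove(node);
        done.add(node);
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        LockOrderGraph graph = new LockOrderGraph();
        graph.addPath("readLock", "writeLock", "closeLock"); // SelectorManager path
        graph.addPath("closeLock", "readLock");              // Worker-2 path
        System.out.println("Potential deadlock cycle: " + graph.findCycle());
    }
}
```

A cycle marks a potential deadlock even if it has never happened in testing; an acyclic graph means the recorded paths respect a consistent global lock order.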

**The Third Method**

Use methods that support timeouts to make the program more controllable.

For example, Object.wait(…) and CountDownLatch.await(…) both support timed waits. Instead of assuming a lock will always be acquired, specify a timeout and prepare exit logic for when it cannot be obtained.
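A minimal sketch of "specify a timeout and prepare exit logic" with `CountDownLatch.await(long, TimeUnit)`, which returns `false` if the count does not reach zero in time. `TimedWaitDemo`, `waitForStartup`, and the returned status strings are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class TimedWaitDemo {
    // Returns a status instead of blocking forever when the other party never arrives.
    static String waitForStartup(CountDownLatch ready, long timeout, TimeUnit unit) {
        try {
            if (ready.await(timeout, unit)) {
                return "started";
            }
            return "timed out: falling back";      // exit logic instead of an endless wait
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();    // preserve interrupt status for callers
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        CountDownLatch never = new CountDownLatch(1);   // nobody ever counts down
        System.out.println(waitForStartup(never, 200, TimeUnit.MILLISECONDS)); // timed out: falling back
        CountDownLatch open = new CountDownLatch(0);    // already open
        System.out.println(waitForStartup(open, 200, TimeUnit.MILLISECONDS));  // started
    }
}
```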

Lock implementations in java.util.concurrent, such as ReentrantLock, also support non-blocking acquisition via tryLock(). This is barging behavior that ignores waiting fairness: if the lock happens to be free at the moment of the call, it is acquired immediately. Sometimes we want to barge when conditions allow and otherwise wait according to the existing fairness rules. The following idiom is commonly used:

```java
if (lock.tryLock() || lock.tryLock(timeout, unit)) {
    // ... critical section; release with lock.unlock() in a finally block
}
```
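Filled out into a complete method, the idiom might look like the following sketch. It assumes a fair `ReentrantLock`, which is where the combination matters: the untimed `tryLock()` barges, while `tryLock(timeout, unit)` respects the fairness policy. The lock is released in `finally`, and returning `false` is the fallback path. `BargingThenFair` and `runWithLock` are hypothetical names.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class BargingThenFair {
    // Runs task if the lock is obtained within the timeout; returns false otherwise.
    static boolean runWithLock(ReentrantLock lock, Runnable task, long timeout, TimeUnit unit)
            throws InterruptedException {
        // tryLock() barges even on a fair lock; tryLock(timeout, unit) respects the fairness setting.
        if (lock.tryLock() || lock.tryLock(timeout, unit)) {
            try {
                task.run();
                return true;
            } finally {
                lock.unlock();   // always release, even if task throws
            }
        }
        return false; // fallback path: retry later, report, or release other resources
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock(true); // fair lock
        boolean ran = runWithLock(lock, () -> System.out.println("working"), 100, TimeUnit.MILLISECONDS);
        System.out.println(ran); // true (no contention here)
    }
}
```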

**The Fourth Method**

There have been attempts in the industry to use static code analysis tools (such as FindBugs) to recognize known patterns and thereby flag potential deadlocks or race conditions. Experience shows that this approach is effective to some extent. See the related documentation for details.

In addition to the typical deadlock scenarios in applications, there are more troublesome ones, such as deadlocks during class loading, especially when frameworks make heavy use of custom class loaders. Such deadlocks may not originate in the application's own codebase, and tools like jstack may not show all lock information, which makes them trickier to handle. The official Java documentation explains these cases in detail, along with the relevant JVM parameters and basic principles for dealing with them.

Today, starting from example programs, I have introduced the causes of deadlock and familiarized you with the use of basic deadlock troubleshooting tools. Finally, I discussed deadlock analysis methods and preventive measures in practical scenarios. I hope this helps you.

Practice

Have you grasped what we discussed today? Here is today's thought problem: sometimes what looks like a deadlock is not caused by blocking at all, but by one thread stuck in an infinite loop that makes other threads wait indefinitely. How would you diagnose such a problem?
