10 How to Ensure Collections Are Thread Safe and How ConcurrentHashMap Achieves High-Efficiency Thread Safety

There are several ways to make a collection thread-safe in Java. One is to use thread-safe container classes such as ConcurrentHashMap. Another is to use synchronization mechanisms, such as the synchronized keyword and Lock.

ConcurrentHashMap is the thread-safe hash table implementation provided by the Java concurrency package. Its early versions used a mechanism known as segment locking: the hash table was divided into multiple small segments (Segment), each of which could be locked and operated on independently. Under concurrent access, different threads can work on different segments at the same time, which improves concurrent performance.

Compared with traditional thread-safe hash table classes such as Hashtable, ConcurrentHashMap is a clear performance improvement: it reduces lock granularity and increases the capacity for concurrent access. It also provides additional methods, such as putIfAbsent() and compute(), that better support concurrent programming.

So, by using ConcurrentHashMap, we can guarantee the container's thread safety while obtaining higher performance under concurrent access.
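As a minimal sketch of the atomic methods just mentioned (the map name and values here are illustrative only), putIfAbsent() and compute() let a thread insert or update without a separate check-then-act step:

```java
import java.util.concurrent.ConcurrentHashMap;

public class AtomicMapOpsDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

        // putIfAbsent: inserts only when no mapping exists, atomically.
        counts.putIfAbsent("hits", 0);
        counts.putIfAbsent("hits", 100); // ignored: the key is already present

        // compute: reads and updates the value as a single atomic step,
        // avoiding the check-then-act race of get() followed by put().
        counts.compute("hits", (k, v) -> v + 1);

        System.out.println(counts.get("hits")); // 1
    }
}
```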

10 How to ensure collections are thread-safe and how ConcurrentHashMap achieves high-efficiency thread safety #

Java provides different levels of support for thread safety. In addition to synchronized containers like Hashtable, there are also so-called synchronized wrappers within the traditional collection framework. We can use the wrapping methods provided by the Collections utility class to obtain a synchronized wrapper container (such as Collections.synchronizedMap). However, these wrappers use a very coarse-grained synchronization approach, which performs poorly in high-concurrency scenarios.
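For illustration, here is a minimal sketch of such a wrapper in use (the map contents are made up). Note that even the wrapped map still requires client-side locking for compound actions such as iteration:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedWrapperDemo {
    public static void main(String[] args) {
        // Wrap an ordinary HashMap; every method call now synchronizes
        // on a single mutex inside the wrapper.
        Map<String, Integer> map = Collections.synchronizedMap(new HashMap<>());
        map.put("a", 1);
        map.put("b", 2);

        // Individual calls are safe, but iteration must be guarded
        // manually by locking on the wrapper itself.
        synchronized (map) {
            int sum = 0;
            for (int v : map.values()) {
                sum += v;
            }
            System.out.println(sum); // 3
        }
    }
}
```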

Another, more common choice is to use the thread-safe container classes provided by the concurrency package (java.util.concurrent). It provides:

  • Various concurrent containers, such as ConcurrentHashMap, CopyOnWriteArrayList.
  • Various thread-safe queues (Queue/Deque), such as ArrayBlockingQueue, SynchronousQueue.
  • Thread-safe versions of various ordered containers (such as ConcurrentSkipListMap), and so on.

The specific ways to ensure thread safety range from simple synchronized containers to more fine-grained implementations such as the segment-lock-based ConcurrentHashMap; which to choose depends on the requirements of the scenario. Generally speaking, the container implementations provided in the concurrency package are far superior to the earlier simple synchronized implementations for most usage scenarios.

Topic Analysis #

When it comes to thread safety and concurrency, these are essential topics in Java interviews. The answer I provided above is a fairly broad summary, and the implementations of concurrent containers like ConcurrentHashMap are continuously evolving, so no single description fits every version.

To deeply analyze and answer this question and its extension, at least the following are required:

  • Understand the basic thread safety tools.
  • Understand the problems with Map in traditional collection frameworks for concurrent programming and the limitations of simple synchronization approaches.
  • Understand the methods adopted by the concurrency package, especially ConcurrentHashMap, to improve concurrent performance.
  • It is best to understand the evolution of ConcurrentHashMap itself, as many analysis materials are still based on its early versions.

Today, I will mainly build on the content of the previous two lectures in this column and focus on interpreting HashMap and ConcurrentHashMap, which are often examined together in interviews. Today’s lecture is not a comprehensive review of concurrency; after all, that cannot be covered in a single lecture. Consider it an appetizer: lower-level mechanisms such as CAS will be introduced systematically in the concurrency topic of the advanced Java module.

Knowledge Expansion #

  1. Why do we need ConcurrentHashMap?

Hashtable itself is relatively inefficient because its implementation essentially adds “synchronized” to methods like put, get, and size. In simple terms, this means that all concurrent operations have to compete for the same lock. When one thread is performing a synchronized operation, other threads have to wait, greatly reducing the efficiency of concurrent operations.

As mentioned earlier, HashMap is not thread-safe, and concurrent situations can result in problems such as 100% CPU usage. Can we solve this problem using the synchronized wrappers provided by Collections?

Looking at the code snippet below, we can see that the synchronized wrapper simply wraps the input Map in another, synchronized, implementation. Although the operations are no longer declared as synchronized methods, they still all synchronize on a single mutex (“this” by default), so there is no real improvement in granularity!

private static class SynchronizedMap<K,V>
    implements Map<K,V>, Serializable {
    private final Map<K,V> m;     // Backing Map
    final Object      mutex;        // Object on which to synchronize
    // ...
    public int size() {
        synchronized (mutex) {return m.size();}
    }
    // ...
}

Therefore, Hashtable and the synchronized wrapper versions are only suitable for scenarios without high concurrency.

  2. Analysis of ConcurrentHashMap

Let’s take a look at how ConcurrentHashMap is designed and implemented, and why it can greatly improve concurrency efficiency.

First of all, I want to emphasize that the design and implementation of ConcurrentHashMap have been evolving. For example, there were significant changes in Java 8 (and also in Java 7). Therefore, in this analysis, I will compare and analyze the structure, implementation mechanisms, and other aspects, highlighting the main differences between different versions.

Early versions of ConcurrentHashMap were implemented based on:

  • Segment locks: the map is internally divided into segments (Segment), each holding its own array of HashEntry objects. As in HashMap, entries with the same hash value are stored in linked-list form.
  • HashEntry declares its value field volatile to ensure visibility, and also takes advantage of immutable objects. Performance is further improved with the low-level capabilities provided by Unsafe, such as volatile access; after all, many Unsafe operations are JVM intrinsics.

The internal structure of an early ConcurrentHashMap can be pictured as an array of Segments, each with its own HashEntry table. The core is the segmented design: when performing a concurrent operation, only the corresponding segment needs to be locked. This effectively avoids the global synchronization problem of Hashtable and greatly improves performance.

During construction, the number of segments is determined by the so-called concurrencyLevel, which defaults to 16 and can also be specified directly in the corresponding constructor. Note that the segment count must be a power of 2; if the input is, for example, 15, which is not a power of 2, it is automatically rounded up to the nearest power of 2, namely 16.
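The rounding behavior can be sketched as follows; segmentCountFor is a hypothetical helper, not JDK code, that mirrors the loop the JDK 7 constructor uses to find the smallest power of two not less than concurrencyLevel:

```java
public class ConcurrencyLevelDemo {
    // Hypothetical helper mirroring the JDK 7 constructor's loop:
    // the segment count is the smallest power of two >= concurrencyLevel.
    static int segmentCountFor(int concurrencyLevel) {
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ssize <<= 1;
        }
        return ssize;
    }

    public static void main(String[] args) {
        System.out.println(segmentCountFor(15)); // 16
        System.out.println(segmentCountFor(16)); // 16
        System.out.println(segmentCountFor(17)); // 32
    }
}
```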

To understand the details, let’s look at some basic Map operations in the JDK 7 source code, such as the get method. For ease of understanding, I have annotated the optimizations directly in the code segment below. The get operation only needs to guarantee visibility, so it involves no synchronization logic.

public V get(Object key) {
    Segment<K,V> s; // manually integrate access methods to reduce overhead
    HashEntry<K,V>[] tab;
    int h = hash(key.hashCode());
    // Use bitwise operations instead of ordinary arithmetic
    long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
    // Locate the segment
    // Use Unsafe for volatile access
    if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
        (tab = s.table) != null) {
        // omitted
    }
    return null;
}

As for the put operation, it first applies a second hash to scatter the data and reduce hash conflicts. It then uses Unsafe to obtain the corresponding Segment directly and delegates to the Segment’s thread-safe put operation:

public V put(K key, V value) {
    Segment<K,V> s;
    if (value == null)
        throw new NullPointerException();
    // Double hashing to ensure data scattering and avoid hash conflicts
    int hash = hash(key.hashCode());
    int j = (hash >>> segmentShift) & segmentMask;
    if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
         (segments, (j << SSHIFT) + SBASE)) == null) //  in ensureSegment
        s = ensureSegment(j);
    return s.put(key, hash, value, false);
}

The core logic is implemented in the following internal method:

final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    // scanAndLockForPut finds whether there is a Node with the same key
    // Ensures acquiring the lock in any case
    HashEntry<K,V> node = tryLock() ? null :
        scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        HashEntry<K,V>[] tab = table;
        int index = (tab.length - 1) & hash;
        HashEntry<K,V> first = entryAt(tab, index);
        for (HashEntry<K,V> e = first;;) {
            if (e != null) {
                K k;
                // Updates the existing value...
            }
            else {
                // Places the HashEntry at a specific position, and if it exceeds the threshold, performs rehashing
                // ...
            }
        }
    } finally {
        unlock();
    }
    return oldValue;
}

Therefore, from the above source code, it is clear that during concurrent write operations:

  • ConcurrentHashMap acquires a reentrant lock to ensure data consistency. A Segment itself is an extension of ReentrantLock, so during concurrent modification the corresponding Segment is locked.
  • In the initial stage, a repeated scan determines whether the key is already present in the array, which decides whether to update or insert; you can see the corresponding comments in the code. Repeated scanning and conflict detection are common techniques in ConcurrentHashMap.
  • In the earlier column introducing HashMap, I mentioned resizing; it also exists in ConcurrentHashMap, with one obvious difference: resizing is not performed on the map as a whole, but on an individual Segment. I won’t cover the details here.

Another Map method worth attention is size(), whose implementation exposes a side effect of segment locks.

Imagine that, without any synchronization, simply summing the counts of all Segments could yield an inaccurate result because of concurrent put operations; yet locking all Segments for the calculation is very expensive. In fact, segment locks constrain other parts of the Map in similar ways, such as initialization.

So the implementation of ConcurrentHashMap tries to obtain a reliable value through a retry mechanism (RETRIES_BEFORE_LOCK specifies the retry count as 2). If no concurrent change is detected across retries (compared via the Segments’ modCount values), it returns the sum directly; otherwise, it acquires all the locks and recomputes.
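Under those semantics, a simplified sketch of this retry-then-lock strategy might look like the following; Segment here is a bare stand-in (a ReentrantLock plus counters), not the real JDK class:

```java
import java.util.concurrent.locks.ReentrantLock;

public class SizeRetryDemo {
    // Minimal stand-in for a JDK 7 Segment: an entry count plus a
    // modification counter, extending ReentrantLock as the real one does.
    static class Segment extends ReentrantLock {
        volatile int count;
        volatile int modCount;
    }

    static final int RETRIES_BEFORE_LOCK = 2;

    static long size(Segment[] segments) {
        long lastMods = -1L;
        for (int retries = 0; retries <= RETRIES_BEFORE_LOCK; retries++) {
            long sum = 0, mods = 0;
            for (Segment s : segments) {
                sum += s.count;
                mods += s.modCount;
            }
            // If two consecutive unlocked passes observe the same total
            // modCount, no concurrent change was detected: trust the sum.
            if (mods == lastMods) {
                return sum;
            }
            lastMods = mods;
        }
        // Fall back to locking every segment for an exact answer.
        for (Segment s : segments) s.lock();
        try {
            long sum = 0;
            for (Segment s : segments) sum += s.count;
            return sum;
        } finally {
            for (Segment s : segments) s.unlock();
        }
    }

    public static void main(String[] args) {
        Segment[] segs = { new Segment(), new Segment() };
        segs[0].count = 3;
        segs[1].count = 4;
        System.out.println(size(segs)); // 7
    }
}
```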

Now let’s compare the changes in ConcurrentHashMap in Java 8 and later versions:

  • In terms of overall structure, its internal storage becomes very similar to the HashMap structure described in my previous column: a large bucket array, where each bucket holds a linked list of nodes (a bin) that is converted to a red-black tree when it grows too long. The synchronization granularity becomes finer, at the level of an individual bin.
  • It still defines Segment, but only for serialization compatibility; the class no longer plays any structural role.
  • Because Segment is no longer used, initialization is greatly simplified and becomes lazy, which avoids the up-front overhead that was a common complaint about the older versions.
  • Data storage uses volatile to ensure visibility.
  • CAS and similar operations provide lock-free concurrent access in specific scenarios.
  • Low-level techniques such as Unsafe and LongAdder are used for extreme optimization.
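The lock-free placement into an empty bin can be illustrated with AtomicReferenceArray, which offers the same compare-and-set semantics that ConcurrentHashMap obtains through Unsafe; Node below is a simplified stand-in for the real entry class:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class CasBinDemo {
    // Simplified stand-in for ConcurrentHashMap's internal Node.
    static class Node {
        final String key;
        final Object val;
        Node(String key, Object val) { this.key = key; this.val = val; }
    }

    public static void main(String[] args) {
        AtomicReferenceArray<Node> table = new AtomicReferenceArray<>(16);
        Node n = new Node("k", "v");

        // compareAndSet succeeds only if the bin is still null,
        // mirroring casTabAt(tab, i, null, newNode) in putVal.
        System.out.println(table.compareAndSet(3, null, n)); // true

        // A second attempt fails: the bin is no longer empty, so the
        // caller would retry the loop and take the synchronized path.
        System.out.println(table.compareAndSet(3, null, new Node("k2", "v2"))); // false
    }
}
```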

Let’s first look at how data is stored in the current implementation. We can see that key is declared final, because an entry’s key cannot change during its lifecycle, while val is declared volatile to ensure visibility.

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    volatile V val;
    volatile Node<K,V> next;
    // ... 
}

Now let’s dive into the implementation of the put method, which is responsible for concurrent operations. I won’t go into detail about the get method and constructors, as they are relatively simple. Let’s focus on how concurrent put is achieved.

final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null)
        throw new NullPointerException();
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh; K fk; V fv;
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            // Perform lock-free thread-safe operation using CAS if bin is empty
            if (casTabAt(tab, i, null, new Node<K,V>(hash, key, value)))
                break; 
        }
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        else if (onlyIfAbsent
                 && fh == hash
                 && ((fk = f.key) == key || (fk != null && key.equals(fk)))
                 && (fv = f.val) != null)
            return fv;
        else {
            V oldVal = null;
            synchronized (f) {
               // Synchronize the modification operation...
            }
            if (binCount != 0) {
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }
    addCount(1L, binCount);
    return null;
}

The initialization operation is implemented in initTable. This is a typical CAS (compare-and-swap) usage, with the volatile field sizeCtl serving as the mutual-exclusion flag: if a conflict is detected (sizeCtl is negative), the thread spins, waiting for the condition to clear; otherwise it uses CAS to set the exclusive flag, running the initialization logic on success and retrying on failure.

Please refer to the following code:

private final Node<K,V>[] initTable() {
    Node<K,V>[] tab;
    int sc;
    while ((tab = table) == null || tab.length == 0) {
        if ((sc = sizeCtl) < 0)
            Thread.yield(); // lost the initialization race; just spin
        else if (U.compareAndSetInt(this, SIZECTL, sc, -1)) {
            try {
                if ((tab = table) == null || tab.length == 0) {
                    int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                    @SuppressWarnings("unchecked")
                    Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                    table = tab = nt;
                    sc = n - (n >>> 2); // equivalent to 0.75 * n, the resize threshold
                }
            } finally {
                sizeCtl = sc;
            }
            break;
        }
    }
    return tab;
}
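The same spin-then-CAS pattern can be sketched outside the JDK with an AtomicInteger standing in for sizeCtl; this is an illustrative analogy, not the real implementation:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasInitDemo {
    // Simplified analog of sizeCtl: 0 = uninitialized, -1 = initializing,
    // positive = initialized (stores the capacity here for simplicity).
    static final AtomicInteger state = new AtomicInteger(0);
    static volatile int[] table;

    static int[] initTable() {
        while (table == null) {
            int s = state.get();
            if (s < 0) {
                Thread.yield(); // another thread is initializing; just wait
            } else if (state.compareAndSet(s, -1)) {
                try {
                    if (table == null) {
                        table = new int[16]; // run the one-time initialization
                    }
                } finally {
                    state.set(16); // publish the "initialized" flag
                }
            }
            // CAS failure: loop and re-read the state.
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(initTable().length); // 16
    }
}
```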

When the bin is empty, it is also unnecessary to acquire a lock. Instead, CAS operations are used to place the node.

Have you noticed that synchronized is used for fine-grained synchronization instead of the commonly recommended ReentrantLock or similar mechanisms? In modern JDKs, synchronized has been continuously optimized, and performance differences are no longer a major concern. Moreover, compared to ReentrantLock, synchronized can reduce memory consumption, which is a significant advantage.

Meanwhile, more implementation details are optimized using Unsafe. For example, tabAt directly uses getObjectAcquire to avoid the overhead of indirect calls.

static final <K,V> Node<K,V> tabAt(Node<K,V>[] tab, int i) {
    return (Node<K,V>)U.getObjectAcquire(tab, ((long)i << ASHIFT) + ABASE);
}

Now let’s take a look at how the size operation is implemented. If you read the code here, you will find that the actual logic is in the sumCount method. So what does sumCount do?

final long sumCount() {
    CounterCell[] as = counterCells;
    CounterCell a;
    long sum = baseCount;
    if (as != null) {
        for (int i = 0; i < as.length; ++i) {
            if ((a = as[i]) != null)
                sum += a.value;
        }
    }
    return sum;
}

Although the approach is still the same divide-and-conquer idea as before, counting in pieces and then summing, the implementation relies on an unfamiliar class, CounterCell. Does it provide more accurate results? And how is data consistency ensured?

static final class CounterCell {
    volatile long value;
    CounterCell(long x) { value = x; }
}

In fact, operations on CounterCell are based on the same idea as java.util.concurrent.atomic.LongAdder: trading space for efficiency by striping the counter across cells to reduce contention, building on the complex logic inside Striped64. This is a relatively niche technique; in most cases, AtomicLong is still recommended and is sufficient for the performance needs of most applications.
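For comparison, LongAdder itself is straightforward to use; the thread and iteration counts below are arbitrary:

```java
import java.util.concurrent.atomic.LongAdder;

public class LongAdderDemo {
    public static void main(String[] args) throws InterruptedException {
        LongAdder adder = new LongAdder();

        // Several threads increment concurrently; under contention each
        // thread tends to hit its own cell, so increments rarely compete.
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    adder.increment();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }

        // sum() walks the base value plus all cells, like sumCount() above.
        System.out.println(adder.sum()); // 4000
    }
}
```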

Today, I started with thread safety issues, summarized the basic container tools conceptually, analyzed the problems in early synchronized containers, and then delved into the design and implementation of ConcurrentHashMap in Java 7 and Java 8. I hope the concurrency techniques of ConcurrentHashMap will be helpful in your daily development work.

Practice Exercise #

Do you have a clear idea about the topic we discussed today? Here’s a thought-provoking question for you: are there any typical scenarios in production code that call for concurrent containers like ConcurrentHashMap?

Please share your thoughts on this question in the comments. I will select the most thoughtful comment and reward you with a study encouragement fund. I welcome you to discuss it with me.

Are your friends also preparing for interviews? You can “Invite a friend to read” and share today’s question with them. Perhaps you can help them.