29 Why Is Hash Map Not Thread Safe

29 Why Is HashMap Not Thread-Safe #

In this lesson, we will mainly discuss why HashMap is not thread-safe. I believe you are already familiar with HashMap, which is a widely used container in our work and study. It is also one of the main implementation classes of Map. However, it does not have the characteristics of thread safety on its own, which can be demonstrated in various situations. Let’s analyze this in detail.

Source Code Analysis #

First, let’s take a look at the source code of the put method in HashMap:

public V put(K key, V value) {

    if (key == null)
        return putForNullKey(value);

    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);

    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    } 

    // modCount++ is a compound operation
    modCount++;

    addEntry(hash, key, value, i);
    return null;
}

In the put() method of HashMap, we can see that there are many operations inside. Let’s focus on the line of code marked with modCount++. Those with experience might have noticed that this is a typical “i++” operation, which is a case of “result inconsistency” that we discussed in lesson 06. At first glance, i++ is just one line of code, but in fact, it is not an atomic operation. It consists of three steps, and there is a possibility of interruption between each step.

  • The first step is to read.
  • The second step is to increment.
  • The third step is to save.

Next, let’s look at how the thread safety problem occurs.

img

Following the order of the arrows, let’s assume that Thread 1 first obtains the result i=1, then performs the i+1 operation. However, at this point, the result of i+1 is not saved, and Thread 1 is switched out. Now the CPU starts executing Thread 2, which does the same thing as Thread 1, an i++ operation. Now, think about what value of i Thread 2 obtains. In fact, it is the same as the result that Thread 1 obtained, which is 1. Why? Although Thread 1 performed the +1 operation on i, it did not save the result, so Thread 2 cannot see the modified result.

Suppose that after Thread 2 performs the +1 operation on i, it switches back to Thread 1 to complete the unfinished operation, which is to save the result of i + 1 as 2. Then it switches back to Thread 2 to complete the saving operation of i = 2. Although both threads performed the +1 operation on i, the final result is i = 2 instead of the expected i = 3. This is how the thread safety problem occurs, resulting in incorrect data results. This is also the most typical thread safety problem.

Therefore, from the perspective of the source code or theory, this is enough to prove that HashMap is not thread-safe. If multiple threads simultaneously call the put() method, it is very likely to calculate the value of modCount incorrectly (the code analysis above is based on the Java 7 version of the source code, but in the put method of HashMap in Java 8 version, the putVal method is called, and it also contains the ++modCount statement, so the principle is the same).

Experiment: Incorrect values obtained during expansion #

Just now, we analyzed the source code, but you may not be satisfied with it. Now, let’s open the code editor and conduct an experiment to prove that HashMap is not thread-safe. Why is it said that HashMap is not thread-safe? Let’s explain the principle first. The default capacity of HashMap is not very large. If new data is continuously added to the map, it will resize the map at an appropriate time. During the resizing process, it creates a new empty array and fills it with the old entries. In this filling process, if a thread tries to retrieve a value, it may get a null value instead of the value that was originally added. So, in order to demonstrate this scenario, let’s take a look at the following code:

public class HashMapNotSafe {

    public static void main(String[] args) {

        final Map<Integer, String> map = new HashMap<>();

        final Integer targetKey = 0b1111_1111_1111_1111; // 65,535

        final String targetValue = "v";

        map.put(targetKey, targetValue);

        new Thread(() -> {

            IntStream.range(0, targetKey).forEach(key -> map.put(key, "someValue"));

        }).start();

        while (true) {

            if (null == map.get(targetKey)) {

                throw new RuntimeException("HashMap is not thread safe.");

            }

        }

    }

}

In the code, we first create a HashMap and define a key and a value. The key is a binary value 1111_1111_1111_1111, which corresponds to the decimal value 65,535. The value is arbitrary and we have chosen the value “v” to represent it. We put this key-value pair into the map.

Next, we create a new thread that continuously puts new data into our map. Let’s take a look at how it does it. It uses an IntStream with a range from 0 to the targetKey mentioned earlier (65,535). This range is inclusive at the start and exclusive at the end, so it will iterate from 0, 1, 2, 3, and so on, and for each iteration, the value 0, 1, 2, 3, 4 will be put as the key into the map. The value for each key is “someValue” because we are not concerned about the value.

Then, we start this thread and enter a while loop, which is the key part. In this while loop, we continuously check whether the value corresponding to the key we put earlier is still the expected string “v”. We keep getting the value associated with the key from the map in the while loop. If HashMap is thread-safe, then no matter what, the value it retrieves should be the string “v” that we initially put. However, if it retrieves a null value, it will satisfy the if condition and throw a RuntimeException because getting a null value means that the retrieved value is inconsistent with the value we originally put. This demonstrates that it is not thread-safe, so we throw a RuntimeException to indicate this.

Let’s run this program to see if it throws the expected exception. Once the exception is thrown, it means that it is not thread-safe. The result of running this code is:

Exception in thread "main" java.lang.RuntimeException: HashMap is not thread safe.
    at lesson29.HashMapNotSafe.main(HashMapNotSafe.java:25)

It is clear that the program quickly throws the expected RuntimeException, and we describe it as “HashMap is not thread-safe”. Once it enters the if statement, it has already proved that the value it retrieves is null instead of the expected string “v”.

Through this example, we have also proved that HashMap is not thread-safe.

In addition to the example above, there are many other situations where HashMap is not thread-safe, such as:

Simultaneous put collisions causing data loss #

For example, if multiple threads use put to add elements, and two put operations have the same key, a collision occurs. This means that the hash value calculated for both keys points to the same bucket, and both threads simultaneously check that the position is empty and writable. As a result, the two different values of the threads will be added to the same position in the array, and only one data entry will be preserved while the other is lost.

Visibility problems cannot be guaranteed #

Let’s also consider visibility. Visibility is also a part of thread safety. If a data structure claims to be thread-safe, it also needs to ensure visibility, which means that when a thread operates on this container, the operation should be visible to other threads, meaning that other threads can perceive this operation. However, HashMap cannot guarantee this. If thread 1 puts a new value for a key, the visibility of this change is not guaranteed when thread 2 tries to get the corresponding value for the key. In other words, thread 2 may or may not see this change. Therefore, from the perspective of visibility, HashMap is also not thread-safe.

Infinite loop causing 100% CPU usage #

Next, let’s give an example of an infinite loop causing 100% CPU usage. HashMap may enter an infinite loop and cause 100% CPU usage, which mainly occurs during resizing, i.e., when it creates a new HashMap internally. The resizing logic reverses the order of nodes in the hash buckets. When multiple threads are resizing at the same time, due to HashMap not being thread-safe, if two threads reverse the order at the same time, a loop may form, which is a circular linked list loop. This means that node A points to node B, and node B points back to node A. As a result, when trying to retrieve the value corresponding to a key in the next iteration, it will get caught in an endless loop when traversing the linked list, resulting in 100% CPU usage.

In conclusion, HashMap is not thread-safe. In multi-threaded scenarios, if a Map is needed, it is recommended to avoid using the thread-unsafe HashMap. Although Collections.synchronizedMap(new HashMap()) is thread-safe, its performance is poor because it uses a lot of synchronization, and multiple threads cannot operate simultaneously. It is recommended to use the thread-safe and performant ConcurrentHashMap instead. We will introduce ConcurrentHashMap in the next lesson.