40 How to Resolve the Poor Performance of Atomic Integer in High Concurrency

40 How to Resolve the Poor Performance of AtomicInteger in High Concurrency #

In this lesson, we will mainly discuss why AtomicInteger performs poorly in high concurrency and how to solve it.

We know that JDK 1.5 introduced the atomic classes AtomicInteger and AtomicLong for use in concurrent scenarios.

In a concurrent scenario, if we need to implement a counter, we can use AtomicInteger and AtomicLong. This way, we can avoid locking and complex code logic. With these atomic classes, we only need to execute the corresponding encapsulated methods, such as performing atomic increments or atomic decrements on these variables, to meet the requirements of most business scenarios.

However, although they are very useful, if your business scenario involves a large amount of concurrency, you will also find that these two atomic classes actually have significant performance issues. Why is this the case? Let’s start by looking at an example.

Issues with AtomicLong #

First, let’s take a look at some code:

/**
* Description: Using AtomicInteger with 16 threads
*/

public class AtomicLongDemo {

   public static void main(String[] args) throws InterruptedException {

       AtomicLong counter = new AtomicLong(0);

       ExecutorService service = Executors.newFixedThreadPool(16);

       for (int i = 0; i < 100; i++) {
           service.submit(new Task(counter));
       }

       Thread.sleep(2000);

       System.out.println(counter.get());
   }

   static class Task implements Runnable {

       private final AtomicLong counter;

       public Task(AtomicLong counter) {
           this.counter = counter;
       }

       @Override
       public void run() {
           counter.incrementAndGet();
       }
   }
}

In this code, we create an AtomicLong with an initial value of 0. Then, we create a thread pool with 16 threads and add the same task to this thread pool 100 times.

Now let’s take a closer look at what the task does. In the Task class below, you can see that the task simply calls the incrementAndGet method of the AtomicLong, which is equivalent to an increment operation. As a result, the entire purpose of this class is to start with this atomic class at 0, add 100 tasks, and increment each task by one.

The expected result of running this code is undoubtedly 100. Although it involves multi-threading and concurrent access, AtomicLong can still guarantee the atomicity of the incrementAndGet operation, so there won’t be any thread safety issues.

However, if we look deeper into the internal situation, you may be surprised. Let’s simplify the model to a concurrent scenario where only two threads are working at the same time, because two threads and more threads are essentially the same. As shown in the image below:

In this image, each thread runs in its own core and has its own local memory. Below the local memory, there is shared memory shared by two CPU cores.

For the value property inside the AtomicLong, which is the property that holds the current value of the AtomicLong, it is declared as volatile, so it needs to ensure its own visibility.

In this way, every time its value changes, it needs to perform flush and refresh operations. For example, if the value of “ctr” is initially 0, as shown in the image, once core 1 changes it to 1, it will first flush this latest result to the shared memory below on the left, and then refresh it to the local memory of core 2 on the right. In this way, core 2 can perceive this change.

Due to intense competition, these flush and refresh operations consume a lot of resources, and CAS often fails.

Improvements and principles brought by LongAdder #

In JDK 8, a new class called LongAdder was introduced, which is a utility class for operating on Long types. Since we already have AtomicLong, why do we need to introduce LongAdder?

Let’s use an example to illustrate. The following example is very similar to the previous one. The only difference is that we changed the utility class from AtomicLong to LongAdder. The only difference in logic is that when printing the final result, we changed the method call from get to sum. Everything else remains the same.

Let’s take a look at the code example using LongAdder:

/**
* Description: Using LongAdder with 16 threads
*/

public class LongAdderDemo {

    public static void main(String[] args) throws InterruptedException {

        LongAdder counter = new LongAdder();

        ExecutorService service = Executors.newFixedThreadPool(16);

        for (int i = 0; i < 100; i++) {

            service.submit(new Task(counter));

        }

        Thread.sleep(2000);

        System.out.println(counter.sum());

    }

    static class Task implements Runnable {

        private final LongAdder counter;

        public Task(LongAdder counter) {

            this.counter = counter;

        }

        @Override

        public void run() {

            counter.increment();

        }

    }

}


The code produces the same output of 100, but it runs faster than the AtomicLong implementation we just saw. Now let's explain why LongAdder is more efficient than AtomicLong in highly concurrent scenarios.

LongAdder introduces the concept of segmented accumulation. There are two parameters involved in the internal counting process: the base variable and an array of Cells.

The base variable is used when there is no intense contention and the cumulative result can be directly updated on the base variable.

However, when there is intense contention, the Cell[] array comes into play. In such cases, each thread will accumulate its count in the Cell object corresponding to its thread, instead of sharing a single counter.

This way, LongAdder assigns different threads to different Cells to minimize conflicts. It operates on a segmented approach, similar to the idea of 16 Segment in ConcurrentHashMap in Java 7.

In scenarios with intense contention, LongAdder assigns each thread to a specific Cell based on its hash value. Each Cell acts as an independent counter, avoiding interference with other counters. As a result, the flush and refresh operations are greatly reduced, and the probability of conflicts is lowered. This explains why LongAdder has a higher throughput than AtomicLong. In essence, it trades space for time because it has multiple counters working simultaneously, which requires larger memory consumption.

So how does LongAdder actually count in a multithreaded environment? The answer lies in the final step of the sum method. When LongAdder.sum() is executed, it accumulates the Cell values from all threads and adds the base value to obtain the final sum. Here is the code:

```java
public long sum() {

   Cell[] as = cells; Cell a;

   long sum = base;

   if (as != null) {

       for (int i = 0; i < as.length; ++i) {

           if ((a = as[i]) != null)

               sum += a.value;

       }

   }

   return sum;

}

In this sum method, the logic is very clear. It first retrieves the value of the base variable and then iterates over all Cells, adding up the value of each Cell to obtain the final sum. Because no locking is performed during the calculation, the sum obtained may not be completely accurate, as the Cell values may change during the calculation.

Now that we understand why AtomicLong (or AtomicInteger) performs poorly in highly concurrent scenarios and why LongAdder performs better, let’s analyze how to choose between them.

How to Choose #

In low contention scenarios, AtomicLong and LongAdder have similar characteristics and throughput because there is less competition. However, in highly concurrent scenarios, LongAdder is much faster. Through experiments, it has been found that LongAdder has a throughput approximately ten times greater than AtomicLong. However, efficiency comes at a cost, and LongAdder consumes more memory.

Can AtomicLong Be Replaced by LongAdder? #

Now we need to consider whether AtomicLong can be completely replaced with the more efficient LongAdder. The answer is not always. It depends on the specific use case.

LongAdder only provides simple methods like add and increment, which are suitable for scenarios where we need to compute sums or counts. It is suitable for relatively simple scenarios. On the other hand, AtomicLong has additional advanced methods like compareAndSet, which can handle more complex scenarios that require CAS operations besides addition and subtraction.

Conclusion: If our use case only involves addition and subtraction operations, it is recommended to use the more efficient LongAdder. However, if we need to use CAS operations like compareAndSet, we should use AtomicLong to fulfill those requirements.