17 Using Concurrent Containers: Identifying the Optimal Container for Different Scenarios #

Hello, I’m Liu Chao.

In concurrent programming, we often use containers. Today, I want to discuss with you the topic of how to choose the optimal container for different scenarios.

Map Container in Concurrent Scenarios #

Let’s assume we are designing a simple function to track the top 10 sales of products in an e-commerce system. In general, we use a hash table to store key-value pairs of products and their sales, and then sort them to get the top 10. Here, the hash table is the key to implementing this function. So, think about which container you would use to design this function.

In Lecture 07, I explained in detail the implementation principle of HashMap and various optimization details of its structure. I mentioned that HashMap has superior performance and is often used for storing key-value pairs. So, can we use HashMap here?

The answer is no: we should avoid HashMap in concurrent scenarios. In JDK 1.7 and earlier, using HashMap concurrently could produce an infinite loop that drove CPU usage sky-high, and hash table resizing is the main cause of that loop. Although JDK 1.8 fixed the infinite loop caused by resizing, HashMap can still lose data and return inaccurate results in high-concurrency scenarios.
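To see the problem concretely, here is a small, deliberately non-deterministic demo (class and variable names are made up for illustration): several threads put distinct keys into a plain HashMap and a ConcurrentHashMap, and the HashMap will often end up with fewer entries than were actually written.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Illustrative demo only: results vary from run to run.
public class LostUpdateDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> unsafe = new HashMap<>();
        Map<Integer, Integer> safe = new ConcurrentHashMap<>();
        int threads = 8, perThread = 10_000;
        CountDownLatch done = new CountDownLatch(threads);

        for (int t = 0; t < threads; t++) {
            final int offset = t * perThread;
            new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    unsafe.put(offset + i, i); // racy: entries may silently disappear during resize
                    safe.put(offset + i, i);   // thread-safe
                }
                done.countDown();
            }).start();
        }
        done.await();

        System.out.println("HashMap size:           " + unsafe.size()); // usually less than 80000
        System.out.println("ConcurrentHashMap size: " + safe.size());   // always 80000
    }
}
```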

To ensure the thread safety of containers, Java provides Map containers such as Hashtable, ConcurrentHashMap, and ConcurrentSkipListMap.

Hashtable and ConcurrentHashMap are hash-table-based Map containers, like HashMap, and have an advantage when storing and accessing relatively small amounts of data.

ConcurrentSkipListMap is designed along the lines of TreeMap, with one key difference: TreeMap is based on a red-black tree, while ConcurrentSkipListMap is based on a skip list. The average time complexity of accessing data in ConcurrentSkipListMap is O(log n), which makes it suitable for storing and accessing large amounts of data, and it is commonly used to build large skip-list-based caches.
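As a quick illustration of the ordered access this buys you, here is a small sketch (the class name SortedAccessDemo is made up for this example):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// ConcurrentSkipListMap keeps keys sorted, which HashMap/ConcurrentHashMap do not guarantee.
public class SortedAccessDemo {
    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, String> map = new ConcurrentSkipListMap<>();
        map.put(30, "c");
        map.put(10, "a");
        map.put(20, "b");

        System.out.println(map.firstKey());     // 10: smallest key
        System.out.println(map.ceilingKey(15)); // 20: smallest key >= 15
        for (Map.Entry<Integer, String> e : map.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue()); // ascending key order
        }
    }
}
```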

Let’s go back to the example we started with. If the total number of products in this e-commerce system is not particularly large, we can use Hashtable or ConcurrentHashMap to implement the functionality of a hash table.

Hashtable vs. ConcurrentHashMap #

To be more precise, let’s further compare these two containers.

In scenarios where data is continuously written and deleted, and there is no data accumulation or data sorting, we can choose Hashtable or ConcurrentHashMap.

Hashtable synchronizes methods such as put, get, and remove with the synchronized keyword, so in high-concurrency scenarios there is heavy lock contention on both reads and writes, which creates performance overhead for the system.

Compared to Hashtable, ConcurrentHashMap provides better concurrent performance while ensuring thread safety. In JDK 1.7, ConcurrentHashMap used segmented locks, called Segments, to reduce the granularity of locks and optimize lock concurrency.

In JDK 1.8, ConcurrentHashMap was reworked significantly and abandoned the Segment design. Since the performance of synchronized has improved greatly since Java 6, JDK 1.8 went back to synchronized and locks at the granularity of a single hash bucket (its head node). This change makes the data structure simpler and the operations clearer.

The put flow in JDK 1.8 is similar to that in JDK 1.7: when adding an element, if there is no hash conflict, CAS (Compare-And-Swap) is used to insert it; if there is a conflict, the bucket's linked list is locked with synchronized and the remaining operations are performed.

[Figure: put flow of ConcurrentHashMap in JDK 1.8]
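The following is a deliberately simplified sketch of this CAS-then-lock pattern. It is not the real OpenJDK code and omits resizing, tree bins, and size counting; TinyChm and its fields are names I made up for illustration.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Simplified sketch: an empty bucket is filled with CAS; otherwise only that
// bucket's head node is locked with synchronized while the list is updated.
class TinyChm<K, V> {
    static final class Node<K, V> {
        final K key;
        volatile V val;
        volatile Node<K, V> next;
        Node(K key, V val) { this.key = key; this.val = val; }
    }

    private final AtomicReferenceArray<Node<K, V>> table = new AtomicReferenceArray<>(16);

    public void put(K key, V value) {
        int i = (table.length() - 1) & (key.hashCode() & 0x7fffffff);
        for (;;) {
            Node<K, V> head = table.get(i);
            if (head == null) {
                // no hash conflict: lock-free insert via CAS; retry if another thread won the race
                if (table.compareAndSet(i, null, new Node<>(key, value))) return;
            } else {
                synchronized (head) {                    // lock only this bucket's head node
                    if (table.get(i) != head) continue;  // head changed meanwhile, start over
                    for (Node<K, V> n = head; ; n = n.next) {
                        if (n.key.equals(key)) { n.val = value; return; }
                        if (n.next == null) { n.next = new Node<>(key, value); return; }
                    }
                }
            }
        }
    }
}
```

Because the lock covers only one bucket's head node, two puts that land in different buckets never block each other, which is where most of the concurrency gain comes from.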

To sum up, when designing the top-10 sales feature, ConcurrentHashMap should be our first choice.
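A minimal sketch of the top-10 feature on top of ConcurrentHashMap might look like this (SalesCounter, recordSale, and topN are hypothetical names for illustration):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

class SalesCounter {
    private final ConcurrentHashMap<String, Long> sales = new ConcurrentHashMap<>();

    // Called concurrently whenever a product is sold.
    void recordSale(String productId, long quantity) {
        sales.merge(productId, quantity, Long::sum); // atomic read-modify-write per key
    }

    // Take a (weakly consistent) snapshot and sort it to get the current top N products.
    List<Map.Entry<String, Long>> topN(int n) {
        return sales.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

merge() updates a single entry atomically, while topN() only needs a rough snapshot, which fits a sales leaderboard well.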

But one thing to note is that although the overall performance of ConcurrentHashMap is better than Hashtable, ConcurrentHashMap still cannot replace Hashtable in certain scenarios. For example, ConcurrentHashMap is not suitable for strongly consistent scenarios because methods like get and size in ConcurrentHashMap do not use locks. ConcurrentHashMap provides weak consistency, which means that there may be a delay in reading the written data in some cases.

ConcurrentHashMap vs. ConcurrentSkipListMap #

Let’s consider another example. The operating system at my previous company had a feature that reminded users when their mobile data was running low. The main flow was: the server synchronized real-time data usage with the mobile virtual network operator, the phone then triggered periodic queries, and if the remaining data was low, a system notification was shown. This feature has a large user base, high concurrency, and more write operations than queries. We therefore needed to design a cache that stores users and their remaining data usage as key-value pairs. If you were asked to implement such a simple cache, how would you design it?

You might consider ConcurrentHashMap, but as I mentioned in Lecture 07, when a bucket’s linked list grows long enough, it is converted into a red-black tree. Under concurrency, deleting from or inserting into a red-black tree triggers rebalancing, which involves a large number of nodes, so the cost of competing for the lock is relatively high.

Skip-list operations, on the other hand, are local and need to lock far fewer nodes, so they tend to perform better in concurrent scenarios. You might ask: why is there no skip-list-based SkipListMap among the non-thread-safe Map containers? Because in single-threaded scenarios, TreeMap, which is implemented with a red-black tree, performs about as well as a skip list.

Therefore, when thread safety is not required, we can use TreeMap to store and access large amounts of data, while in thread-safe scenarios we can use ConcurrentSkipListMap for the same purpose.
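Assuming the cache is keyed by a numeric user ID, a minimal sketch might look like this (DataUsageCache and its method names are illustrative):

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Write-heavy cache of user id -> remaining data; operations are lock-free and O(log n) on average.
class DataUsageCache {
    private final ConcurrentSkipListMap<Long, Long> remainingBytes = new ConcurrentSkipListMap<>();

    // Write path: called on every sync from the virtual network operator.
    void update(long userId, long bytesLeft) {
        remainingBytes.put(userId, bytesLeft);
    }

    // Read path: periodic query triggered by the device; returns null if the user is unknown.
    Long query(long userId) {
        return remainingBytes.get(userId);
    }
}
```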

So, how does ConcurrentSkipListMap use skip lists to improve the performance of storing and accessing large amounts of data in a container? Let’s first understand the implementation principle of skip lists.

What is a Skip List #

A skip list is a special kind of linked list that extends a plain linked list both horizontally and vertically, somewhat like a tree.

A skip list consists of several levels of linked lists, each of which is a sorted linked list acting as an index. Only the bottom level contains all of the data; each node in an upper level points down to the node with the same value in the level below. Going up, each level holds fewer elements, so the top level retains only a small portion of the data.

This structure trades space for time to improve query efficiency. A query always starts at the top level and narrows the search range by comparing element values. The following diagrams show how a skip list works in detail.

First, let’s start with an initialized skip list:

[Figure: an initialized skip list]

When querying a node with a key value of 9, the query path is as follows:

[Figure: query path for the node with key 9]
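To make the top-down search concrete, here is a minimal, illustrative sketch; it is not the JDK’s ConcurrentSkipListMap implementation, and the node layout is simplified to a right pointer and a down pointer.

```java
// Each index node has a "right" pointer on its own level and a "down" pointer to
// the node with the same key one level below; the head is a minus-infinity sentinel.
class SkipNode {
    final long key;
    SkipNode right; // next node on the same level
    SkipNode down;  // same key, one level below (null on the bottom level)
    SkipNode(long key) { this.key = key; }
}

class SkipListSearch {
    static boolean contains(SkipNode topHead, long target) {
        SkipNode node = topHead;                 // start at the top-left corner
        while (node != null) {
            while (node.right != null && node.right.key <= target) {
                node = node.right;               // move right while keys are still <= target
            }
            if (node.key == target) return true; // found on some level
            node = node.down;                    // otherwise drop one level and keep going
        }
        return false;
    }
}
```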

When adding a node with key 8, a node is first added to the bottom-level linked list; a level value is then calculated based on probability, index levels are created according to that value, and finally the new node is linked into those index levels. Both adding the node and linking the index levels are implemented with CAS (Compare-And-Swap) operations.

[Figure: inserting the node with key 8]
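The probabilistic level choice mentioned above can be sketched like this (SkipListLevels and randomLevel are illustrative names; the real implementation differs in detail):

```java
import java.util.concurrent.ThreadLocalRandom;

// Keep "flipping a coin": each higher index level ends up holding
// roughly half as many nodes as the level below it.
final class SkipListLevels {
    static int randomLevel(int maxLevel) {
        int level = 1;
        while (level < maxLevel && ThreadLocalRandom.current().nextBoolean()) {
            level++;
        }
        return level;
    }
}
```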

When deleting the node with key 7, the node is first located and its value set to null; a marker node is then appended right after it to reduce concurrency conflicts. Next, the predecessor of the node being deleted is pointed directly at its successor, so the deleted node is skipped and eventually reclaimed by the JVM’s garbage collector. Finally, if an index level is left with no other nodes after the deletion, that index level is removed as well.

[Figure: deleting the node with key 7]

Through these two cases, I believe you should have a clear understanding of the applicable scenarios of Hashtable, ConcurrentHashMap, and ConcurrentSkipListMap.

If strong consistency of data is required, Hashtable should be used. In most cases, where weak consistency is sufficient, ConcurrentHashMap can be used. If the data volume is in the millions and there are a large number of insert, delete, and update operations, ConcurrentSkipListMap can be considered.

List Container in Concurrent Scenarios #

Now let’s look at a real case from a production environment. Most Internet products maintain a blacklist. For example, in an e-commerce system, users who repeatedly join flash sales but then abandon payment may be added to a blacklist. Which container would you use in this situation?

First of all, the data volume of the blacklist is not very large, but it needs to be queried quickly when making a purchase to determine whether a user is on the blacklist. Secondly, the user ID is of type integer, so we can consider using an array to store the data. Would ArrayList be the first container that comes to your mind?

I mentioned earlier that ArrayList is not thread-safe and may cause thread-safety issues in concurrent scenarios. Instead, we can consider the thread-safe list containers that Java provides for concurrent programming: Vector and CopyOnWriteArrayList.

Vector is thread-safe and relies on synchronized: almost every method it exposes is marked synchronized. So in read-heavy scenarios, where reads far outnumber writes, Vector suffers heavy lock contention, which creates performance overhead for the system.

In contrast, CopyOnWriteArrayList is a class provided by the java.util.concurrent package. Its reads are lock-free, while writes operate on a fresh copy of the underlying array; in other words, it is a read-write separation strategy. The following diagram shows how CopyOnWriteArrayList works.

[Figure: read-write separation in CopyOnWriteArrayList]
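A greatly simplified sketch of the copy-on-write idea, not the actual java.util.concurrent source:

```java
import java.util.Arrays;

// Reads go straight to the current array with no lock; each write copies the
// array, applies the change to the copy, and then swaps the reference.
class TinyCopyOnWriteList<E> {
    private volatile Object[] array = new Object[0];
    private final Object writeLock = new Object();

    @SuppressWarnings("unchecked")
    public E get(int index) {
        return (E) array[index];          // lock-free read of the current snapshot
    }

    public int size() {
        return array.length;
    }

    public void add(E element) {
        synchronized (writeLock) {        // writers are serialized
            Object[] old = array;
            Object[] copy = Arrays.copyOf(old, old.length + 1);
            copy[old.length] = element;   // modify the new copy only
            array = copy;                 // publish it with a volatile write
        }
    }
}
```

Readers only ever see a fully built array published through a volatile write, so they never need a lock; the price is that every write copies the whole array, which is why this container suits read-heavy workloads.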

In the blacklist case, reads far outnumber writes, and we can update the list at times when the business is relatively idle.

This scenario also does not require newly written data to be visible in real time; we only need the user IDs to end up in the array eventually. That makes CopyOnWriteArrayList, as a concurrent list container, the most suitable choice for this kind of scenario.
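Under these assumptions, the blacklist can be sketched as follows (Blacklist, isBlocked, and block are hypothetical names):

```java
import java.util.concurrent.CopyOnWriteArrayList;

class Blacklist {
    private final CopyOnWriteArrayList<Long> blockedUserIds = new CopyOnWriteArrayList<>();

    // Hot path during a flash sale: lock-free read against the current array snapshot.
    boolean isBlocked(long userId) {
        return blockedUserIds.contains(userId);
    }

    // Cold path, run when traffic is low: each call copies the underlying array.
    void block(long userId) {
        blockedUserIds.addIfAbsent(userId);
    }
}
```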

Summary #

In concurrent programming, we often use containers to store data or objects. During the long development process of Java from JDK 1.1 to JDK 1.8, various containers of the same type have been implemented to adapt to different scenarios. I have summarized a table for you today, hoping it will be helpful to you. Feel free to leave a comment to add more information.

[Table: summary of the containers covered here and their applicable scenarios]

Thought Questions #

In flash-sale systems, we often use a queue to implement the waiting process for purchases. If you were asked to choose or design such a queue, what would you consider?