33 What Are the Features of Copy on Write Array List

33 What Are the Features of CopyOnWriteArrayList #

In this lesson, we will mainly discuss the characteristics of CopyOnWriteArrayList.

But before we talk about CopyOnWriteArrayList, let’s talk about the story before it was born. Actually, we already have ArrayList and LinkedList as the list implementations using arrays and linked lists, and we also have the thread-safe Vector and Collections.synchronizedList() that can be used. So let’s first take a look at the code for the size and get methods of the thread-safe Vector:

public synchronized int size() {

    return elementCount;

}

public synchronized E get(int index) {

    if (index >= elementCount)

        throw new ArrayIndexOutOfBoundsException(index);

    return elementData(index);

}

As you can see, Vector uses synchronized internally to ensure thread safety, and the locking granularity is relatively large, with locks at the method level. In high-concurrency situations, it is prone to contention and relatively low concurrency efficiency. In this regard, Vector is similar to Hashtable.

Moreover, the previous types of lists do not allow editing during iteration. If element additions or removals are performed during iteration, a ConcurrentModificationException will be thrown, which can be troublesome in many cases.

Therefore, starting from JDK 1.5, the Java concurrency package provides the CopyOnWriteArrayList as the primary concurrent List implementation using the CopyOnWrite mechanism. The CopyOnWrite concurrent collection also includes CopyOnWriteArraySet, which is implemented using CopyOnWriteArrayList. So today, we will use CopyOnWriteArrayList as a breakthrough to explore the characteristics of the CopyOnWrite container.

Applicable Scenarios #

  • Read operations need to be as fast as possible, while slow writes are acceptable

In many application scenarios, there may be far more read operations than write operations. For example, some system-level information is often loaded or modified very few times, but it is frequently accessed by all modules within the system. In such scenarios, we hope that read operations can be as fast as possible, while slow writes are acceptable.

  • More read operations and fewer write operations

The blacklist is the most typical scenario. Suppose we have a search website where users enter keywords in the search box to search for content, but some keywords are not allowed to be searched. These keywords that cannot be searched will be placed in a blacklist, which does not need to be updated in real-time and can be updated once a day. When a user searches, the current keyword will be checked against the blacklist. If it exists, a prompt will be displayed indicating that the keyword cannot be searched. This scenario with more read operations and fewer write operations is also suitable for using CopyOnWrite collections.

Read and Write Rules #

  • Read-write lock rules

The idea of a read-write lock is that reads can be shared, while all other operations are mutually exclusive (write-write, read-write, write-read), because read operations do not modify the existing data, so concurrent reads have no safety issues. On the other hand, write operations are dangerous, so when a write operation occurs, no read operation should be allowed to join, and a second write thread should also not be allowed to join.

  • Upgrade to the rules of read-write lock

The idea of CopyOnWriteArrayList goes even further than read-write locks. In order to maximize the performance of reads, reading in CopyOnWriteArrayList does not require any locks. Even more impressive, writing does not block reading operations, which means you can read while writing, and only synchronization is required between writes, meaning multiple writes are not allowed to occur at the same time, but reading is allowed to occur during writing. This greatly improves the performance of read operations.

Characteristics #

  • Meaning of CopyOnWrite

From the name of CopyOnWriteArrayList, we can see that it is an ArrayList that satisfies the CopyOnWrite property. The meaning of CopyOnWrite is that when the container needs to be modified, it does not directly modify the current container, but instead makes a copy of the current container, then modifies the new container, and after the modification is complete, redirects the reference of the original container to the new container. This completes the entire modification process.

The benefit of doing this is that CopyOnWriteArrayList utilizes the principle of “immutability”. Since a new copy is made every time the container is modified, the old container is actually immutable and thread-safe, so no further synchronization operations are required. We can read the CopyOnWrite container concurrently without locking, because the current container does not add any elements or make modifications.

All modification operations (add, set, etc.) in CopyOnWriteArrayList are implemented by creating a new copy of the underlying array, so CopyOnWrite containers also reflect the idea of read-write separation, with different containers for reading and writing.

  • Allow modification of the collection during iteration

We know that if we modify the content of an ArrayList during iteration, a ConcurrentModificationException will be thrown. Let’s analyze the reason why ArrayList throws this exception.

In the ArrayList source code, the ListItr’s next method has a checkForComodification method, which is as follows:

final void checkForComodification() {

    if (modCount != expectedModCount)

        throw new ConcurrentModificationException();

}

First, it checks whether modCount is equal to expectedModCount. modCount is a counter that keeps track of the number of modifications made to the collection. It increases every time we call methods like add, remove, or trimToSize. expectedModCount is a variable of the iterator that is initialized and recorded with the value of modCount when the iterator is created. If during the iteration, modCount and expectedModCount are not the same, it means that someone has modified the contents of the collection, and an exception is thrown.

Unlike ArrayList, the iterator of CopyOnWriteArrayList does not throw a ConcurrentModificationException if the array content is modified during iteration. This is because the iterator still uses the old array, but the iterated content may be outdated. The demo code below demonstrates this:

/**
 * Describes how CopyOnWriteArrayList allows modification of the collection during iteration
 */
public class CopyOnWriteArrayListDemo {

    public static void main(String[] args) {

        CopyOnWriteArrayList<Integer> list = new CopyOnWriteArrayList<>(new Integer[]{1, 2, 3});

        System.out.println(list); //[1, 2, 3]

        //Get iterator 1
        Iterator<Integer> itr1 = list.iterator();

        //Add one element and verify list is updated
        list.add(4);
        System.out.println(list); //[1, 2, 3, 4]

        //Get iterator 2
        Iterator<Integer> itr2 = list.iterator();

        System.out.println("====Verify Iterator 1 content====");
        itr1.forEachRemaining(System.out::println); //1,2,3

        System.out.println("====Verify Iterator 2 content====");
        itr2.forEachRemaining(System.out::println); //1,2,3,4

    }

}

This code first creates a CopyOnWriteArrayList with an initial value of [1, 2, 3]. The printed result at this point is obviously [1, 2, 3]. Then we create an iterator called itr1 and add a new element after creating it using the list.add() method. This adds the element 4 to the list, so the print result of the list is now [1, 2, 3, 4]. We then create another iterator called itr2 and print the content generated by iterating through both iterators. The output of this code is:

[1, 2, 3]
[1, 2, 3, 4]
====Verify Iterator 1 content====
1
2
3
====Verify Iterator 2 content====
1
2
3
4

As you can see, the content printed by these two iterators is different. The first iterator prints [1, 2, 3], while the second iterator prints [1, 2, 3, 4]. Although they are printed after the fourth element is added, their creation times are different. The first iterator is created when the list has only three elements, so it remains unaware of any modifications to the list that occur later.

This result illustrates that once the iterator of CopyOnWriteArrayList is created, any additions to the original CopyOnWriteArrayList object will not be reflected in the iterator, and no error will be thrown. This is a significant difference from ArrayList.

Disadvantages #

These disadvantages apply not only to CopyOnWriteArrayList, but also to other CopyOnWrite containers:

  • Memory Usage:

Because of the copy-on-write mechanism, writing operations in CopyOnWrite containers require simultaneous memory allocation for two objects, resulting in additional memory usage.

  • High cost of copying, especially when there are many or complex elements

The copying process not only consumes double memory, but also consumes CPU and other resources, which will reduce overall performance.

  • Data consistency issues

Since modifications to CopyOnWrite containers first modify the copy, these modifications are not immediately visible to other threads. The changes are only reflected after the modification is complete. If you want the written data to be visible to other threads immediately, CopyOnWrite containers are not suitable.

Source Code Analysis #

  • Data structure
/** Reentrant lock object */
final transient ReentrantLock lock = new ReentrantLock();

/** CopyOnWriteArrayList is backed by an array, modified with `volatile` to ensure array visibility */
private transient volatile Object[] array;

/**
* Get the array
*/
final Object[] getArray() {
    return array;
}

/**
* Set the array
*/
final void setArray(Object[] a) {
    array = a;
}

/**
* Initialize CopyOnWriteArrayList, which is equivalent to initializing the array
*/
public CopyOnWriteArrayList() {
    setArray(new Object[0]);
}

In this class, there is a ReentrantLock lock used to ensure thread-safe modification operations. The Object[] array named as array is modified with volatile to ensure array visibility. This array is used to store elements, and it can be seen from the getArray(), setArray(), and the constructor of CopyOnWriteArrayList that CopyOnWriteArrayList is implemented using an array, which is consistent with its name.

  • add method
public boolean add(E e) {
    // Lock
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        // Get the length and elements of the original array
        Object[] elements = getArray();
        int len = elements.length;
        // Create a new array by copying the original array
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        // Add the new element to the new array
        newElements[len] = e;
        // Replace the reference of volatile Object[] array with the new array
        setArray(newElements);
        return true;
    } finally {
        lock.unlock();
    }
}

The add method is used to add elements to CopyOnWriteArrayList, which is a modification operation. First, a lock is obtained using the lock method of ReentrantLock. After acquiring the lock, the length and elements of the original array are obtained using the getArray method. Then, a new array is created by copying the original array using the Arrays.copyOf method, which creates a new array with the same content as the original array. The new element is added to the new array. After completing the addition, the reference of the volatile Object[] array is replaced with the new array using the setArray operation. Finally, the lock is released in the finally block.

Summary: When adding elements, first lock, then copy a new array, perform the addition operation on the new array, and then replace the reference of array with the new array. Finally, unlock.

The above steps implement the CopyOnWrite idea: the write operation is performed on the copied container, and the list is not locked when reading data. It can be seen that if a new read thread enters during the process of copying the container, it will still read the old data because the object reference has not been changed at that time.

Next, let’s analyze the code of the read operation, which includes two overloaded methods of get and the getArray method. The code is as follows:

public E get(int index) {
    return get(getArray(), index);
}

final Object[] getArray() {
    return array;
}

private E get(Object[] a, int index) {
    return (E) a[index];
}

It can be seen that the get-related operations are not locked, ensuring fast reading operations.

  • Iterator COWIterator class

This iterator has two important properties: Object[] snapshot and int cursor. snapshot represents the snapshot of the array at the time the iterator is created, and cursor represents the iterator’s cursor. The constructor of the iterator is as follows:

private COWIterator(Object[] elements, int initialCursor) {
    cursor = initialCursor;
    snapshot = elements;
}

It can be seen that when the iterator is constructed, the elements at that time is assigned to snapshot. All subsequent operations of the iterator are based on the snapshot array. For example:

public E next() {
    if (!hasNext())
        throw new NoSuchElementException();
    return (E) snapshot[cursor++];
}

In the next method, it can be seen that the returned content is from the snapshot object. Therefore, even if the original array is modified later, the snapshot will neither be aware of nor affected by those changes. The iterator does not require locking for iteration operations and will not throw exceptions due to concurrent modifications. The results returned by the iterator are consistent with the contents at the time of iterator creation.

Above, we have introduced CopyOnWriteArrayList. We have described the characteristics of Vector and Collections.synchronizedList() before its birth, the applicable scenarios and read/write rules of CopyOnWriteArrayList, and the two characteristics of write-on-copy and allowing modification during iteration. We also mentioned three drawbacks: memory consumption, high cost of copying in cases of many or complex elements, and data consistency issues. Finally, we analyzed its important source code.