05 Array List or Linked List Using the Wrong One Could Mean Performance a Thousand Times Worse

05 ArrayList or LinkedList Using the wrong one could mean performance a thousand times worse #

Hello, I’m Liu Chao.

As a container for storing data, collections are one of the most frequently used object types in our daily development. The JDK provides developers with a series of collection types, which are implemented using different data structures. Therefore, different collection types have different use cases.

Many students are often asked collection-related questions in interviews, with the most common being the difference between ArrayList and LinkedList.

I believe that most students can answer: “ArrayList is implemented based on an array, while LinkedList is implemented based on a linked list.”

When it comes to answering the use cases, I found that most students’ answers are: “LinkedList is more efficient than ArrayList when adding or deleting elements, while ArrayList is more efficient than LinkedList when traversing.” Is this answer accurate? In today’s lesson, I will take you through the verification process.

Introduction to the List Interface #

Before we start learning about the List collection class, let’s first take a look at the interface and class implementation relationship of the List collection class through this figure:

img

We can see that the ArrayList, Vector, and LinkedList collection classes inherit the AbstractList abstract class, while the AbstractList implements the List interface and also inherits the AbstractCollection abstract class. ArrayList, Vector, and LinkedList then implement their respective functionalities based on their own positioning.

ArrayList and Vector are implemented using arrays, and the implementation principles of these two are similar, while LinkedList is implemented using doubly linked lists. That’s the basic foundation, next, we will analyze in detail the source code implementation of ArrayList and LinkedList.

How is ArrayList implemented? #

ArrayList is commonly used. Let’s first have a few test questions to check your understanding of ArrayList.

Question 1: When we view the source code of the ArrayList implementation class, we notice that the object array elementData is marked with the transient modifier. We know that the transient keyword means that the property will not be serialized. However, we don’t see any documentation stating that ArrayList cannot be serialized. Why is that?

Question 2: When we use ArrayList for adding and deleting elements, we are often reminded that “using ArrayList for add and remove operations will affect efficiency”. Does that mean that ArrayList is always slower in scenarios where a large number of elements are added?

Question 3: If you were to use a for loop or an enhanced for loop to iterate over an ArrayList, which method would you choose? Why?

If you don’t have a comprehensive understanding of these test questions, let’s reacquaint ourselves with ArrayList from the perspective of data structure, implementation principles, and source code.

1. ArrayList Implementation Class #

ArrayList implements the List interface and extends the AbstractList abstract class. It is implemented using an array as the underlying data structure and supports automatic resizing of the array.

ArrayList also implements the Cloneable interface and the Serializable interface, which allows it to be cloned and serialized.

ArrayList also implements the RandomAccess interface. You may be unfamiliar with this interface and not know its specific use. By looking at the code, we can see that this interface is actually an empty interface with no implementation. Why does ArrayList need to implement it?

In fact, the RandomAccess interface is a marker interface that signifies “any List class that implements this interface can achieve fast random access”.

public class ArrayList<E> extends AbstractList<E>
            implements List<E>, RandomAccess, Cloneable, java.io.Serializable

2. ArrayList Properties #

The main properties of ArrayList are the array length size, the object array elementData, and the default capacity default_capacity, which is initialized to 10.

// Default initial capacity
private static final int DEFAULT_CAPACITY = 10;
// Object array
transient Object[] elementData;
// Array length
private int size;

Looking at the properties of ArrayList, we can see that it is not decorated with any multi-threading keywords, but elementData is decorated with the transient keyword. This brings us back to the first test question: when a field is decorated with the transient keyword, it means that the property will not be serialized. However, ArrayList does implement the Serializable interface, so what’s going on?

This all goes back to the fact that “ArrayList is based on an array implementation.” Since the ArrayList’s array is dynamically expanded, not all allocated memory spaces store data.

If the external serialization method is used to serialize the array, the entire array will be serialized. To avoid serializing the memory spaces that do not store data, ArrayList provides two private methods, writeObject and readObject, to perform serialization and deserialization internally, thus saving space and time when serializing and deserializing the array.

Therefore, using the transient modifier to decorate the array prevents it from being serialized by other external methods.

3. ArrayList Constructors #

The ArrayList class implements three constructors. The first constructor creates an ArrayList object with an initial value. The second constructor creates an empty ArrayList object by default. The third constructor initializes the ArrayList with a collection.

When adding elements to an ArrayList, if the size of the stored elements exceeds the current capacity, the ArrayList will calculate the size of the elements and then dynamically increase the capacity. Increasing the capacity of the array will cause the entire array to be copied to a new memory location. Therefore, when initializing an ArrayList, it is helpful to specify the initial size of the array through the first constructor. This can reduce the number of times the array needs to be resized, thus improving system performance.

public ArrayList(int initialCapacity) {
    // If the initial capacity is greater than zero, create an array of the specified size
    if (initialCapacity > 0) {
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {// If the initial capacity is zero, use the default empty array
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
    }
}

public ArrayList() {
    // Initialize with the default empty array
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

4. Adding Elements to ArrayList #

ArrayList has two methods for adding elements: one adds the element to the end of the array, and the other adds the element at any position.

public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    // Add the element to the end of the array
    // ...
}
elementData[size++] = e;
return true;
public void add(int index, E element) {
    rangeCheckForAdd(index);

    ensureCapacityInternal(size + 1);  // Increments modCount!!
    System.arraycopy(elementData, index, elementData, index + 1,
                     size - index);
    elementData[index] = element;
    size++;
}

The two methods have a similar process before adding an element. They both check the capacity before adding. If the capacity is sufficient, no resizing is required. If the capacity is not sufficient, the array is resized to 1.5 times the original size, and the elements are copied to a new memory address after resizing.

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;

    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

Of course, there are some differences between the two methods. Adding an element to any position requires rearranging all elements that come after the insertion point. However, if an element is added to the end of the array without resizing, there is no need to copy and rearrange elements.

Now you can find the answer to the second test question. If we know the size of the data to be stored at initialization, we can specify the capacity of the ArrayList at initialization. When adding elements, we can add them only to the end of the array, without rearranging any existing elements. In scenarios with a large number of new elements, the performance of ArrayList will not be worse and may even be better than other List collections.

5. Removing Elements from ArrayList #

The removal method of ArrayList is somewhat similar to the method of adding elements at any position. After each valid removal operation of ArrayList, the array needs to be restructured. The closer to the beginning the element is removed, the greater the cost of array restructuring.

public E remove(int index) {
    rangeCheck(index);

    modCount++;
    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work

    return oldValue;
}

6. Traverse Elements in ArrayList #

Since ArrayList is based on an array, getting elements is very efficient.

public E get(int index) {
    rangeCheck(index);

    return elementData(index);
}

E elementData(int index) {
    return (E) elementData[index];
}

How is LinkedList implemented? #

Although LinkedList and ArrayList are both collections of type List, the implementation of LinkedList is quite different from ArrayList, and they have different use cases.

LinkedList is implemented based on the data structure of a doubly-linked list. LinkedList defines a Node structure, which contains 3 components: the element content item, the previous pointer prev, and the next pointer next. The code is as follows.

private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;
    }
}

To summarize, LinkedList is a doubly-linked list formed by connecting Node structure objects. Before JDK 1.7, LinkedList only contained a header attribute of type Entry, and when initialized, it created a default empty Entry to serve as the header, with the previous and next pointers pointing to itself, forming a circular doubly-linked list.

Since JDK 1.7, there have been significant changes in LinkedList, with optimizations made to the linked list. The Entry structure of the linked list was replaced with Node, and the internal composition remained mostly unchanged. However, the header attribute in LinkedList was removed and replaced with a first attribute of type Node and a last attribute of type Node. This has several advantages:

  • The first/last attribute can more clearly express the concepts of the head and tail of the linked list.
  • The first/last approach can save the need for creating a new Entry when initializing LinkedList.
  • The most important performance optimization of the first/last approach is that inserting and deleting operations at the head and tail of the linked list become faster.

Just like in the explanation of ArrayList, I will take you to understand LinkedList in depth from the perspectives of data structure, implementation principles, and source code analysis.

1. LinkedList Implementation Class #

The LinkedList class implements the List and Deque interfaces, and also inherits the AbstractSequentialList abstract class. LinkedList combines the characteristics of both List and Queue. LinkedList also implements the Cloneable and Serializable interfaces, enabling cloning and serialization, just like ArrayList.

Since the memory addresses where LinkedList stores data are not contiguous but located using pointers, LinkedList does not support random access, and therefore cannot implement the RandomAccess interface.

public class LinkedList<E>
    extends AbstractSequentialList<E>
    implements List<E>, Deque<E>, Cloneable, java.io.Serializable

2. LinkedList Attributes #

We mentioned earlier that LinkedList has two important attributes, first/last, and there is also a size attribute. These three attributes are all marked as transient. The reason is simple: when serializing, we don’t only serialize the head and tail, so LinkedList implements its own readObject and writeObject methods for serialization and deserialization.

transient int size = 0;
transient Node<E> first;
transient Node<E> last;

3. Adding Elements to LinkedList #

The implementation of adding elements to LinkedList is concise, but there are many ways to add elements. The default add(E e) method adds the element to the end of the queue. First, the last element is stored in a temporary variable, then a new Node object is created, with the last reference pointing to the new node object, and the previous pointer of the previous last object pointing to the new node object.

public boolean add(E e) {
    linkLast(e);
    return true;
}

void linkLast(E e) {
    final Node<E> l = last;
    final Node<E> newNode = new Node<>(l, e, null);
    last = newNode;
    if (l == null)
        first = newNode;
    else
        l.next = newNode;
    size++;
    modCount++;
}

LinkedList also has methods to add elements at any position. If we are adding an element between any two elements, the operation of adding elements only changes the previous and next pointers of the previous and next elements, which will point to the newly added element. Compared to the adding operation of ArrayList, LinkedList has a clear advantage in performance.

public void add(int index, E element) {
    checkPositionIndex(index);

    if (index == size)
        linkLast(element);
    else
        linkBefore(element, node(index));
}

void linkBefore(E e, Node<E> succ) {
    final Node<E> pred = succ.prev;
    final Node<E> newNode = new Node<>(pred, e, succ);
    succ.prev = newNode;
    if (pred == null)
        first = newNode;
    else
        pred.next = newNode;
    size++;
    modCount++;
}

4. Deleting Elements from LinkedList #

In the operation of deleting elements from LinkedList, we first need to find the element through a loop. If the position of the element to be deleted is in the first half of the list, we search from the beginning to the end; if the position is in the second half, we search from the end to the beginning.

Doing this ensures that removing elements near the beginning or end is very efficient. However, if the list has a large number of elements and the element to be removed is in the middle, the efficiency will be relatively low.

5. Traversing Elements in LinkedList #

The implementation of getting elements in LinkedList is similar to the deletion operation. By searching the first and second halves in a loop, we can find the corresponding element. However, this way of searching for elements is very inefficient, especially in the case of for loop traversal, where we have to traverse half of the list for each iteration.

Therefore, when traversing LinkedList, we can use the iterator to directly obtain the elements without searching through the list.

Summary #

Previously, we have delved into the implementation principles and respective characteristics of ArrayList and LinkedList from the perspective of source code. If you can fully understand these contents, many performance-related issues in practical applications will be easily solved.

Just like if someone says to you now, “When adding or deleting elements, LinkedList is more efficient than ArrayList, while when traversing, ArrayList is more efficient than LinkedList,” would you still agree?

Now, let us verify this through several sets of tests. Due to space constraints, I will directly provide the test results here. You can visit Github to view and download the corresponding test code.

1. Adding Elements in ArrayList and LinkedList

  • Adding elements at the head of the collection
  • Adding elements at the middle of the collection
  • Adding elements at the tail of the collection

Test Results (Time taken):

  • ArrayList > LinkedList
  • ArrayList
  • ArrayList

Through this set of tests, we can see that the efficiency of adding elements in LinkedList does not necessarily surpass ArrayList.

Since ArrayList is implemented using an array, and an array is a continuous block of memory, when adding an element to the head of the array, the data after the head needs to be copied and rearranged, so the efficiency is low. On the other hand, LinkedList is implemented based on a linked list. When adding an element, it first searches for the position to add by looping. If the position is in the front half of the list, it searches from the beginning, while if it is in the back half, it searches from the end. Therefore, adding an element to the head of LinkedList is very efficient.

Similarly, we can deduce that when adding elements to the middle of the array, a portion of the data in ArrayList needs to be copied and rearranged, so the efficiency is not very high. Adding elements to the middle position in LinkedList is the least efficient because it requires the most element traversals before adding the element.

In adding elements at the tail, we found that without resizing, ArrayList is more efficient than LinkedList. This is because when adding an element to the tail of ArrayList, there is no need to copy or rearrange the data, so the efficiency is very high. Although LinkedList does not require element traversal, it involves the creation of new objects and the process of changing pointers to objects, so its efficiency is lower than that of ArrayList.

I want to clarify that here I conducted the tests based on ArrayList with sufficient initial capacity, excluding the case of dynamically resizing the array. If dynamic resizing is involved, the efficiency of ArrayList will also be reduced.

2. Deleting Elements in ArrayList and LinkedList

  • Deleting elements from the head of the collection
  • Deleting elements from the middle of the collection
  • Deleting elements from the tail of the collection

Test Results (Time taken):

  • ArrayList > LinkedList
  • ArrayList
  • ArrayList

The results of the tests on deleting elements in ArrayList and LinkedList are similar to those for adding elements. The principle is the same, so I will not repeat the explanation here.

3. Traversing Elements in ArrayList and LinkedList

  • Using a for(;;) loop
  • Using an iterator for looping

Test Results (Time taken):

  • ArrayList
  • ArrayList ≈ LinkedList

We can see that the performance of the for loop in LinkedList is the worst, while the for loop in ArrayList performs the best.

This is because LinkedList is implemented based on a linked list. When using a for loop, each iteration will traverse half of the list, which severely affects the traversal efficiency. On the other hand, ArrayList is implemented based on an array and implements the RandomAccess interface, which means that ArrayList can achieve fast random access, so the efficiency of the for loop is very high.

The performance of iterating and looping in LinkedList is comparable to that of ArrayList, so when traversing LinkedList, we should avoid using a for loop.

Thought Question #

We will think through some issues to consider when performing delete operations on an ArrayList array using a for loop.

public static void main(String[] args)
{
    ArrayList<String> list = new ArrayList<String>();
    list.add("a");
    list.add("a");
    list.add("b");
    list.add("b");
    list.add("c");
    list.add("c");
    remove(list);// Remove the specified element "b"

    for(int i=0; i<list.size(); i++) 
    {
        String s = list.get(i);
        System.out.println("element : " + s);
    }
}

In the above code, I have defined an ArrayList array and added some elements to it. Then, I removed the specified element using the remove method. Which of the following approaches is correct?

Approach 1:

public static void remove(ArrayList<String> list) 
{
    Iterator<String> it = list.iterator();
    
    while (it.hasNext()) {
        String str = it.next();
        
        if (str.equals("b")) {
            it.remove();
        }
    }
}

Approach 2:

public static void remove(ArrayList<String> list) 
{
    for (String s : list)
    {
        if (s.equals("b")) 
        {
            list.remove(s);
        }
    }
}

Note: The translation above maintains the formatting as it was in the original Chinese content.