08 Comparing the Differences Between Vector, Array List, and Linked List
  • Vector:Vector 是 Java 中最早的一种动态数组实现,它是线程安全的,支持自动扩容。它的底层实现是使用数组,适合随机访问和遍历操作。然而,由于其线程安全性以及锁的使用,Vector 的性能相对较低。

  • ArrayList:ArrayList 是 Vector 的非线程安全版本,也是最常用的动态数组实现。和 Vector 一样,它也是基于数组的,支持自动扩容。ArrayList 的访问和读取性能强于 Vector,因为它没有线程安全的开销。然而,当需要频繁地插入或删除元素时,由于需要进行数组的复制和移动,性能较差。

  • LinkedList:LinkedList 是一种双向链表的实现方式。它的插入和删除操作非常高效,因为只需要修改相邻节点之间的引用指向即可。然而,由于需要维护节点之间的引用关系,LinkedList 的访问和读取性能较差。此外,LinkedList 不支持随机访问,需要从头节点开始遍历查找元素。

综上所述,Vector 适用于需要线程安全的场景,ArrayList 适用于单线程环境下对数组的频繁读取和遍历操作,LinkedList 适用于频繁的插入和删除操作。在实际应用中,我们需要根据具体场景选择最合适的集合类。

08 Comparing the differences between Vector, ArrayList, and LinkedList #

All three of them are implementations of the List interface in the collection framework, which is a type of ordered collection. Therefore, they have similar functionalities, such as providing operations for locating, adding, and removing elements by position, as well as providing iterators to traverse their contents. However, due to design differences, they have significant differences in behavior, performance, and thread safety.

Vector is an early thread-safe dynamic array provided by Java. If thread safety is not required, it is not recommended to use it, as synchronization has additional overhead. Vector internally uses an object array to store data, and it can automatically increase its capacity as needed. When the array is full, a new array is created and the original array data is copied.

ArrayList is a more widely used implementation of a dynamic array. It is not thread-safe by itself, so it has better performance. Similar to Vector, ArrayList can also adjust its capacity as needed, but there are differences in the logic of capacity adjustment. Vector increases its capacity by doubling, while ArrayList increases by 50%.

As the name suggests, LinkedList is a doubly-linked list provided by Java, so it does not need to adjust its capacity like the previous two. It is also not thread-safe.

Examination Point Analysis #

It seems that since I started working with Java, this question has always been a classic interview question. In my previous answers, I covered some basic designs and implementations of the three.

In general, we can also supplement with the suitable scenarios for different container types:

  • Vector and ArrayList are used as dynamic arrays. Their internal elements are stored in array form in sequential order, so they are very suitable for scenarios with random access. However, their performance is relatively poor other than inserting and deleting elements at the end. For example, if we insert an element at a middle position, we need to move all subsequent elements.
  • On the other hand, LinkedList is much more efficient for node insertion and deletion, but its performance for random access is slower than that of dynamic arrays.

Therefore, in application development, if we can estimate in advance whether the application operations are more inclined towards insertion, deletion, or random access, we can make choices accordingly. This is also the most common aspect of assessment in interviews: given a scenario, choose the appropriate data structure. Therefore, it is important to have a clear understanding of these typical choices.

When it comes to examining the Java collection framework, I think there are many aspects that need to be mastered:

  • The design structure of the Java collection framework, at least a general impression.
  • The main types of containers (collections and maps) provided by Java, understanding or mastering the corresponding data structures and algorithms, and considering specific technical choices.
  • Expanding the scope to include performance, concurrency, and other areas.
  • The evolution and development of the collection framework.

As a Java column, I will try to expand on Java-related topics as much as possible. Otherwise, just listing the data structures involved in the collection will take up a lot of space. This does not mean that these topics are not important. Data structures and algorithms are fundamental skills and are often unavoidable points of assessment. Some companies are even well-known (or rather infamous) for focusing on these aspects in their interviews. Taking the typical sorting algorithms that you need to master as an example, you should be familiar with:

  • Internal sorting: at least the basic algorithms like merge sort, exchange sort (bubble sort, quicksort), selection sort, and insertion sort.
  • External sorting: understanding the process and approach to processing very large datasets using memory and external storage.

When assessing algorithms, it is not just about how to simply implement them. Interviewers often go deeper and ask questions like which sorts are unstable (quicksort, heapsort), or what stability implies. They may also ask about the best or worst cases for various sorting algorithms on different datasets, or how to further optimize from a certain perspective (e.g. if the business scenario requires minimal auxiliary space, heapsort outperforms mergesort in that aspect). From a basic understanding to further thinking, interviewers usually observe the interviewee’s approach to problem-solving and communication.

The above is just an example from one aspect. I recommend studying relevant books, such as “Introduction to Algorithms” and “Programming Pearls,” or relevant tutorials like this one. For specific fields, such as recommendation systems, it is advisable to consult domain experts. From the perspective of interviews alone, many people recommend using algorithm websites like LeetCode to help with review and interview preparation, but to be honest, I have not personally used these algorithm questions. This is a matter of personal preference. When recruiting, I prefer to assess what the interviewee is best at, to avoid hiring someone who is only good at interviewing.

Knowledge Expansion #

Let’s start by understanding the overall design of the collection framework. To give you an intuitive impression, I’ve drawn a brief class diagram. Please note that I haven’t included the thread-safe containers under java.util.concurrent to avoid confusion. I also haven’t listed the Map container, although conceptually it is also considered part of the collection framework, it is not a true Collection itself.

So, today I will mainly focus on the narrow sense of the collection framework. The others will be covered in future articles.

We can see that Java’s collection framework has the Collection interface as the root of all collections. It then extends to provide three major types of collections:

  • List, which is the most commonly discussed ordered collection, providing convenient operations for accessing, inserting, deleting, etc.
  • Set, a collection that does not allow duplicate elements. This is the most obvious difference compared to List, meaning there are no two objects for which the equals method returns true. In everyday development, we often need to ensure the uniqueness of elements.
  • Queue/Deque, which are standard queue structures provided by Java. In addition to the basic functionalities of collections, they also support specific behaviors such as First-In-First-Out (FIFO) or Last-In-First-Out (LIFO). This does not include BlockingQueue, as it is usually used in concurrent programming and therefore is placed in the concurrent package.

The common logic of each collection is abstracted into the corresponding abstract classes. For example, AbstractList centralizes the common parts of various List operations. These collections are not completely isolated. For example, LinkedList itself is both a List and a Deque.

If you have read more source code, you will find that actually, the TreeSet code is implemented using TreeMap by default. The Java library creates a Dummy object “PRESENT” as the value, and all inserted elements are actually put into the TreeMap as keys. Similarly, HashSet is also implemented based on HashMap. So, it turns out that they are just disguises of Map class!

As mentioned earlier, we need to be familiar with the basic characteristics and typical use cases of various specific collection implementations. Let’s take a few implementations of Set as examples:

  • TreeSet supports natural-order access, but the operations like adding, deleting, and containing elements are relatively inefficient (log(n) time complexity).
  • HashSet uses hashing algorithm. In ideal scenarios, if the hashing is normal, it can provide constant-time operations like adding, deleting, and containing elements. However, it does not guarantee order.
  • LinkedHashSet internally constructs a doubly linked list that maintains the insertion order. Therefore, it provides the ability to traverse in insertion order. At the same time, it also guarantees constant-time operations like adding, deleting, and containing elements. The performance of these operations is slightly lower than that of HashSet due to the cost of maintaining the linked list.
  • When traversing elements, the performance of HashSet is influenced by its capacity. Therefore, when initializing, unless necessary, it is not recommended to set the capacity of the underlying HashMap too large. As for LinkedHashSet, due to the convenience provided by its internal linked list, the traversal performance only depends on the number of elements.

The collection classes I introduced today are not thread-safe. I will cover the thread-safe containers in java.util.concurrent in future articles. However, this doesn’t mean that these collections cannot be used in concurrent programming scenarios. The Collections utility class provides a series of synchronized methods, such as:

static <T> List<T> synchronizedList(List<T> list)

We can utilize similar methods to implement basic thread-safe collections:

List list = Collections.synchronizedList(new ArrayList());

Its implementation is basically to add basic synchronization support to each basic method, such as get, set, add, etc., through “synchronized”. It is very simple and crude, but also very practical. Note that the thread-safe collections created by these methods comply with the fail-fast behavior during iteration. When unexpected concurrent modifications occur, a ConcurrentModificationException is thrown as soon as possible to avoid unpredictable behavior.

Another frequently asked question is to understand the default sorting algorithms provided by Java, such as the specific sorting method and the design ideas.

This question itself is a bit tricky because it needs to differentiate between Arrays.sort() and Collections.sort() (which calls Arrays.sort() internally); what data type is used; how large is the dataset (for too small datasets, complex sorting is unnecessary and Java will directly perform binary insertion sort), etc.

  • For primitive data types, the currently used algorithm is called Dual-Pivot QuickSort, which is an improved version of QuickSort. The early version of this algorithm was relatively traditional QuickSort. You can read the source code here.
  • For object data types, TimSort is currently used, which is an optimized sorting algorithm that combines merge sort and binary insertion sort. TimSort is not exclusive to Java. In simple terms, its idea is to find the already sorted partitions (called “runs”) in the dataset and then merge these partitions to achieve the sorting purpose.

In addition, Java 8 introduced parallel sorting algorithms (using the parallelSort method) to fully utilize the computing power of modern multi-core processors. The underlying implementation is based on the fork-join framework (which will be discussed in more detail in later columns). When dealing with small datasets, the difference is not significant and may even perform slightly worse. However, when the dataset grows to tens of thousands or millions of items, the improvement is very significant, depending on the processor and system environment.

Sorting algorithms are still being continuously improved. Recently, the author of the Dual-Pivot QuickSort implementation has submitted a further improvement. After years of research, it is currently undergoing review and validation. According to the author’s performance comparison tests, compared to the merge sort-based implementation, the new improvement can increase the speed of sorting random data by 10% to 20%, and even have several times improvement on datasets with other characteristics. If you are interested, you can refer to the specific code and introduction here.

In Java 8, the Java platform introduced Lambda and Stream, and the corresponding Java collection framework has also been greatly enhanced to support methods that create corresponding streams or parallelStreams for collections. We can easily implement functional code.

When reading the Java source code, you will find that the design and implementation of these APIs are quite unique. They are not implemented in abstract classes, but in the form of default methods implemented in interfaces such as Collection! This is a new feature in the Java 8 language that allows interfaces to have default methods. Theoretically, most of the methods that were originally implemented in utility classes like Collections can be converted to the corresponding interfaces. Regarding this point, I will specifically discuss the evolution of the basic mechanisms of the Java language in the object-oriented theme.

In Java 9, the Java standard class library provides a series of static factory methods, such as List.of(), Set.of(), which greatly simplify the code for building small container instances. Based on industry practice, we have found that quite a number of collection instances have very limited capacity and will not be modified during their lifetimes. However, in the original Java class library, we may have to write code like this:

ArrayList<String> list = new ArrayList<>();
list.add("Hello");
list.add("World");

But with the new container static factory methods, just one line of code is enough and immutability is guaranteed:

List<String> simpleList = List.of("Hello","world");

Furthermore, the instances created by various “of” static factory methods apply some of our so-called best practices. For example, they are immutable, which meets our needs for thread safety. Also, because they do not need to consider capacity expansion, they are more compact in terms of space.

If we were to look at the source code of the “of” method, we would find something interesting: We know that Java already supports so-called variable arguments (varargs). However, the official class library still provides a series of methods with specific parameter lengths, which seems very inelegant. Why is that? This is actually for optimal performance. JVM incurs obvious additional overhead when dealing with variable arguments. If you need to implement performance-sensitive APIs, you can also refer to this.

Today, I started with Vector, ArrayList, and LinkedList, and gradually analyzed their design implementations, differences, and suitable application scenarios. I also made some simple summaries of the collection framework, from basic algorithms to API design and implementation improvements. I hope it will be helpful for your daily development and API design.

Practice Exercise #

Do you have a clear understanding of the topic we discussed today? I have a question for you to ponder. First, think about an application scenario, such as the need to implement a cloud computing task scheduling system to ensure that VIP customers’ tasks are processed with priority. What data structures or standard collection types can you use? Furthermore, what data structure is commonly used in similar scenarios?

Please write your thoughts on this question in the comments section. I will select thoughtful comments and give you a learning encouragement reward. Feel free to discuss it with me.

Are your friends also preparing for interviews? You can “involve friends” by sharing today’s question with them. Perhaps you can help them.