13 How Js Engine Implements Stable Sorting of Arrays

13 How JS Engine Implements Stable Sorting of Arrays #

Hello, I am Ishikawa.

We often talk about data structures and algorithms together, but have you ever thought about the relationship between the two?

It can be said that data structures serve algorithms. This statement is quite general. Next, let’s find a starting point to explain the relationship between the two. Let’s start with sorting algorithms. When it comes to sorting, we have to mention the commonly used data structure - arrays. For example, in JavaScript, we usually use its sorting method. But do you know which algorithm is used behind the sorting method? In this lecture, let’s understand sorting algorithms through arrays and then see how the JS engine implements sorting of arrays.

In data structures, we can generally divide them into two categories. The first category is linear list, and the second category is non-linear list. The “linear” here is not fundamentally different from our understanding of linear in everyday life, which means “in order”. Arrays can be said to be a kind of linear list with contiguous storage, and they can be considered as a basic linear data structure. Through arrays, we can implement linear structures such as stacks and queues mentioned before. You can also use it to create very classic non-linear structures, such as heaps and graphs.

In addition to arrays, another basic and classic data structure is objects. One characteristic of objects is that they contain properties, which can often be used as nodes. By connecting nodes, we can create linked structures, such as linked lists. Another characteristic of objects is that they support key-value pairs, which can be used for implementing hash tables and dictionaries.

Returning to today’s topic, let’s focus on arrays.

What are the advantages and disadvantages of arrays #

The biggest advantage of arrays as a linear data structure is their contiguous nature, which provides the benefit of random access. However, their disadvantage lies in the efficiency of insertion and deletion. Almost all languages provide arrays as a built-in data type, and JavaScript is no exception. We have already discussed arrays as one of the data types in JavaScript. But as a data structure, we can also understand it from another perspective.

As mentioned earlier, the disadvantage of arrays is insertion and deletion. What does this mean? Let’s assume that someone cuts in line while waiting in a queue, causing everyone in the group to move back. In this scenario, the time complexity is \(O(n)\).

So how can we overcome this disadvantage when using arrays? Similar to a queue, the solution is to follow the rules and allow the person to join the line at the end. In an array, if we add a data element to the end of the queue, the overall time complexity is only \(O(1)\). Therefore, it is usually best to add new data to the end if the order is not important. In JavaScript, how can we add a piece of information to the end of an array? We know that arrays are also objects and come with their own properties and methods for manipulating arrays. In JavaScript, the simplest way to insert is to calculate the next empty space based on the length of the array and assign the value to the desired data element.

var week = [1,2,3,4,5,6];
week[week.length] = 7;
console.log(week) // returns [1,2,3,4,5,6,7]

Here, you can see both similarities and differences compared to C or Java. Similar to C or Java, our intuition tells us that the array length is 6 initially, so why assign a value to the 6th position, which appears to be an empty space? Won’t it overwrite the 6th number? No, it won’t. This is because arrays are typically indexed from 0 instead of 1. The difference from C or Java is that in JavaScript, the length of an array is dynamically expandable, or in functional programming terms, the content is mutable. This can be seen from the example above. In C and Java, on the other hand, the length of an array is determined at the beginning, so if a new data element is added at this point, a new array needs to be created.

In addition to the method mentioned above, there is another way to quickly add a value to the end of an array, and that is by using the built-in push method in JavaScript. From this, we can also see the shadows of queues and stacks. When we use the queue example mentioned earlier, it actually has the concept of a queue. The push method below in English actually means “pushing onto the stack”.

var week = [1,2,3,4,5,6];
week.push(7);
console.log(week) // returns [1,2,3,4,5,6,7]

How to Sort Arrays in JavaScript #

Now that we’ve discussed the advantages and disadvantages of arrays, let’s take a look at how to sort them. How exactly are arrays sorted? In the following examples, we’ll explore several sorting methods. The first two methods, bubble sort and selection sort, are more theoretical approaches, while the next three methods, insertion sort, quicksort, and mergesort, are more practical algorithms. The V8 engine used in Chrome combines quicksort and insertion sort for sorting, while SpiderMonkey used in Firefox implements sorting using mergesort. Why do browsers use these sorting methods? We can gain an understanding by examining their principles and characteristics.

Image

Theoretical Sorting Methods #

First, let’s take a look at bubble sort and selection sort, which are two theoretical sorting algorithms. Both of these methods employ a technique called compare and swap. As we mentioned earlier, inserting an element in a non-tail position of an array can cause the subsequent elements to shift. However, by using the swap technique, we cleverly avoid this problem.

The core idea behind bubble sort is to compare two adjacent data elements. If the element in the front is greater than the element in the back, we swap their positions.

Image

Selection sort uses an in-place comparison algorithm. Its core idea is to find the smallest data element and place it in the first position, then find the smallest data element in the remaining array and place it in the second position, and so on.

Image

The reason why these two sorting algorithms are considered more theoretical is because their time complexity is \(O(n^{2})\). Now let’s see if there are any other sorting methods with lower time complexity for sorting arrays.

Commonly Used Sorting Methods #

Next, let’s take a look at some commonly used sorting algorithms that are more practical and frequently used in real-world scenarios.

Unlike bubble sort and selection sort, insertion sort does not use the compare and swap method. Instead, it uses a technique called shifting for insertion. It assumes that the first element of the array is already sorted. We compare the second element with the first element, and if it’s smaller, we insert it before the first element. If it’s greater, we leave it in place. Then we compare the third element with the previous two elements and shift it if necessary. We continue this process for the remaining elements. Although insertion sort has a similar time complexity as bubble sort and selection sort, which is \(O(n^{2})\), in practice, it performs better for shorter arrays.

Image

Mergesort is one of the more commonly used practical sorting algorithms, with a time complexity of \(O(n \log n)\). Now let’s take a look at its implementation. Mergesort utilizes the recursion and divide-and-conquer algorithms we discussed earlier. The core of this method is to split a large array into smaller, unsortable sub-arrays consisting of only one element each. Then we compare and merge these sub-arrays until we have a single sorted array.

Image

Next, let’s take a look at quicksort, which, like mergesort, utilizes recursion and divide-and-conquer. However, unlike mergesort, quicksort does not actually split and recombine the elements. So, how does it implement the sorting?

In quicksort, we use a pivot point and two pointers in the array. We add a pointer at the beginning and end of the array, and a pivot point in the middle. We continuously move the left pointer until we find an element larger than the pivot. Similarly, we continuously move the right pointer until we find an element smaller than the pivot, and then we swap it with the element pointed to by the left pointer. This process can ensure that all elements to the left of the pivot are smaller than those to the right. This process is called partitioning. This process is recursively repeated until the entire array is sorted.

Image

When it comes to quicksort, there is one thing worth noting regarding partitioning. For the pivot in the partitioning process, one option is to choose the average value between the two pointers to avoid having a large difference in the number of elements between the two partitions. However, a better approach is to choose a random value to serve as the pivot, as this cannot guarantee the best results but reduces the probability of poor performance.

The Stability of Sorting Algorithms #

After discussing multiple topics, let’s return to our previous question: why does V8 use the insertion-accelerated sorting method while SpiderMonkey uses the merge method? This brings us to the issue of sorting stability. Although the official ECMAScript documentation does not specify the specific implementation of the sort function in JavaScript engines, there is one guideline: ensuring that array sorting must be stable.

In theory, although the time complexity of merge sort and quicksort is the same, we have neglected two important factors. The first is the average and worst-case time complexity, and the second is the issue of space complexity. Quicksort has an average space complexity of \(O(n log n)\), but in worst-case scenarios, it becomes \(O(n^2)\).

You may wonder why not just use merge sort, as it is stable. However, merge sort has its own issue. Although it is stable, it is not an in-place sorting algorithm, resulting in a space complexity of \(O(n)\), which is higher than other types of algorithms. This is why V8 and SpiderMonkey chose different algorithms. For V8, in order to address both stability and complexity concerns, it uses a combination of quicksort and insertion sort. Insertion sort is used when the data size is relatively small, and quicksort is switched to for larger data sizes. On the other hand, SpiderMonkey satisfies the stability requirement by using merge sort.

Extension: Linear Sorting Algorithms #

There is another type of sorting algorithm called linear sorting algorithms. They are named so because they use non-comparative methods for sorting. This category includes counting, bucket, and radix sorting. Since these sorting algorithms are used for specific scenarios, we will not discuss their specific implementations here. However, I will provide the source code on GitHub. If you want to explore further, you can refer to the code comments for understanding.

Summary #

Today, through the use of arrays, the simplest data structure, we have gained an understanding of sorting algorithms, which are not as simple as they seem. You should have learned about the advantages and disadvantages of arrays as a data structure, as well as the sorting methods used in V8 and SpiderMonkey. The most important aspect of this lesson is not just the logic and application that we have learned, but rather the mindset of comprehensive analysis. For example, the time complexity we have seen is only one aspect and cannot directly represent performance. Only when we consider it together with stability and space complexity can we make a more complete and objective judgment about the solution. At the same time, this also tells us that the programs we write may produce different results based on different engines and data sizes.

Thought question #

The thought question I have for you today is, based on the knowledge you’ve learned today, can you talk about what algorithms are used by other JS engines?

I look forward to seeing your sharing in the comments section, let’s exchange and discuss together. Also, feel free to share today’s content with more friends. See you in the next issue!