29 Performance Understanding Garbage Collection through Orinoco Jank Busters #

Hello, I’m Ishikawa.

In the previous two lessons, we learned about performance optimization in JavaScript from a multithreading perspective.

Today, let’s take a look at the garbage collection mechanism in JavaScript, which is related to memory management, as well as the performance optimization algorithms involved.

In fact, garbage collection in JavaScript is automatic, which means it is not something we handle manually during development. Still, understanding it helps you grasp the underlying logic of memory management. This matters especially when combined with what we learned in the previous two lessons: in front-end scenarios where a program uses WebGL for graphics together with Web Worker multithreading to handle a large amount of computation or rendering work, understanding the memory management mechanism becomes necessary. A reminder: this lesson involves quite a bit of theory and low-level knowledge, so make sure to keep up with the pace of the course.

Idle State and Generational Garbage Collection #

In the previous lecture, we discussed parallelism and concurrency, and noted that in front-end performance metrics we usually focus on smoothness and responsiveness. Ideally, a silky-smooth experience requires 60 fps, which means rendering each frame within 16.6 ms. In many cases the browser finishes rendering well within 16.6 ms, and when rendering completes early, the main thread is left idle. Chrome typically uses this idle time to perform garbage collection.
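Application code can borrow the same idea through the `requestIdleCallback` API, which hands a callback a deadline object for the current idle period. The sketch below is illustrative (the names `runDuringIdle` and `defaultScheduleIdle` are made up, and a timer-based fallback is assumed for environments without the API): it chips away at a queue of small tasks only while idle time remains, just as the engine fits GC work into frame gaps.

```javascript
// Cooperatively run small tasks during idle periods, yielding to rendering.
function runDuringIdle(tasks, scheduleIdle = defaultScheduleIdle) {
  function step(deadline) {
    // Do work only while the current idle period has time left.
    while (tasks.length > 0 && deadline.timeRemaining() > 0) {
      tasks.shift()(); // run one small unit of work
    }
    if (tasks.length > 0) scheduleIdle(step); // resume in the next idle period
  }
  scheduleIdle(step);
}

// Fallback for environments without requestIdleCallback (e.g. Node.js);
// the 5 ms budget is an arbitrary stand-in for a real idle deadline.
function defaultScheduleIdle(cb) {
  if (typeof requestIdleCallback === 'function') return requestIdleCallback(cb);
  return setTimeout(() => cb({ timeRemaining: () => 5, didTimeout: false }), 0);
}
```

Injecting a custom `scheduleIdle` also makes the pattern easy to test deterministically.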

In memory management, several concepts are necessary to understand. Garbage collection uses what is called a generational garbage collector, which divides the memory heap into regions holding objects of different ages: the young generation and the old generation.

This division is based on the well-known generational hypothesis in the garbage collection field. According to this hypothesis, the young generation contains younger data, most of which have relatively short lifetimes. The data that survives and resides in the old generation typically have longer lifetimes.

Therefore, V8 has two garbage collectors: a minor garbage collector (the Scavenger) for the young generation and a major garbage collector for the old generation. The minor garbage collector collects short-lived objects in the young generation and promotes objects that survive to the old generation. The young-generation space itself consists of two semi-spaces: an object area (from-space) and a free area (to-space).

You may wonder why the young generation needs to be divided into two areas. This makes the data easier to process. In the object area, data is marked as either live or garbage; garbage is discarded, while live data is copied into the free area and laid out contiguously. At that point, the free area becomes the object area and the object area becomes the free area. In other words, without allocating any new regions, the two areas can swap roles to carry out the marking and clearing work.
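A minimal sketch of this semi-space swap, assuming a toy heap of plain objects and a caller-supplied liveness test (`isLive` here stands in for the real marking phase):

```javascript
// Toy model of the young generation's two semi-spaces.
class YoungGeneration {
  constructor() {
    this.objectArea = []; // from-space: where new allocations land
    this.freeArea = [];   // to-space: empty between collections
  }
  allocate(obj) {
    this.objectArea.push(obj);
    return obj;
  }
  // One scavenge: copy survivors into the free area, leave the garbage
  // behind, then swap the roles of the two areas.
  scavenge(isLive) {
    for (const obj of this.objectArea) {
      if (isLive(obj)) this.freeArea.push(obj);
    }
    this.objectArea.length = 0; // discard everything not copied out
    // The free area becomes the new object area, and vice versa.
    [this.objectArea, this.freeArea] = [this.freeArea, this.objectArea];
  }
}
```

After a scavenge, the new object area holds only the compacted survivors, and the (now empty) other semi-space is ready for the next cycle.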

The major garbage collector performs incremental marking on objects in the old generation once their number grows past a certain limit. Typically, when a page is idle for a long period, a full cleanup is performed. Sweeping is done by dedicated sweeper threads, followed by a compaction pass to deal with fragmented memory. Thus, the overall flow of the major garbage collector is mark-sweep-compact.

In memory management, especially in garbage collection, the underlying logic is quite simple, and it can be summarized as three questions:

  • how to mark live objects;
  • how to collect and clear non-live objects;
  • how to compact fragmented memory after collection.

The major garbage collector is no exception.

First, let’s take a look at the marking process. Marking is the process of finding reachable objects. The garbage collector determines the “liveness” of objects based on reachability. This means that any object that can be accessed internally by the runtime must be retained, while any object that cannot be accessed is eligible for collection. Marking starts with a set of known object pointers such as global objects and currently active functions in the execution stack, known as the root set.

The GC marks the roots as live and recursively discovers more live objects based on pointers, marking them as reachable. Afterwards, all unmarked objects on the heap are considered unreachable and eligible for collection. From a data structure and algorithm perspective, marking can be seen as traversing a graph, where objects on the heap are graph nodes, and pointers from one object to another are graph edges. Given a node in the graph, we can access all outgoing edges of that node using the hidden class of the object.
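The marking traversal described above can be sketched as an ordinary graph search. In this toy model the heap is a plain map from object ids to the ids they reference; all names are illustrative:

```javascript
// Marking as graph traversal: objects are nodes, references are edges.
// `heap` maps object ids to arrays of referenced ids; `roots` is the root set.
function mark(heap, roots) {
  const live = new Set();
  const worklist = [...roots]; // start from the known root pointers
  while (worklist.length > 0) {
    const id = worklist.pop();
    if (live.has(id)) continue; // already marked; also handles cycles
    live.add(id);
    // Follow every outgoing edge to discover more reachable objects.
    for (const ref of heap[id] ?? []) worklist.push(ref);
  }
  return live; // anything not in this set is unreachable, i.e. garbage
}
```

Note that `c` referencing a live object does not keep `c` alive: reachability flows only outward from the roots.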

After marking, the sweeping process occurs where the GC identifies the contiguous gaps left by unreachable objects and adds them to a data structure called the free list. The free list is divided based on the size of memory blocks to enable quick lookup. When we want to allocate memory in the future, we can simply look at the free list and find an appropriate-sized memory block.
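A toy version of such a size-segregated free list might look like the following; the bucket sizes and method names are assumptions for illustration, not V8's actual layout:

```javascript
// Sketch of a free list bucketed by block size for quick lookup.
class FreeList {
  constructor(bucketSizes = [32, 64, 128, 256]) {
    this.sizes = bucketSizes; // ascending bucket sizes
    this.buckets = new Map(bucketSizes.map(s => [s, []]));
  }
  // Called by the sweeper for each gap left behind by dead objects:
  // file the gap under the largest bucket size it can satisfy.
  addGap(address, size) {
    const bucket = this.sizes.filter(s => s <= size).pop();
    if (bucket !== undefined) this.buckets.get(bucket).push({ address, size });
  }
  // Called by the allocator: scan for the smallest bucket that fits.
  allocate(size) {
    for (const s of this.sizes) {
      if (s >= size && this.buckets.get(s).length > 0) {
        return this.buckets.get(s).pop();
      }
    }
    return null; // no suitable gap; a real heap would allocate a new page
  }
}
```

The bucketing is what makes allocation cheap: instead of scanning every gap, the allocator only probes a handful of size classes.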

The step after sweeping is compaction. You can think of it as defragmenting a hard drive: live objects are copied to other memory pages that are not currently being compacted (using those pages' free lists). This way, the small, scattered gaps that dead objects leave behind in memory can be reclaimed, making better use of the available space.

V8's Orinoco project was established to continuously improve the performance of the garbage collector. Its goal is to enhance browser smoothness and responsiveness by reducing jank. As part of this optimization work, V8 applies parallelism in the minor garbage collector and concurrency in the major garbage collector. Now let's take a closer look at their principles and implementations.

Parallelism in the Minor Garbage Collector #

First, let's take a look at the parallelism used in the minor garbage collector (the Scavenger). Parallel garbage collection, as the name suggests, means that the collection work is distributed across multiple threads running in parallel. Compared with concurrency, it is easier to implement, because the main thread's work is completely paused during collection (a "stop-the-world" pause).

V8 uses parallel garbage collection to distribute work among worker threads. Each thread is assigned a number of pointers, and it evacuates the live objects they refer to into to-space. Because different tasks may reach the same object through different paths and attempt to evacuate and move it, the tasks avoid race conditions through atomic read, write, and compare-and-swap operations. The thread that successfully moves the object installs a forwarding pointer, which the other threads then use to find and update their own references.
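The claim step can be illustrated with JavaScript's own `Atomics.compareExchange` on a `SharedArrayBuffer`, which is the same compare-and-swap primitive. The slot layout and the sentinel value 0 meaning "not yet moved" are assumptions of this sketch, not V8 internals:

```javascript
// Sketch: parallel scavenger tasks avoid moving the same object twice.
// The first task to win the atomic compare-and-swap on the object's
// forwarding slot performs the move; later tasks reuse its result.
const UNMOVED = 0; // sentinel: no forwarding address installed yet
const forwardingSlots = new Int32Array(new SharedArrayBuffer(4 * 1024));

function tryEvacuate(objectIndex, newAddress) {
  // Atomically claim the object: exactly one task sees UNMOVED and wins.
  const prev = Atomics.compareExchange(
    forwardingSlots, objectIndex, UNMOVED, newAddress);
  if (prev === UNMOVED) {
    return newAddress; // we won: we are responsible for copying the object
  }
  return prev;         // another task moved it first: use its new address
}
```

In a real setup the slots would be shared with worker threads via `postMessage`; here the two calls simply stand in for two racing tasks.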

In its early days, V8 used a single-threaded Cheney's semispace copying algorithm; later it was changed to be multi-threaded. As in the single-threaded algorithm, the collection work is divided into three main steps: scanning the roots; copying within the young generation and promoting to the old generation; and updating pointers.

These three steps are interleaved. This is a GC similar to Halstead’s semispace copying collector, with the difference that V8 uses dynamic work stealing and a relatively simple load balancing mechanism for root scanning.

During this period, V8 also experimented with an algorithm called Mark-Evacuate. Its main advantage is that it can build on V8's existing Mark-Sweep-Compact collector as the foundation for parallel processing.

The parallel processing here consists of three steps: first, marking the young generation; after marking, copying the surviving objects to the object space; and finally, updating the pointers between objects.

Although this is multi-threaded, the phases run in lockstep: each of the three steps can be executed in parallel across threads, but all threads must synchronize before proceeding to the next phase. Its performance is therefore lower than that of the Scavenger parallel algorithm described earlier, whose steps are interleaved.

Concurrency in the Major Garbage Collector #

Concurrent Marking #

After discussing parallel collection in the minor collector, let's now look at the concurrent marking used in the major collector. When the main thread is particularly busy, the marking work can run independently on multiple worker threads. However, because the main thread keeps executing the program during this time, concurrent processing is more complex than parallel processing.

Before discussing concurrent marking, let's first see how the marking work can be executed simultaneously on different threads. During marking, heap objects are read-only for both the main thread and the worker threads, while the mark bits and the marking work list allow both reads and writes.

The implementation of the marking work list is crucial for performance, because it must strike a balance between two extremes: relying entirely on thread-local work lists and relying entirely on a fully shared, concurrent work list.

V8 uses a work list based on marking segments to support thread-local insertion and removal of work items, balancing these two extremes. Once a segment is full, it is published to a shared global pool, where it can be stolen by other threads. This way, V8 lets the marking worker threads operate locally for as long as possible without any synchronization.
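A single-threaded sketch of this segment-based work list follows; real V8 protects the shared pool with atomic operations, and the class and field names here are illustrative:

```javascript
// Marking work list split into fixed-size segments: pushes and pops hit a
// private local segment with no synchronization; only full segments are
// published to (or stolen from) the shared global pool.
class MarkingWorklist {
  constructor(segmentSize = 4) {
    this.segmentSize = segmentSize;
    this.globalPool = []; // full segments, shared between workers
    this.local = [];      // this worker's private segment
  }
  push(obj) {
    this.local.push(obj);
    if (this.local.length === this.segmentSize) {
      this.globalPool.push(this.local); // publish the full segment
      this.local = [];
    }
  }
  pop() {
    if (this.local.length > 0) return this.local.pop(); // fast local path
    const stolen = this.globalPool.pop(); // otherwise "steal" a segment
    if (!stolen) return undefined;        // no work left anywhere
    this.local = stolen;
    return this.local.pop();
  }
}
```

Each worker would hold its own `local` segment, so synchronization costs are paid only once per segment rather than once per object.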

Incremental Marking #

In addition to concurrent marking, the major garbage collector also uses incremental marking, which lets the main thread perform marking work in small chunks during its idle time. To implement incremental marking, the collector must be able to remember the progress of the work done so far, and it must handle the changes that running JavaScript makes to already-visited objects in the meantime. For this, V8 uses the three-color marking scheme and write barriers.

The three-color marking scheme assigns each node one of three colors: black, gray, or white. Black means the node has been visited and all of its outgoing references have been processed; gray means the node has been discovered but its outgoing references have not yet been processed; white means the node has not been discovered yet. Marking is finished when no gray nodes remain, at which point cleanup can be performed; if gray nodes still exist, marking must first be resumed before cleanup.
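The scheme can be sketched as a worklist algorithm over the same toy heap map used earlier; marking could pause and resume between loop iterations precisely because the gray set records where to continue. All names are illustrative:

```javascript
// Three-color marking: white = undiscovered, gray = discovered but not yet
// scanned, black = fully scanned. Done when no gray nodes remain.
function tricolorMark(heap, roots) {
  const color = {};
  for (const id of Object.keys(heap)) color[id] = 'white';
  const gray = [];
  for (const r of roots) { color[r] = 'gray'; gray.push(r); }
  // Incremental marking could pause between iterations of this loop and
  // resume later: the gray worklist is the "memory" of progress so far.
  while (gray.length > 0) {
    const id = gray.pop();
    for (const ref of heap[id]) {
      if (color[ref] === 'white') { color[ref] = 'gray'; gray.push(ref); }
    }
    color[id] = 'black'; // all outgoing references processed
  }
  return color; // anything still white is unreachable
}
```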

Since incremental marking runs intermittently, data that has already been marked may be modified by JavaScript in between. For example, a field of an already-marked (black) object may be reassigned to point at a brand-new object. Because the garbage collector has already finished with the black object, it will never visit the new object, which would stay white and be collected by mistake. A safeguard is therefore needed, and it is implemented with a write barrier: whenever a pointer from a black object to a white object is written, the barrier re-colors the white object gray so that marking will still visit it.
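A minimal sketch of such a write barrier (in the style of Dijkstra's barrier), building on the toy coloring above; the function and parameter names are illustrative:

```javascript
// Write barrier: called on every pointer store `source.field = target`.
// If a fully scanned (black) object gains a reference to an undiscovered
// (white) object, re-color the target gray and queue it for marking, so
// the resumed marker cannot miss it.
function writeBarrier(source, target, color, grayWorklist) {
  if (color[source] === 'black' && color[target] === 'white') {
    color[target] = 'gray';
    grayWorklist.push(target);
  }
}
```

Engines compile such a check into every relevant pointer store, which is why write barriers are kept as cheap as possible.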

Summary #

In this lesson, we learned how JS engines use idle time to perform garbage collection, and the performance bottlenecks and jank this mechanism can introduce. We saw that Chrome and V8 address these issues with many optimizations: generational collection with major and minor garbage collectors, plus concurrent and parallel worker threads to keep application execution smooth.

Although these issues may not be obvious for general web applications, with the rise of Web 3.0 and the concept of metaverse, as well as the increasing popularity of parallel and concurrent implementation with WebGL+Web Worker, the complexity of front-end object creation and rendering work continues to increase. Therefore, memory management and garbage collection will be an ongoing topic worth paying attention to.

Thinking Questions #

Although we say that JavaScript’s garbage collection is automatic, are there any “manual” methods of optimizing memory when writing code?

Feel free to share your answers, exchange learning experiences, or ask questions in the comments section. If you find it helpful, you are also welcome to share today’s content with more friends. See you in the next class!