18 How TurboFan Optimizes JS Compilation with Graphs #

Hello, I am Shikawa.

Today let’s take a look at graphs in JavaScript (JS) compiler optimization. A graph is an abstract network structure model. You may wonder what algorithms would use such a structure. In fact, many applications such as social media platforms like Weibo, WeChat, Facebook, and LinkedIn use graphs to represent social networks. Additionally, graphs can also be used to represent real-world road networks, grid systems, and virtual communication networks.

Graphs play a crucial role in the compiler of the JS engine. It is not an exaggeration to say that the entire TurboFan compiler in V8 is based on graphs. By using this data structure, we can gain a deeper understanding of compilation principles and simultaneously develop a more in-depth knowledge of relevant data structures and algorithms.

Structure of a Graph #

First, let’s take a look at the structure of a graph. A graph consists of nodes (vertices) connected by edges. Any binary relationship can be represented by a graph.

Image
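To make the idea concrete, here is a minimal sketch (not V8's code) of a directed graph stored as an adjacency list, using a tiny social network as the data:

```javascript
// A minimal adjacency-list graph sketch (illustrative only, not V8's implementation).
class Graph {
  constructor() {
    this.edges = new Map(); // node -> array of neighbor nodes
  }
  addNode(node) {
    if (!this.edges.has(node)) this.edges.set(node, []);
  }
  addEdge(from, to) {
    this.addNode(from);
    this.addNode(to);
    this.edges.get(from).push(to); // directed edge: from -> to
  }
}

// A tiny "social network": alice follows bob, bob follows carol.
const g = new Graph();
g.addEdge("alice", "bob");
g.addEdge("bob", "carol");
console.log(g.edges.get("alice")); // ["bob"]
```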

Now, how does TurboFan use graphs for compilation? Before we dive into that, let’s first understand the compilation process and intermediate code.

Compilation Process: Intermediate Code #

IR, which stands for Intermediate Representation, is sometimes also called Intermediate Code (IC). Conceptually, IR can be divided into HIR (Higher IR), MIR (Middle IR), and LIR (Lower IR), representing different forms of intermediate code at high, middle, and low levels.

Image

From the figure above, we can see that the entire compilation process can be divided into three stages: front end, middle end, and back end. The front end mainly performs lexical, syntax, and semantic analysis, which we have discussed in previous chapters. The middle end performs related optimization tasks, and finally, the back end generates the target code.

As mentioned earlier, intermediate code can be divided into three abstraction levels: high, middle, and low. HIR is mainly used for analysis and transformations based on the source language; a typical example is the Abstract Syntax Tree (AST) used for semantic analysis. MIR is independent of both the source language and the CPU architecture and is used for analysis and optimization of algorithms, with examples such as Three-Address Code (TAC) and the Program Dependence Graph (PDG). LIR depends on the CPU architecture and is used for optimization and target-code generation; a typical example is the Directed Acyclic Graph (DAG), which helps generate target code.

Image
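For a feel of what an MIR such as three-address code looks like, here is a small sketch; the temporary name t1 and the exact instruction format are illustrative, since every compiler defines its own:

```javascript
const a = 1, b = 2, c = 3;

// JavaScript source expression:
const x = a + b * c;

// A possible three-address code (MIR) form of the same expression,
// one operator per instruction (the temporary name t1 is hypothetical):
//   t1 = b * c
//   x  = a + t1
// Each instruction has at most three "addresses": two operands and a result.
console.log(x); // 7
```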

In the analysis stage of the compilation process, two important analyses are Control Flow Analysis (CFA) and Data Flow Analysis (DFA). The Sea of Nodes (SoN) we will discuss later is a data structure that reflects data flow and control flow at the same time. The SoN reduces the mutual dependencies between CFA and DFA, which makes global optimization easier.

In the optimization stage of the compilation process, there are several important categories: general optimization, object optimization, and functional optimization. For example, loop optimization belongs to general optimization, while inlining and escape analysis belong to object optimization. As for functional optimization, an important one is turning tail recursion into loops, which we mentioned when discussing iteration and recursion.

In the final stage of the process, generating target code, the key tasks are register allocation, instruction selection, and instruction scheduling.

Today, we will focus on the principles of graphs and the Sea of Nodes during the analysis process, and why the Sea of Nodes is beneficial for global optimization.

Middle-End Optimization: Analysis and Optimization #

Let’s start with the optimization of the mid-end. From the structure of the intermediate representation (IR), you might notice that “Intermediate Representations” may be a more accurate description than “Intermediate Code”. This is because IR is more like a “data structure” in the form of linear lists, trees, graphs, etc., rather than “code”. Today, we will focus on one such data structure, the Sea of Nodes (SoN), used by TurboFan.

Image

Performance has always been a major concern for V8. The concept of the Sea of Nodes (SoN) was first described by Cliff Click, the designer of HotSpot's server compiler, in a 1993 paper. V8's previous-generation optimizing compiler, Crankshaft, followed a design similar to HotSpot's first-generation C1 (client) compiler, while the current TurboFan compiler draws on the second-generation C2 (server) compiler. TurboFan optimizes a SoN-based IR and generates machine code through a layered pipeline, producing higher-quality code than the Crankshaft JIT. TurboFan also incorporates optimizations and analyses that were not possible before, such as control flow optimizations and precise numerical range analysis; it is the introduction of the Sea of Nodes that enables these global optimization capabilities. Now, let's explore the structure and format of the Sea of Nodes.

SoN Structure: Format and Representation #

First, let's take a look at the structure of SoN. The SoN consists of nodes and edges. Everything, including values and computations, is represented by nodes. The TurboFan visualization tool, Turbolizer, uses different colors to distinguish the node types: light blue for values, and dark blue for intermediate-level operations such as computations. Edges represent the dependencies between operations. For example, in the addition shown below, the edges express that the addition depends on the values 5 and x. This constrains ordering only loosely: 5 and x can be evaluated in any order, as long as both are available before the addition.

Image
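To make the node-and-edge picture concrete, here is a hypothetical sketch, not V8's actual data structures, of how the expression 5 + x could be modeled as nodes whose input edges express the dependencies:

```javascript
// Hypothetical node objects for the expression `5 + x` (illustrative only).
const constant5 = { op: "Constant", value: 5, inputs: [] };
const paramX    = { op: "Parameter", name: "x", inputs: [] };

// The Add node's input edges point at the nodes it depends on.
const add = { op: "Add", inputs: [constant5, paramX] };

// The only ordering constraint the edges impose is that both inputs
// exist before the Add is evaluated; the inputs themselves are unordered.
console.log(add.inputs.map(n => n.op)); // ["Constant", "Parameter"]
```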

SoN Format: Static Single Assignment (SSA) #

Now that we've discussed the structure, let's look at the format of SoN. SoN uses Static Single Assignment (SSA) form, in which each variable is assigned exactly once. In the example below, the first statement assigns 3 multiplied by 7 to the variable x, and the second adds 3 to x, so x is assigned twice. TurboFan therefore renames the local variable while building the graph. After the renaming, the constant 3 is used by two operations: the multiplication by 7, and the addition of 3 to that result. In SoN there is no concept of a local variable at all; variables are simply replaced by nodes.

Image
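Here is a source-level sketch of that renaming; the SSA names x1 and x2 are hypothetical and only illustrate the idea:

```javascript
// Original JavaScript: the local variable x is assigned twice.
var x = 3 * 7;
x = x + 3;

// Conceptual SSA form after renaming (illustrative pseudocode):
//   x1 = 3 * 7
//   x2 = x1 + 3
// In the sea of nodes the names disappear entirely: the constant 3 node
// feeds both the multiplication node and the addition node.
console.log(x); // 24
```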

Data Flow Representation #

As mentioned earlier, an important analysis process involves Control Flow Analysis (CFA) and Data Flow Analysis (DFA). In the data flow, nodes represent all operations, including constants, parameters, mathematical operations, loads, stores, function calls, etc.

When it comes to edges in the data flow, there are two key concepts. First, solid edges indicate the direction of data flow and determine the relationship between inputs and outputs; in the example below, the solid edges carry the outputs of operations to their uses. Second, dashed edges are effect edges: they order operations that read or write the same data. In the example below, the dashed edge fixes the read/write order of the operations.

Read/write dependencies come in three forms: Write-After-Write (WAW), Write-After-Read (WAR), and Read-After-Write (RAW). Let's look at how they are represented. In SoN, dashed effect edges express these different read/write dependencies. For example, the value of the "val" property of the "obj" object below is mutable: adding 3 to "val" and storing the result requires a read followed by a write, and the effect edge keeps them in that order.

Image
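Here is a source-level sketch of the read-then-write case; the LoadField/StoreField breakdown in the comments is a simplification of the conceptual nodes:

```javascript
const obj = { val: 1 };

// obj.val = obj.val + 3 is conceptually:
//   t = LoadField(obj, "val")     // read
//   u = Add(t, 3)
//   StoreField(obj, "val", u)     // write, must stay after the read (RAW)
// A dashed effect edge from the load to the store preserves this order
// even though no data value flows directly between them.
obj.val = obj.val + 3;
console.log(obj.val); // 4
```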

Control Flow Representation #

We mentioned earlier that from the perspective of data flow, nodes and edges express everything. The same holds true for control flow. Through the following examples, we can see that control flow nodes, including start, branch, loop, merge, and end, are all represented by nodes.

In the Turbolizer visualization, yellow nodes represent control flow nodes. Below are several examples that focus solely on control flow. We can see that everything from the simplest straight-line program, consisting of only a start and an end, to branches and loops can be represented by the sea of nodes.

Image

Next, let's look at the simplest example combining control flow with data flow. To distinguish the different types of edges, solid black lines, solid gray lines, and dashed lines represent control flow, data flow, and effect dependencies, respectively.

Image

Finally, when we add a merge, we get a more complete example. Here you will see a phi instruction. What is it for? It selects the value of x according to which branch was actually taken. As mentioned before, under the rules of Static Single Assignment (SSA) a variable can only be assigned once, so at the merge point after a branch we must use a phi to combine the different assignments.

Image
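Here is a small source-level sketch of where a phi comes from; the SSA names in the comments are hypothetical:

```javascript
function sign(a) {
  var x;
  if (a > 0) {
    x = 1;      // conceptually x1 = 1
  } else {
    x = -1;     // conceptually x2 = -1
  }
  // At the merge point SSA needs a single definition of x,
  // so a phi node selects between the two: x3 = phi(x1, x2).
  return x;
}

console.log(sign(5));  // 1
console.log(sign(-2)); // -1
```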

Analysis: Variable Type and Scope #

We said that the important roles of intermediate code are analysis and optimization. Why do we need to analyze variable types and scopes? There are three main problems to solve. First, JavaScript is dynamically typed rather than statically typed, so although values have fixed types, variables do not. For example, we can write var a = 25, where 25 itself is an immutable number, but if we then write a = a + " years old", the variable a now holds a string. Second, all JavaScript arithmetic is defined on 64-bit floating-point numbers, but in practice most programs do not need that many bits; most values fit in 32 bits, and there are also special values such as NaN, Infinity, -Infinity, and -0.0. Third, there is the need to support type annotations in JS subsets such as asm.js, including truncating special values such as NaN and Infinity to the integer 0.

Image
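The `| 0` pattern used by asm.js-style code illustrates the third point: it truncates a value to a 32-bit integer, turning special values like NaN and Infinity into 0:

```javascript
// `| 0` truncates to a 32-bit integer, which is the kind of type
// annotation asm.js-style code relies on.
console.log(3.7 | 0);      // 3
console.log(NaN | 0);      // 0
console.log(Infinity | 0); // 0
console.log(-0.0 | 0);     // 0
```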

Optimization: Node-based Optimization #

Those three points are why variable types and scopes need to be analyzed. Now let's talk about optimization. Almost all optimizations are performed on nodes. If a node can be reached from the end node, it is live; if it cannot, it is dead, and this includes dead control, effect, and computation nodes. Most stages of the compilation pipeline never see dead nodes, and dead nodes are not included in the final schedule.
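As a rough source-level analogy (the real analysis works on graph nodes, not source text), a computation whose result can never reach the end of the function is dead and can be dropped:

```javascript
function f(a) {
  const unused = a * 2; // no path from this value to the return: a dead computation
  if (false) {
    console.log("never runs"); // dead control: an unreachable branch
  }
  return a + 1; // only this chain of nodes is reachable from the end
}

console.log(f(4)); // 5
```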

There are many optimization algorithms, and below we will discuss several core optimization strategies.

Strategy One: Constant Hoisting and Simplified Calculation #

The most basic optimization strategy is constant hoisting and simplified calculation.

In constant hoisting, a classic optimization is constant folding, which means evaluating expressions made entirely of constants at compile time. For example, in the first example below, the operation 3 + 5 is folded directly into 8.

In simplified calculation, a representative example is strength reduction. For instance, in the second example below, x + 0 is no different from x, so it can be simplified to x. In the third example, x * 4 can be turned into a shift operation, x << 2.

Image
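In source terms, the three examples look roughly like this; the folded and reduced forms in the comments are what the compiler would conceptually produce, assuming the values are known to be small integers:

```javascript
// Constant folding: 3 + 5 is evaluated at compile time.
const a = 3 + 5;                        // becomes: const a = 8;

// Simplification: x + 0 is just x (for a numeric x).
function addZero(x) { return x + 0; }   // conceptually: return x;

// Strength reduction: multiplying by a power of two becomes a shift.
function times4(x) { return x * 4; }    // conceptually: return x << 2;

console.log(a, addZero(7), times4(3));  // 8 7 12
```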

Strategy Two: Deduplicating Calculations #

In deduplicating calculations, the most common approach is value numbering: the system assigns the same number to computations that produce equal values. For example, when operations such as a sin, an addition, or a LoadField compute the same result, only one node needs to be kept.

Image
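In source terms, value numbering behaves like common-subexpression elimination: two identical computations get the same number, so only one node is kept (a simplified view):

```javascript
function g(x) {
  // Math.sin(x) appears twice; with value numbering both uses share one node,
  // conceptually:  t = Math.sin(x);  return t + t;
  return Math.sin(x) + Math.sin(x);
}

console.log(g(Math.PI / 2)); // 2
```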

Strategy Three: Control Flow Optimization #

There are also many different optimizations for control flow. Here are a few examples, including branch folding, merge reduction, and control reduction. In these examples, both the control flow itself and the branches and merges within it are simplified after optimization.

Image
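Branch folding in source terms: when the condition is a compile-time constant, the branch and its dead arm disappear and the surrounding merge can be reduced away (illustrative only; whether the compiler proves the constant depends on the actual program):

```javascript
const DEBUG = false;

function work(x) {
  // The condition is a known constant, so the whole branch folds away:
  // the `if` and its merge disappear and only `return x * 2` remains.
  if (DEBUG) {
    console.log("debug:", x);
  }
  return x * 2;
}

console.log(work(21)); // 42
```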

Lowering: Levels of Language #

As we have seen before, TurboFan's Turbolizer uses different colors for different types of nodes: light blue nodes for values, yellow nodes for control flow, and dark blue nodes for intermediate-level operations. In addition, there are two other kinds of operation nodes: red nodes for higher-level JavaScript-language operations, and green nodes for lower-level machine-language operations. In general, the higher the level, the more readable it is to humans; the lower the level, the closer it is to machine language.

Image

We call this process of going from high to low levels “lowering”. Therefore, the process of optimization is also a process of constantly lowering.

Image
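As a sketch of what lowering means for a single addition: the opcode names JSAdd, NumberAdd, and Int32Add follow TurboFan's JS-level, simplified, and machine-level operator tiers, but the exact lowering sequence shown here is illustrative:

```javascript
function add(a, b) {
  return a + b;
}

// Conceptual lowering of the `+` node as type information becomes available:
//   JSAdd(a, b)          // red: JavaScript-level, handles any operand types
//   -> NumberAdd(a, b)   // dark blue: both inputs known to be numbers
//   -> Int32Add(a, b)    // green: machine-level, both inputs fit in 32 bits
console.log(add(1, 2)); // 3
```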

Backend Optimization: Instructions and Registers #

That’s all for the middle-end optimization. Now, let’s take a look at the backend optimization, which involves generating target code and includes register allocation and instruction selection.

Instruction Ordering #

Let's talk about instruction ordering first. The result of the sea of nodes is placed into a Control Flow Graph (CFG). There are three concepts here: control dominance, reducing register pressure, and loop optimization.

First, let’s look at the first point. In this process, fixed nodes such as phi and parameters are placed in the CFG.

Image

Then, a dominator tree is used to compute the dominance relationships, and the remaining nodes are scheduled out of the sea of nodes into the CFG according to those relationships.

Image

The second point is how to reduce register pressure during compilation. The SSA form mentioned earlier is also built with the help of the dominator tree. After ordering, the sea of nodes has become a control flow graph.

Image

Finally, let's look at the third point, loop optimization. Here, code inside loops is optimized as much as possible by moving computations that do not change across iterations out of the loop. This is called Loop-Invariant Code Motion, also known as hoisting. You may ask where this takes place: in fact, this step was already done during analysis and optimization, and at this stage it is simply reflected in the ordered result.
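Loop-invariant code motion in source terms: an expression that does not change across iterations is computed once outside the loop; the hoisted form in the comment is what the compiler conceptually produces:

```javascript
function scale(values, factor) {
  const out = [];
  for (let i = 0; i < values.length; i++) {
    // `factor * 2` does not depend on i, so it is loop-invariant and can be
    // hoisted:  const k = factor * 2;  ...  out.push(values[i] * k);
    out.push(values[i] * (factor * 2));
  }
  return out;
}

console.log(scale([1, 2, 3], 5)); // [10, 20, 30]
```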

Instruction Selection #

Next, let's take a look at instruction selection. In theory, the maximal munch algorithm can be used for instruction selection.

Image

In practice, however, instruction selection proceeds in the opposite order to the control flow graph, working through the basic blocks from the bottom up.

Image

Register Allocation #

As mentioned earlier, after analysis and optimization, the main task in the backend of the compilation process is register allocation. TurboFan uses a linear-scan allocator with live-range splitting to allocate registers and insert spill code. The SSA form is deconstructed, through explicit moves, either before or after register allocation.

Image

The result of register allocation is the use of real registers instead of virtual registers and the insertion of spill code between instructions.

Summary #

This lecture was on the hardcore side, but if you've read this far, it proves you persisted. From V8's use of graphs to the sea of nodes, I hope you not only understand this data structure but also gain a deeper understanding of the entire compilation process. Along the way we have also seen that the code we write is not necessarily the code that runs after optimization. On one hand, this means we can pay more attention to readability, because we don't need to hand-tune things the machine will optimize anyway. On the other hand, some developers pursue performance to the extreme and write code closer to the optimized form, but doing so sacrifices some readability.

Reflection Questions #

If you have used Graal in Java, you may have noticed that its compiler also uses a data structure based on SoN. Can you point out the similarities and differences between HotSpot, Graal, and TurboFan in their use of SoN?

Looking forward to your sharing in the comments section, let’s exchange and discuss together. Also, feel free to share today’s content with more friends. See you next time!
