
14 Deep Understanding of the Call Stack Through Sparkplug #

Hello, I’m Ishikawa.

In Lesson 11, we learned about stacks and heaps, two kinds of data structures, through closures. In Lesson 12, we used recursion to understand how function calls cycle through the call stack. Today, we will dig into the concepts of the stack frame, the stack pointer, and the frame pointer in the JavaScript engine’s call stack, using V8’s Sparkplug compiler as our guide.


For those who may not be familiar with V8, let’s take a brief look at the history of Sparkplug. Initially, V8 used the relatively fast Full-codegen compiler to generate unoptimized code, and the Crankshaft just-in-time (JIT) compiler to optimize and deoptimize it. But as browsing habits shifted from PCs to mobile devices, performance mattered more, and this pipeline could not optimize well across both pre-ES6 and post-ES6 JavaScript.

[Figure: the original Full-codegen + Crankshaft pipeline]

In response, V8 introduced the Ignition interpreter and the TurboFan optimizing compiler, designing the pipeline mobile-first. Its most notable feature is that Ignition generates bytecode, and TurboFan then generates optimized code and handles the corresponding deoptimization.

[Figure: the Ignition + TurboFan pipeline]

Yet TurboFan still fell short of Crankshaft in some respects. To bridge the gap, V8 ended up with a “toolbox” combining Full-codegen, Crankshaft, Ignition, and TurboFan, which clearly made the problem even more complex.

[Figure: the mixed pipeline combining Full-codegen, Crankshaft, Ignition, and TurboFan]

Since 2016, the V8 team has shifted its focus from theory to practice, replacing its home-grown Octane benchmark with real-world tests, and it has pursued performance beyond the compiler, in areas such as the parser, streams, the object model, the garbage collector, and compiled-code caching. During this work, though, the team kept hitting limits. Optimizing the interpreter only goes so far, because of the inherent costs of decoding and dispatching bytecode. And with V8’s two-tier model, a faster intermediate tier of code was out of reach: to gain speed, some optimization checkpoints would have to be removed, lowering peak performance; and optimization cannot begin in the earliest phase of execution, before there is stable feedback on object hidden classes. Sparkplug was born out of these constraints: it is a non-optimizing compiler that sits between the Ignition interpreter and the TurboFan optimizing compiler.

[Figure: the current pipeline, with Sparkplug between Ignition and TurboFan]

Why Is There an Additional Compiler? #

Sparkplug’s purpose is fast compilation, and it relies on two key techniques to achieve it.

First, the functions it compiles have already been compiled to bytecode. The bytecode compiler has already done most of the hard work: resolving variables, deciding whether a parenthesized list is an arrow function’s parameter list or just an expression (the parser cannot know until it sees the arrow, =>), desugaring constructs such as destructuring, and so on. Because Sparkplug compiles from bytecode rather than from JavaScript source, it never has to worry about any of this. This is also what separates Sparkplug from Full-codegen: in that pipeline, Crankshaft had to re-parse the source into an AST to compile it, and deoptimizing back to Full-codegen meant re-running Full-codegen compilation to reconstruct the stack frames.

The second technique is that Sparkplug does not generate an intermediate representation (IR) the way TurboFan does. What is an IR? It is a representation of the program layered from abstract to concrete; at the concrete end it can produce machine code specialized for particular language features, such as new ECMAScript standards, or for particular machines, such as IBM, ARM, or Intel architectures. TurboFan uses an IR based on the “sea of nodes” idea: after receiving bytecode from Ignition, it optimizes it and generates machine code for the target platform. Sparkplug, in contrast, compiles straight to machine code in a single linear pass over the bytecode, emitting code that matches the execution of each bytecode. In fact, the entire Sparkplug compiler is a switch statement inside a for loop, dispatching to a fixed machine-code-generation function per bytecode.

// Partial code of the Sparkplug compiler
for (; !iterator.done(); iterator.Advance()) {
  VisitSingleBytecode();
}
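
To make that structure concrete, here is a self-contained toy in the same spirit. This is not V8’s code: the bytecode names echo Ignition’s, but everything else is invented for illustration.

// Toy "baseline compiler": one linear pass over bytecodes, a switch per
// bytecode, no IR in between (not V8's real code).
#include <cstdint>
#include <cstdio>
#include <vector>

enum class Bytecode : uint8_t { kLdaSmi, kAdd, kReturn };

// Stand-ins for the per-bytecode machine-code generators.
void EmitLdaSmi() { std::puts("emit: load small integer into accumulator"); }
void EmitAdd()    { std::puts("emit: add register to accumulator"); }
void EmitReturn() { std::puts("emit: return the accumulator"); }

void Compile(const std::vector<Bytecode>& bytecodes) {
  for (Bytecode b : bytecodes) {   // one linear pass, as in Sparkplug
    switch (b) {                   // the whole "compiler" is this switch
      case Bytecode::kLdaSmi: EmitLdaSmi(); break;
      case Bytecode::kAdd:    EmitAdd();    break;
      case Bytecode::kReturn: EmitReturn(); break;
    }
  }
}

int main() {
  Compile({Bytecode::kLdaSmi, Bytecode::kAdd, Bytecode::kReturn});
}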

The absence of an IR means Sparkplug has almost no optimization opportunities, apart from some very limited peephole optimizations. It also means that, lacking an architecture-independent intermediate stage, the entire implementation must be ported separately to each supported architecture. In practice, neither is a problem: Sparkplug is a simple, fast compiler, so the code is easy to port, and since TurboFan is still in the pipeline, heavy optimization is unnecessary. Note, too, that TurboFan’s deoptimization now falls back to Sparkplug, not to Ignition.

Usage of Stack Pointers and Frame Pointers #

Now let’s look at how stack frames work and how stack pointers and frame pointers are used. Adding a new compiler to a mature JavaScript virtual machine is genuinely hard: beyond Sparkplug’s own functionality, it must also support the debugger, stack-walking CPU profilers, stack traces for exceptions, tier-up integration, and on-stack replacement (OSR) of hot loops into optimized code.

However, Sparkplug simplifies most of these problems with one clever trick: it maintains stack frames that are interoperable with the interpreter’s. Recall that the call stack is how function state is stored during execution; whenever we call a new function, a new stack frame is created for that function’s local variables. A stack frame is delimited by the frame pointer (marking where it starts) and the stack pointer (marking where it ends).

[Figure: a stack frame bounded by the frame pointer and the stack pointer]

When a function is called, the return address is also pushed onto the stack; it is popped when the function returns, so execution knows where to go back to. When the function creates its new frame, it saves the old frame pointer on the stack and sets the frame pointer to the start of its own frame. The stack therefore contains a chain of frame pointers, each marking the start of the previous frame.

[Figure: the chain of saved frame pointers on the stack]
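
As a hedged sketch of how such a chain can be walked, the following toy uses the GCC/Clang builtin __builtin_frame_address and assumes the conventional layout in which each frame stores the caller’s saved frame pointer followed by the return address. Compile with frame pointers enabled (e.g. -O0 -fno-omit-frame-pointer); this is illustrative, not portable production code.

// Walking the chain of saved frame pointers (illustrative sketch).
#include <cstdio>

struct Frame {
  Frame* caller_fp;      // saved frame pointer of the calling function
  void* return_address;  // where this function will return to
};

void WalkStack() {
  // GCC/Clang builtin: the frame pointer of the current function.
  Frame* fp = static_cast<Frame*>(__builtin_frame_address(0));
  for (int depth = 0; fp != nullptr && depth < 8; ++depth) {
    std::printf("frame %d at %p, returns to %p\n",
                depth, static_cast<void*>(fp), fp->return_address);
    fp = fp->caller_fp;  // each saved frame pointer marks the caller's frame
  }
}

void Leaf() { WalkStack(); }
void Middle() { Leaf(); }
int main() { Middle(); }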

Besides local variables and return addresses, the stack also holds arguments and intermediate values. Arguments (including the receiver) are pushed in reverse order before the function is called, and the first few slots of the new frame hold the function being called, the context it is called with, and the number of arguments passed. This is the “standard” JS frame layout:

[Figure: the standard JS frame layout]

So that the stack can be walked cheaply during profiling, this JS calling convention is shared between optimized and interpreted frames. The Ignition interpreter then extends the convention. Ignition is a register-based interpreter, but its registers are virtual rather than machine registers: they hold the interpreter’s current state, including JavaScript function locals (var/let/const declarations) and temporary values, and they live in the interpreter’s stack frame. The frame also contains a pointer to the bytecode array being executed and the offset of the current bytecode within it.
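
Pulling the pieces together, here is a rough model of that interpreter frame. It is a descriptive sketch only: slot names and their exact order vary across V8 versions and architectures, and the slots pushed by the caller appear as comments because a C++ struct cannot express them.

// Rough model of the interpreter's stack frame (descriptive only).
#include <cstdint>

struct InterpreterFrameModel {
  // Pushed by the caller, above this struct in memory:
  //   argN ... arg0, receiver   (arguments in reverse order)
  //   return address
  void* saved_frame_pointer;  // the frame pointer points here
  void* function;             // the function being executed
  void* context;              // the context it is called with
  intptr_t argument_count;    // number of arguments passed
  void* bytecode_array;       // the bytecode being executed
  intptr_t bytecode_offset;   // offset of the current bytecode
  // Interpreter registers (locals and temporaries) follow, down to the
  // stack pointer.
};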

Later, the V8 team made a small change to this frame. Sparkplug code does not keep the bytecode offset up to date during execution; instead, it stores a bidirectional mapping between Sparkplug code address ranges and the corresponding bytecode offsets. Because Sparkplug code is emitted in one linear walk over the bytecode, this mapping has a simple encoding. Whenever a stack walk needs the “bytecode offset” of a Sparkplug frame, it looks up the current instruction address in the mapping and returns the matching bytecode offset. Conversely, whenever on-stack replacement (OSR) goes from the interpreter into Sparkplug, the current bytecode offset is looked up in the mapping and execution jumps to the corresponding Sparkplug instruction.

[Figure: the mapping between Sparkplug code addresses and bytecode offsets]
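
A minimal sketch of such a position table follows, assuming a plain sorted vector as the encoding; V8’s real encoding is more compact, but the lookup idea is the same.

// Sketch of the position table: sorted entries, binary-search lookup.
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

struct PositionEntry {
  size_t code_offset;      // start of the machine code emitted for...
  size_t bytecode_offset;  // ...the bytecode at this offset
};

// Entries come out sorted because code is emitted in one linear pass.
size_t CodeOffsetToBytecodeOffset(const std::vector<PositionEntry>& table,
                                  size_t code_offset) {
  // Find the last entry whose code_offset is <= the queried address.
  auto it = std::upper_bound(
      table.begin(), table.end(), code_offset,
      [](size_t value, const PositionEntry& e) { return value < e.code_offset; });
  return it == table.begin() ? 0 : std::prev(it)->bytecode_offset;
}

// OSR goes the other way: search by bytecode_offset, jump to code_offset.
int main() {
  std::vector<PositionEntry> table = {{0, 0}, {12, 2}, {40, 5}};
  return static_cast<int>(CodeOffsetToBytecodeOffset(table, 20));  // 2
}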

Sparkplug deliberately creates and maintains a frame layout that matches the interpreter’s: whenever the interpreter stores a register value, Sparkplug stores one too. This has several benefits. First, it simplifies compilation, because Sparkplug can mirror the interpreter’s behavior without keeping a mapping from interpreter registers to Sparkplug state. Second, it speeds up compilation, since the bytecode compiler has already done the tedious work of register allocation. Third, it integrates cleanly with the rest of the system, such as the debugger and the profiler. Fourth, any logic that works for on-stack replacement (OSR) in the interpreter also works for Sparkplug, and switching a frame between interpreter and Sparkplug code costs almost nothing.

Dropping the bytecode offset leaves one unused slot in the frame, which is repurposed to cache the “feedback vector” of the currently executing function; the feedback vector stores the object-structure data that most operations need to load. With that, the Sparkplug stack frame ends up looking like this:

[Figure: the final Sparkplug frame layout, with the feedback vector in the former bytecode-offset slot]

The code Sparkplug itself generates is actually quite small, mostly builtin calls and control flow. Why? Because JavaScript semantics are complex, and even the simplest operations take a great deal of code. Forcing Sparkplug to inline and regenerate that code at every compilation would blow up both compile time and the memory consumed by Sparkplug code, and V8 would have to reimplement code generation for a pile of JavaScript features just for Sparkplug, likely introducing more bugs and a larger attack surface. So most Sparkplug code instead calls “builtins”, small pieces of machine code embedded in the binary, to do the actual heavy lifting. These builtins are essentially the same ones the interpreter uses, or at least the two share most of their code.
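
The following toy shows the shape of that code generation: the “emitted code” for an operation is little more than a call into a shared builtin. The builtin name here is illustrative, not V8’s exact identifier.

// Toy illustration: emit a builtin call instead of inlining JS semantics.
#include <cstdio>

void EmitCallBuiltin(const char* name) {
  std::printf("emit: call builtin %s\n", name);
}

void VisitAdd() {
  // JS "+" can mean number addition, string concatenation, implicit
  // valueOf() calls, and more - far too much to inline at every add site.
  EmitCallBuiltin("Add_Baseline");
}

int main() { VisitAdd(); }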

At this point you may wonder what the purpose of Sparkplug is, since it seems to do almost the same work as the interpreter. In many ways Sparkplug is indeed just a serialization of the interpreter’s execution, calling the same builtins and maintaining the same frames. Even so, it is valuable, because it eliminates, or more precisely precompiles away, interpreter costs that cannot otherwise be removed; interpretation, in fact, defeats many CPU optimizations.

Take operand decoding and dispatching to the next bytecode, for example. The interpreter reads data that is actually static (the bytecodes and their operands) from memory dynamically, forcing the CPU to either stall or speculate about the values. Dispatching to the next bytecode needs successful branch prediction to stay fast, and even when the speculation and prediction succeed, all the decode and dispatch code still has to execute, burning precious space in various buffers and caches. The CPU, after all, is itself effectively an interpreter, just one for machine code. Seen this way, Sparkplug is a “converter” from Ignition bytecode to CPU bytecode, moving functions from “emulated” to “native” execution.
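
Here is a self-contained sketch of the loop whose costs Sparkplug precompiles away: every iteration pays for a decode (a dynamic load of static data) and a dispatch (an indirect branch the CPU must predict). The opcodes are invented for illustration.

// Toy interpreter loop showing the per-bytecode decode/dispatch overhead.
#include <cstddef>
#include <cstdint>
#include <vector>

int Interpret(const std::vector<uint8_t>& bytecode) {
  int accumulator = 0;
  size_t offset = 0;
  while (offset < bytecode.size()) {
    uint8_t opcode = bytecode[offset];  // decode: dynamic read of static data
    switch (opcode) {                   // dispatch: indirect branch to predict
      case 0: accumulator  = bytecode[offset + 1]; offset += 2; break;  // LdaSmi
      case 1: accumulator += bytecode[offset + 1]; offset += 2; break;  // AddSmi
      case 2: return accumulator;                                       // Return
      default: return -1;
    }
  }
  return accumulator;
}

int main() {
  return Interpret({0, 2, 1, 3, 2});  // load 2, add 3, return 5
}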

Summary #

Through today’s look at Sparkplug, we have deepened our understanding of stack frames, stack pointers, and frame pointers on top of what we already knew about the stack data structure, and we have gained more insight into the JS compilation pipeline. Interested students can follow the links to related V8 blog articles in the references; the blog covers many language and compiler topics and shows V8’s relentless pursuit of performance, which is well worth studying. If you are interested in compiler principles, you can also take a look at the compiler principles course taught by Professor Gong Wenxue in the adjacent class.

Thought Questions #

Earlier we mentioned that Sparkplug is not entirely without optimization: it performs peephole optimization. Do you know how this optimization works?

I look forward to your thoughts in the comments section; let’s discuss together. And feel free to share today’s content with more friends. See you next time!

References #