Native How to Get the Call Stack

Native How to Get the Call Stack #

Hello, I’m simsun. I used to work on Android development at WeChat, and I’m also an open source enthusiast and a fan of the Rust programming language. Invited by Shao Wen, I’m glad to share some low-level knowledge about compilation in the “Master Class” with you.

When debugging Native crashes or doing profiling, we heavily rely on backtraces. A high-quality backtrace can greatly reduce the time it takes to fix a crash. But do you know how the system generates a backtrace? Today, let’s explore the story behind the backtrace.

Here is a common Native crash. Usually, the crash itself does not contain any backtrace information. The only thing we can get directly is the value of the current registers, but clearly, the backtrace is the key that can help us fix the bug.

pid: 4637, tid: 4637, name: crasher  >>> crasher <<<
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
Abort message: 'some_file.c:123: some_function: assertion "false" failed'
    r0  00000000  r1  0000121d  r2  00000006  r3  00000008
    r4  0000121d  r5  0000121d  r6  ffb44a1c  r7  0000010c
    r8  00000000  r9  00000000  r10 00000000  r11 00000000
    ip  ffb44c20  sp  ffb44a08  lr  eace2b0b  pc  eace2b16
backtrace:
    #00 pc 0001cb16  /system/lib/libc.so (abort+57)
    #01 pc 0001cd8f  /system/lib/libc.so (__assert2+22)
    #02 pc 00001531  /system/bin/crasher (do_action+764)
    #03 pc 00002301  /system/bin/crasher (main+68)
    #04 pc 0008a809  /system/lib/libc.so (__libc_init+48)
    #05 pc 00001097  /system/bin/crasher (_start_main+38)

Before reading the rest of the content, take 2 minutes to think about how the system generates a backtrace. We usually refer to the process of generating a backtrace as “unwind”. Unwind may seem unrelated to our usual development, but many functionalities actually rely on it. For example, if you want to draw a flame graph or obtain a backtrace when a crash occurs, you need to rely on unwind.

The “unwind” in books #

1. Function Call Process

If you have taken a course in assembly language in college, I believe you will still have an impression of the content below. The figure below shows a very standard process of function call.

First, assuming that we are in the main() function and ready to call the foo() function, the calling side will push the arguments in reverse order. At this time, the first argument will be at the top of the call stack.
The invoke foo() pseudo-instruction is called, pushing the value of the current register EIP into the stack, and then loading the address of the foo() function into EIP.
At this point, since we have changed the value of EIP (to the address of foo()), it is equivalent to entering the foo() function. Before executing a function, the compiler writes a prologue for each function, which pushes the old EBP value and assigns new values to the current EBP and ESP, thereby forming a new function stack.
The next step is to execute the code of the foo() function itself.
Finish executing the foo() function and prepare to return. Here, the compiler also inserts an epilogue for each function to restore the ESP and EBP of the caller to rebuild the stack of the previous function and restore registers.
Execute the return instruction (ret). The epilogue of the called function has restored EBP and ESP, and then we can pop EIP, all arguments, and the values of the saved registers from the restored stack one by one.

Reading to this point, I believe that if you haven’t studied assembly language, you will definitely have some confusion. Let me explain the specific meanings of the register abbreviations mentioned above. The names used above all use the x86 naming convention. I mentioned these to give you a preliminary understanding of function calls. There are many details that vary in different architectures and compilers, so please relax and continue reading with me.

EBP: Base Pointer Register, points to the bottom of the stack frame. - In the ARM architecture, R11 (ARM code) or R7 (Thumb code) serves a similar purpose. In ARM64, this register is X29. - ESP: Stack Pointer Register, points to the top of the stack frame. In ARM, the register is R13. - EIP: Instruction Pointer Register, stores the address of the next instruction to be executed by the CPU. In ARM, it is PC, and the register is R15.

2. Restoring the Calling Frame

If we shrink the above process and look at it from a higher-level perspective, all function call stacks will form a call frame, and each frame contains enough information to restore the stack frame of the calling function.

Here, we ignore other irrelevant details and focus on EBP, ESP, and EIP. You can see that EBP and ESP respectively point to the bottom and top of the stack of the executing function. Every function call saves EBP and EIP to restore the function stack frame when returning. Here, all the saved EBP values are like a linked list pointer, constantly pointing to the EBP of the calling function. In this way, we can elegantly restore the entire call stack using this as a clue.

Here, we can use the following pseudocode to restore the call stack:

void debugger::print_backtrace() {
    auto curr_func = get_func_from_pc(get_pc());
    output_frame(curr_func);

    auto frame_pointer = get_register_value(m_pid, reg::rbp);
    auto return_address = read_mem(frame_pointer+8);

    while (dwarf::at_name(curr_func) != "main") {
        curr_func = get_func_from_pc(ret_addr);
        output_frame(curr_func);
        frame_pointer = read_mem(frame_pointer);
        return_address = read_mem(frame_pointer+8);
    }
}

However, in the ARM architecture, for performance reasons, ingenious developers, in order to save the R7/R11 register, make it available as a general-purpose register, so it cannot guarantee that enough information is saved to form the above call stack (even if you pass “-fno-omit-frame-pointer” to the compiler). For example, in the following two cases, ARM does not follow the general meaning of the prologue, and interested students can check the APCS Doc for specific information.

The function is a leaf function, which means that there are no function calls within the function body.
The function body is very small.

Unwinding in Android #

We know that most Android phones use the ARM architecture. So how do we perform unwinding in Android? Let’s discuss two different scenarios.

1. Unwinding in Debug Build

In the case of a debug build, we can use “.debug_frame” (students who are interested can learn more about DWARF) to help us with unwinding. This method is highly efficient and accurate. However, the drawback is that debugging information itself is large and can even be larger than the program segment (.TEXT segment). Therefore, we cannot include this information in the release build.

DWARF is a standard debugging information format. DWARF was initially designed along with the ELF file format, but DWARF itself is an independent object file format. DWARF and ELF names do not have any meaning (dwarves, elves, are they like characters in World of Warcraft :)), but for convenience of promotion, it was later named Debugging With Attributed Record Formats. Quoted from wiki

2. Unwinding in Release Build

For release builds, the system uses a similar section to “.debug_frame”, called an unwind table. Specifically, on x86 and ARM64 platforms, it is called “.eh_frame” and “.eh_frame_hdr”. On ARM32 platforms, it is called “.ARM.extab” and “.ARM.exidx”.

Since ARM32’s standard predates the DWARF method, ARM used its own implementation. However, their principles are very similar. In the following discussion, we will only focus on “.eh_frame”. If you are particularly interested in ARM32’s implementation, you can refer to the ARM-Unwinding-Tutorial.

The “.eh_frame” section also follows the format of DWARF. However, DWARF itself is extremely detailed and complex, so we won’t go deep into it. We will only touch on some relatively simple content. You just need to understand that DWARF uses a data structure called DI (Debugging Information Entry) to represent the information needed for debugging variables, variable types, functions, etc.

“.eh_frame” uses a clever method to form a very large table. This table includes the values of the corresponding registers and return addresses for each program segment, among other relevant information. Here is an example of this table (you can use readelf --debug-dump=frames-interp to view the corresponding information. Some information will be simplified in the release build, but all the register information that helps us unwind will be preserved).

The “.eh_frame” section contains at least one CFI (Call Frame Information). Each CFI consists of two entries: an independent CIE (Common Information Entry) and at least one FDE (Frame Description Entry). Generally, each CFI corresponds to an object file, and each FDE describes a function.

The “.eh_frame_hdr section” contains a series of attributes, including some basic meta information and an ordered list (initial address, pointing to the FDE pointers in “.eh_frame”). These pieces of information are sorted based on function, thus enabling accelerated searching using binary search.

Summary #

In general, unwinding the first stack frame is the most difficult, because ARM cannot guarantee that the base pointer register (EBP) will be saved on the stack. Therefore, we need to rely on some additional information (.eh_frame) to help us obtain the value of the corresponding base pointer register. Even so, there will still be various stack corruption issues in production environments, so there is still a lot of work to be done. Different debuggers (GDB, LLDB) or breakpad have implemented some search algorithms to find potential stack frames, but we will not discuss them in detail here. Interested students can refer to relevant code for more information.