Practice Sample to Run Hot Issue Q&A 1st Phase

Practice Sample to Run Hot Issue Q&A 1st Phase #

Hello, I am Sun Pengfei, the “Learning Commissioner” for this column.

Since this column launched, many students have reported problems running the sample exercises. Because most of the samples are implemented in C/C++, compiling and running them is somewhat more involved than for a typical Java project. Today I will answer the questions students raised about issues 1 to 4. The exercises exist so that, after studying each issue, you can quickly try out the tools and methods it discusses and grasp the essence of the technology more efficiently. I hope everyone takes part and leaves feedback in the comment section when questions come up. I will also periodically answer further questions about the exercises.

Configuration of Development Environment #

First, let’s address the frequently asked question regarding the development environment.

Most of the samples so far were developed in C/C++, so to run them you first need to configure the SDK and NDK. The SDK is usually already configured, but the NDK needs some specific setup. Our samples usually use the latest version of the NDK, and the code may use C++11/14 syntax and be built with CMake. The NDK setup is described below.

First, download the latest version from the official NDK website and extract it to a suitable location. On macOS, you can place it in the ANDROID_SDK_HOME/ndk-bundle directory, where Android Studio finds it by default. If you place it in a different directory, you may need to specify the path manually.

There are generally two ways to specify the NDK directory.

  1. There is usually a local.properties file in the root directory of the exercise sample. Modify the ndk.dir path in this file.

    ndk.dir=/Users/sample/Library/Android/sdk/ndk-bundle
    sdk.dir=/Users/sample/Library/Android/sdk
    
  2. You can configure it in Android Studio by going to File -> Project Structure -> SDK Location.

    Screenshot

    The two methods above have the same effect.

Some samples may require downgrading the NDK for compilation. In this case, you may need to download an older NDK version from the official website.

Afterwards, you need to install CMake and LLDB.

  • CMake: An external build tool that can be used together with Gradle to build native libraries.

  • LLDB: A debugger that Android Studio uses to debug native code.

You can install both of these from Tools -> Android -> SDK Manager.

Screenshot

That’s it! You have now configured the necessary environment for compilation.

Hot Issues Q&A #

01 | Crash Optimization (Part 1): All about “Crash”

The most common problem students hit with the sample in the first issue is that no crash log is produced when running on the emulator.

The root cause is fairly involved. The most direct cause is that Breakpad compiled with Clang for the x86 platform executes abnormally, so no log can be captured. To solve this problem, we first need to understand the compilers that ship with the NDK.

The NDK integrates two compilers: GCC and Clang. Starting from NDK r11, Clang is the officially recommended compiler; the ChangeLog marks GCC as deprecated, and GCC received no further updates after the upgrade from 4.8 to 4.9. Starting from NDK r13, Clang is the default compiler. In versions after NDK r16b, forcing GCC on apparently causes errors, and libc++ becomes the default STL. NDK r18 removed GCC completely.

Since Clang’s compilation causes abnormal execution of Breakpad on x86, we need to switch to GCC for compilation. The steps are as follows.

  1. Firstly, switch the NDK to r16b. You can download it from here. Find the NDK version that corresponds to your operating system platform.

  2. In Android Studio, set the NDK path to the NDK r16b directory.

  3. Perform the following configuration in the build.gradle file of the exercise sample and breakpad-build.

    externalNativeBuild {
        cmake {
            cppFlags "-std=c++11"
            arguments "-DANDROID_TOOLCHAIN=gcc"
        }
    }

The second issue is how to obtain the log parsing tool.

Parsing Minidump logs mainly relies on the minidump_stackwalk tool. Its companion tool, dump_syms, extracts the symbol table from a .so file.

These two tools need to be compiled from Breakpad. Some students have found articles that use Chrome team’s depot_tools to download and compile the source code of the tools. Depot_tools is a very useful tool, but its server cannot be accessed in China. Therefore, it is relatively convenient for us to directly download and compile the source code.

There are some points to be aware of when compiling Breakpad. Since the Linux kernel is used in the Android platform, the symbol table export tool dump_syms for dynamic link libraries in Android needs to run on Linux (no cross-compilation solution has been found for other platforms). Therefore, the following steps are performed in a Linux environment (Ubuntu 18.04), as shown below.

  1. Download the source code.

  2. Because the source code does not come with some third-party libraries, building it now will cause an exception. We need to download the lss library to the src/third_party directory in the Breakpad source code.

    git clone https://chromium.googlesource.com/linux-syscall-support src/third_party/lss

  3. Then, execute the following commands in the source code directory.

    ./configure && make
    make install

After doing this, we can directly use the minidump_stackwalk and dump_syms tools.

The third issue is how to parse the captured Minidump logs.

If the sdcard permission has been granted, the generated crash information is stored in /sdcard/crashDump, which makes further analysis convenient. Otherwise, it is placed in the directory /data/data/com.dodola.breakpad/files/crashDump.

You can use the adb pull command to pull the log files.

adb pull /sdcard/crashDump/

Firstly, we need to extract the symbol table from the dynamic library that produces the crash. Taking the sample from the first issue as an example, the obj path of the dynamic library that produces the crash is located under Chapter01/sample/build/intermediates/cmake/debug/obj.

Pay attention to the phone platform here. Pick the libcrash-lib.so built for the platform the Sample ran on, then use the dump_syms tool to extract its symbol table.

dump_syms libcrash-lib.so > libcrash-lib.so.sym

  1. Create the directory structure for the symbol table. First, open the generated libcrash-lib.so.sym and find the following line.

    MODULE Linux arm64 322FCC26DA8ED4D7676BD9A174C299630 libcrash-lib.so

Then create a directory structure like Symbol/libcrash-lib.so/322FCC26DA8ED4D7676BD9A174C299630/ and copy the libcrash-lib.so.sym file into that folder. Note that the directory structure must be correct, otherwise the symbol table will not match correctly.

  2. After completing the above steps, parse the crash log by running the minidump_stackwalk command.

    minidump_stackwalk crash.dmp ./Symbol > dump.txt

  3. The crash log we obtain now carries symbol information, in contrast to the earlier log records produced without a symbol table.

  4. If we don’t have the original obj file, we need to analyze the crash using the exported symbols of libcrash-lib.so. The tool used here is addr2line, located at $NDK_HOME/toolchains/arm-linux-androideabi-4.9/prebuilt/darwin-x86_64/bin/arm-linux-androideabi-addr2line. Pay attention to the platform: to analyze a 64-bit dynamic library, use the addr2line from aarch64-linux-android-4.9 instead.

    aarch64-linux-android-addr2line -f -C -e libcrash-lib.so 0x5f8
    Java_com_dodola_breakpad_MainActivity_crash

  5. You can use GDB to debug the problematic dynamic library based on the Minidump. This will not be covered here, but you can refer to this article.

03 | Memory Optimization (Part 1): Memory Optimization in the 4GB Era

Regarding the Sample in this issue, many students inquired about the principles of the Hook frameworks often used in the Sample.

There are two types of Hook frameworks used in the Sample. One is the Inline Hook solution (Substrate and HookZz). The other is the PLT Hook solution (Facebook Hook). These two solutions have their own advantages and disadvantages, and different frameworks should be used depending on the desired functionality.

PLT Hook is generally more stable than Inline Hook, but it can only intercept dynamically linked functions that are called through the PLT, whereas Inline Hook can hook any code in the SO. Because Inline Hook has to perform instruction repair for each platform, its stability and compatibility are much worse than those of PLT Hook.

Regarding PLT Hook, you can refer to the book “Advanced Programming in the UNIX Environment” for more information. To understand Inline Hook, you need to have a deep understanding of ARM, x86 assembly, and the Procedure Call Standard for each platform.

In the third issue, some students also asked about how the function symbols in the Sample were obtained.

To hook a function, we first need to know its address. In Linux, we can use the dlsym function to obtain the address of a function based on its name. The function names in dynamic libraries are generally generated as symbol names using Name Mangling technology (you can refer to this article for specific details). Therefore, the Sample in the third issue includes many converted function names.

void *hookRecordAllocation26 = ndk_dlsym(handle, "_ZN3art2gc20AllocRecordObjectMap16RecordAllocationEPNS_6ThreadEPNS_6ObjPtrINS_6mirror6ObjectEEEj");

void *hookRecordAllocation24 = ndk_dlsym(handle, "_ZN3art2gc20AllocRecordObjectMap16RecordAllocationEPNS_6ThreadEPPNS_6mirror6ObjectEj");

Such functions can be demangled using the c++filt tool. I provide a web-based demangling tool here.

To find the hook points, we need to read the system source code. For example, the Sample in the third issue hooks functions related to memory allocation in the virtual machine. Before hooking, first confirm that a symbol for the function actually exists. Forcibly inlined functions and very small functions often have no corresponding symbol. In such cases, use tools like objdump, readelf, or a disassembler to check, based on the class name and function name, whether a corresponding symbol exists.

Summary #

The Breakpad Sample in the first issue focused on how to obtain and interpret logs for native crashes. Depending on the nature of our business, we often encounter Java exceptions. As the business stabilizes and our code’s exception handling improves, the number of Java exceptions gradually decreases, while native crashes become more apparent. Larger applications generally include some native libraries, such as encryption, mapping, logging, and push modules. For various reasons this code may crash, and we need to understand crash logs to troubleshoot and fix these crashes, or to work around them, in order to improve the stability of our applications.

There is a lot to learn from the source code of Breakpad: signal capturing, the use of ptrace, process fork/clone mechanisms, inter-process communication between main and child processes, stack unwinding, system information retrieval, memory maps information retrieval, symbol dumping, and symbol reverse resolution.

The sample provided in the second issue offers a solution for handling system exceptions. It suggests using reflection or proxy mechanisms to handle exceptions in system code. It is important to note that the FinalizerWatchdog mechanism is not a system exception, but a protection mechanism of the system. Many times, we encounter crashes caused by bugs in system frameworks, such as the common Toast exception. Although these exceptions are not caused by our own application, they still affect the user experience. To address these exceptions, we can consider the approach presented in this sample.

The third issue of the sample describes a simple Memory Allocation Trace monitoring module. This module is primarily used in conjunction with an automatic performance analysis system to automatically identify issues, such as monitoring the allocation of large objects and analyzing the call stack of allocated objects. There are many things that this module can do, and you can develop tools suitable for your own business based on this approach.

From the code provided in the third issue of the sample, you can learn about the use of the Inline Hook Substrate framework, using ndk_dlopen to bypass the Android Classloader-Namespace Restriction mechanism, as well as thread synchronization in C++.
