
Practice Sample and Hot Issue Q&A, Phase 2 #

Hello, I’m Sun Pengfei. Today, I’ll answer your questions about the exercise sample and hot questions from Issue 5 of this column. If you have questions from the previous issue, you can click here to view them.

To give you more hands-on practice, the exercise sample in Issue 5 asks you to implement some of the features yourself, so that you can put the principles discussed in the column into practice.

In previous issues, some students have already submitted their exercise assignments through pull requests. I want to give a thumbs up to every student who participated in the exercises and submitted their assignments.

In Issue 5, the assignment is to develop a CPU data collection tool based on system source code. At the end of the issue, we provided an example for you to analyze. I have already submitted the implementation of the example to GitHub, and you can refer to it.

In the article, it is mentioned that “when an ANR occurs, the Android system will print CPU-related information to the log, using ProcessCpuTracker.java”. The implementation of ProcessCpuTracker mainly relies on the /proc pseudo-file system in Linux, and it mainly uses the files /proc/stat, /proc/loadavg, /proc/[pid]/stat, and /proc/[pid]/task to read data. In Linux, many programs depend on the data under /proc, such as top, netstat, ifconfig, etc. In Android, commonly used tools like procrank, librank, and procmem also rely on this data source. The structure of the /proc directory is well documented in the Linux Man Pages, and there are related Chinese explanations in the book “Linux/Unix System Programming Manual”.
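To make the data source concrete, here is a minimal sketch, not the actual ProcessCpuTracker code, of parsing one /proc/[pid]/stat line for the fields such a tool relies on. The field positions follow the proc(5) man page (field 10 = minflt, 12 = majflt, 14 = utime, 15 = stime), and the sample line in `main` is made up for illustration:

```java
public class ProcStatParser {
    // /proc/[pid]/stat: "pid (comm) state ppid ...". The comm field may contain
    // spaces, so we parse from the LAST ')' and index the remaining fields.
    // After the ')', fields are space-separated starting at field 3 (state).
    public static long[] parse(String statLine) {
        int end = statLine.lastIndexOf(')');
        String[] f = statLine.substring(end + 2).split(" ");
        long minflt = Long.parseLong(f[7]);   // field 10: minor faults
        long majflt = Long.parseLong(f[9]);   // field 12: major faults
        long utime  = Long.parseLong(f[11]);  // field 14: user-mode jiffies
        long stime  = Long.parseLong(f[12]);  // field 15: kernel-mode jiffies
        return new long[]{minflt, majflt, utime, stime};
    }

    public static void main(String[] args) {
        // Illustrative line only; real lines come from /proc/[pid]/stat
        String sample = "23468 (com.sample.app) S 1 1 0 0 -1 4194560 4965 0 3 0 110 380 0 0 20 0 17 0 100 0 0";
        long[] r = parse(sample);
        System.out.println("minflt=" + r[0] + " majflt=" + r[1]
                + " utime=" + r[2] + " stime=" + r[3]);
    }
}
```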

A few points about /proc deserve clarification. Its contents vary across Linux kernel versions, so code that reads this directory may need to handle compatibility between versions. Because the kernel evolves quickly, the documentation sometimes lags behind, leading to inconsistencies between the data and the docs, especially for the long runs of space-separated numeric fields. These entries are not real files: if you inspect them with ls, their size is 0. They are virtual files generated by the kernel, and reading them does not go through the usual chain of file-system operations, so the overhead is small. Finally, there is currently no change-notification callback for these files of the kind available for ordinary files, so data collection has to be done by polling.
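Since collection has to be done by polling, the usual pattern is to snapshot the first line of /proc/stat twice and compute percentages from the deltas between the two samples. Below is a minimal sketch; the field order (user, nice, system, idle, iowait, irq, softirq) follows proc(5), and the jiffy counts in `main` are made up, chosen so the result resembles the sample log:

```java
public class CpuUsage {
    // prev/curr: jiffy counters from the "cpu" line of /proc/stat, sampled twice.
    // Returns each field's share of the elapsed jiffies as a percentage.
    public static double[] usagePercent(long[] prev, long[] curr) {
        long total = 0;
        long[] d = new long[prev.length];
        for (int i = 0; i < prev.length; i++) {
            d[i] = curr[i] - prev[i];   // delta per field over the interval
            total += d[i];
        }
        double[] pct = new double[prev.length];
        for (int i = 0; i < prev.length; i++) {
            pct[i] = 100.0 * d[i] / total;
        }
        return pct;
    }

    public static void main(String[] args) {
        // fields: user nice system idle iowait irq softirq (illustrative values)
        long[] t0 = {100, 0,  50,  800, 40, 5, 5};
        long[] t1 = {110, 0, 130, 1160, 86, 6, 8};
        double[] p = usagePercent(t0, t1);
        System.out.printf("user=%.1f%% kernel=%.1f%% iowait=%.1f%% idle=%.1f%%%n",
                p[0], p[2], p[4], p[3]);
    }
}
```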

Now let’s take a look at the case study at the end of the column article. The following is the log data for this sample. I will analyze the data to make a guess about the possible cause and reproduce this scenario with code.

usage: CPU usage 5000ms(from 23:23:33.000 to 23:23:38.000):
System TOTAL: 2.1% user + 16% kernel + 9.2% iowait + 0.2% irq + 0.1% softirq + 72% idle
CPU Core: 8
Load Average: 8.74 / 7.74 / 7.36

Process:com.sample.app 
  50% 23468/com.sample.app(S): 11% user + 38% kernel faults:4965
Threads:
  43% 23493/singleThread(R): 6.5% user + 36% kernel faults: 3094
  3.2% 23485/RenderThread(S): 2.1% user + 1% kernel faults: 329
  0.3% 23468/.sample.app(S): 0.3% user + 0% kernel faults: 6
  0.3% 23479/HeapTaskDaemon(S): 0.3% user + 0% kernel faults: 982
  ...

The example above shows the CPU usage over a 5-second period. From this log, we can gather several important pieces of information:

  1. In the “System Total” section, the user CPU usage is low, CPU idle time is high, and most of the consumption is in the kernel and iowait.

  2. The CPU has 8 cores, and the Load Average is also around 8, so the load roughly matches the core count; given the high idle percentage, much of that load likely comes from threads blocked waiting on I/O rather than from CPU-bound work.

  3. The log shows the CPU usage of the sample app during this time period: low user CPU usage, high kernel CPU usage, and 4965 page faults.

  4. The log also shows the CPU usage of each thread. Only “singleThread” is in the “R” (Running) state, and it generated 3094 page faults; the other threads, including the main thread, are in the “S” (Sleeping) state.
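Point 2 above relies on /proc/loadavg, whose first three fields are the 1-, 5-, and 15-minute load averages; comparing them to the core count is the standard check. A small sketch, where the sample line mirrors the log above (the remaining fields are running/total tasks and the most recent pid):

```java
public class LoadAvg {
    // Parse the 1-, 5-, and 15-minute load averages from a /proc/loadavg line
    public static double[] parse(String line) {
        String[] f = line.trim().split("\\s+");
        return new double[]{
                Double.parseDouble(f[0]),
                Double.parseDouble(f[1]),
                Double.parseDouble(f[2])
        };
    }

    public static void main(String[] args) {
        // Illustrative line shaped like the sample log's Load Average
        double[] l = parse("8.74 7.74 7.36 3/1274 24321");
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("1-min load " + l[0] + " on " + cores + " cores");
    }
}
```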

Based on the mapping between the kernel’s thread-state macros and their abbreviations, “R” covers both the Running and Runnable states: a running thread is currently executing on a core, while a runnable thread is waiting in the run queue for a core to become available. From the scheduler’s point of view both will get to execute, so /proc reports them with the same letter.

The other threads in the log are in the “S” (Sleeping) state, which corresponds to TASK_INTERRUPTIBLE. A thread enters this state when it voluntarily yields the CPU, for example by calling sleep, or in other situations that cause a voluntary context switch. Common reasons include waiting for a relatively long I/O operation or an IPC call. If the data an I/O operation needs is in neither the buffer cache nor the page cache and must be read from a slower storage device, the kernel suspends the thread and places it on a wait queue until the I/O completes; the completion interrupt then returns the thread to the scheduling queue. From this log alone, however, it is not possible to determine the specific cause of the sleeping state.
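The state letter itself is field 3 of /proc/[pid]/task/[tid]/stat, immediately after the ')' that closes the thread name. A small sketch using sample lines shaped like the log above; on a real device you would enumerate the tid directories under /proc/[pid]/task/ and read each stat file:

```java
public class ThreadState {
    // The comm field "(name)" may contain spaces, so locate the LAST ')';
    // the single state letter follows one space after it.
    public static char stateOf(String statLine) {
        return statLine.charAt(statLine.lastIndexOf(')') + 2);
    }

    public static void main(String[] args) {
        // Illustrative prefixes only; stateOf() ignores the trailing fields
        System.out.println(stateOf("23493 (singleThread) R 23468 23468"));  // prints R
        System.out.println(stateOf("23485 (RenderThread) S 23468 23468"));  // prints S
    }
}
```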

Additionally, “singleThread” shows relatively high values across the metrics and has generated a number of page faults. Page faults fall into three categories: minor page faults, major page faults, and invalid page faults. Let’s analyze them further.

Minor page faults occur when the kernel uses a lazy approach to allocate memory. When memory is requested, the kernel does not immediately allocate physical memory. Instead, it waits until the memory page is used or data is written to it. When this happens, the kernel receives a page fault from the MMU (Memory Management Unit). At this point, the kernel performs the physical memory allocation operation, and the MMU maps the virtual address to a physical address. This type of page fault is considered a minor page fault.

Major page faults occur when the accessed page is not present in physical memory and must be loaded from a slow device or from the swap partition. Note that unless the device uses zRAM as a swap device, Android can be assumed to have no swap partition, so major page faults caused by reading from swap do not occur on Android. The other case involves a file mapped with mmap: the virtual memory region, the file’s on-disk location, and physical memory are linked together, and if the file data is not yet in memory when accessed through that address, a major page fault occurs. Based on the scenarios in which page faults occur, a virtual page can be in one of four states:

  • The first state is unallocated.
  • The second state is allocated but not mapped to physical memory.
  • The third state is allocated and mapped to physical memory.
  • The fourth state is allocated and mapped to the swap partition (this situation is rare in Android).
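Both common fault types can be provoked deliberately. The sketch below is an illustration under stated assumptions: first touches of a freshly allocated array can raise minor faults as the kernel lazily backs its pages, and first accesses to an mmap-ed file can raise major faults when the data is not yet in the page cache (a freshly extended temp file is sparse, so it reads back as zeros and may be served without disk I/O; treat the major-fault part as best-effort):

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class FaultDemo {
    // Touch one byte per 4KB page of a new array; each first touch can raise
    // a minor fault while the kernel allocates and maps physical memory.
    static long touchAnonPages(int size) {
        byte[] anon = new byte[size];
        long touched = 0;
        for (int i = 0; i < anon.length; i += 4096) {
            anon[i] = 1;
            touched++;
        }
        return touched;
    }

    // Read one byte per page of an mmap-ed file; pages absent from the page
    // cache would have to be faulted in from the backing file.
    static long sumMappedPages(File f, int size) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(size);  // sparse file: contents read back as zeros
            MappedByteBuffer map =
                    raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, size);
            long sum = 0;
            for (int i = 0; i < size; i += 4096) sum += map.get(i);
            return sum;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("pages touched: " + touchAnonPages(4 * 1024 * 1024));
        File f = File.createTempFile("fault", ".bin");
        System.out.println("mapped checksum: " + sumMappedPages(f, 4 * 1024 * 1024));
        f.delete();
    }
}
```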

Combining the explanation above with the page-fault data: singleThread incurred 3094 page faults in total, and at a page size of 4KB that corresponds to roughly 12MB of memory allocated and touched during this period (3094 × 4KB ≈ 12.1MB).

Next, let’s analyze the iowait figure. Its significant share means I/O operations were performed within the 5-second interval, and a proportion this large suggests either a great deal of I/O or I/O that was slow for some other reason.

Based on the above analysis, we can make an educated guess at the implementation; page faults can occur during both read and write operations. Because my phone’s performance is relatively low, the scenario reproduces easily, and the following code uses write operations.

// requires: java.io.File, java.io.FileOutputStream, java.util.Arrays
File f = new File(getFilesDir(), "aee.txt");
// try-with-resources ensures the stream is closed even if write() throws
try (FileOutputStream fos = new FileOutputStream(f)) {
    byte[] data = new byte[1024 * 4 * 3000];  // byte array of roughly 12MB
    // Because of the I/O cache mechanism, write repeatedly so that dirty pages
    // accumulate and get written back to disk
    for (int i = 0; i < 30; i++) {
        // First touch of the newly allocated pages raises minor faults here
        // and generates user CPU time
        Arrays.fill(data, (byte) i);
        fos.write(data);
    }
    fos.flush();
}

The CPU data captured for the above code is as follows.

E/ProcessCpuTracker: CPU usage from 5187ms to 121ms ago (2018-12-28 08:28:27.186 to 2018-12-28 08:28:32.252):
    40% 24155/com.sample.processtracker(R): 14% user + 26% kernel / faults: 5286 minor
    thread stats:
    35% 24184/SingleThread(S): 11% user + 24% kernel / faults: 3055 minor
    2.1% 24174/RenderThread(S): 1.3% user + 0.7% kernel / faults: 384 minor
    1.5% 24155/.processtracker(R): 1.1% user + 0.3% kernel / faults: 95 minor
    0.1% 24166/HeapTaskDaemon(S): 0.1% user + 0% kernel / faults: 1070 minor

    100% TOTAL(): 3.8% user + 7.8% kernel + 11% iowait + 0.1% irq + 0% softirq + 76% idle
    Load: 6.31 / 6.52 / 6.66

Comparing this with the data given in the sample, the two are basically consistent.

Using the approach above, you can apply the same analysis to the CPU section of an ANR log to find performance bottlenecks. For example, a large number of major page faults is not normal, and high iowait warrants checking whether there are intensive I/O operations.
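As a closing illustration, here is a hypothetical checker for ProcessCpuTracker-style log lines. The 5% iowait threshold is an assumed value for illustration, not a fixed rule, and the sample lines are shaped like the output shown earlier:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnrLogCheck {
    private static final Pattern IOWAIT = Pattern.compile("([\\d.]+)% iowait");
    private static final Pattern MAJOR  = Pattern.compile("faults:.*?(\\d+) major");

    // Flag a TOTAL line whose iowait share exceeds an assumed 5% threshold
    public static boolean highIowait(String totalLine) {
        Matcher m = IOWAIT.matcher(totalLine);
        return m.find() && Double.parseDouble(m.group(1)) > 5.0;
    }

    // Extract the major-fault count from a per-process line, 0 if absent
    public static int majorFaults(String procLine) {
        Matcher m = MAJOR.matcher(procLine);
        return m.find() ? Integer.parseInt(m.group(1)) : 0;
    }

    public static void main(String[] args) {
        String total = "100% TOTAL(): 3.8% user + 7.8% kernel + 11% iowait + 0.1% irq + 0% softirq + 76% idle";
        String proc  = "40% 24155/com.sample.processtracker(R): 14% user + 26% kernel / faults: 5286 minor 12 major";
        System.out.println("high iowait: " + highIowait(total));   // prints true
        System.out.println("major faults: " + majorFaults(proc));  // prints 12
    }
}
```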

Relevant Resources #

Feel free to click the “Share with Friends” button to share today’s content with your friends and invite them to learn together.