Supplementary Article on Stutter Optimization: Stutter Scene and Stutter Analysis #

Using the instrumentation or Profilo solutions discussed in the previous chapter, we can obtain the execution times of all the functions involved in a lag. In most cases these solutions are very useful, because they let us pinpoint the true cause of the lag.

However, you have probably also encountered many inexplicable lags: reading a 1 KB file, accessing a tiny asset resource, or simply creating a directory.

Why does such a seemingly simple operation take up so much time? And how can we further identify and troubleshoot the problem by collecting more comprehensive lag scene information?

Lag Scenario #

Let me give you an example of a lag that I once encountered in production. Below are the specific timing details.

From the graph, the Activity's onCreate function took 3 seconds, and the openNonAsset call inside the Lottie animation took nearly 2 seconds. It does read a 30 KB resource file, but should that really take so long?

Today, let’s analyze this issue together.

1. Java Implementation

When further analyzing the source code related to openNonAsset, I found that there are a lot of synchronized locks inside AssetManager. First, I suspected that it was a lock problem, so next I needed to collect the status and stack traces of various threads during the lag for further analysis.

Step 1: Get Java Thread Status

By using the getState method of Thread, we can get the thread status, and at that time, the main thread was indeed in the BLOCKED state.

What is the BLOCKED state? When a thread fails to acquire the lock of the object in the following code, the thread will enter the BLOCKED state:

// Thread waits for the lock of the object
synchronized (object) {
    doSomething();
}  

WAITING, TIMED_WAITING, and BLOCKED are states that need special attention. Many students confuse BLOCKED with WAITING: BLOCKED means the thread is waiting to acquire a lock, which corresponds to situation one in the code below; WAITING means the thread is waiting for a "wake-up" from another thread, which corresponds to situation two.

synchronized (object) {   // Situation one: Stuck here --> BLOCKED
    object.wait();        // Situation two: Stuck here --> WAITING
}  

Note that when a thread enters the WAITING state, it not only gives up CPU time but also releases the lock it holds. For more on the definitions and transitions of the Java thread states, you can refer to Thread.State and "Java Thread Dump Analysis".
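To make the distinction concrete, here is a minimal pure-JVM sketch (not Android-specific; the thread names and sleep durations are arbitrary choices for illustration) that drives one thread into WAITING via wait() and another into BLOCKED via lock contention, then reads both with getState:

```java
public class ThreadStateDemo {
    public static String probeStates() throws InterruptedException {
        final Object lock = new Object();

        // Situation two: this thread calls wait() and releases the lock -> WAITING.
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try { lock.wait(); } catch (InterruptedException ignored) { }
            }
        }, "waiter");
        waiter.start();
        Thread.sleep(300); // give the waiter time to enter wait()

        String result;
        Thread blocker;
        synchronized (lock) { // possible only because wait() released the lock
            // Situation one: this thread contends for the lock we hold -> BLOCKED.
            blocker = new Thread(() -> {
                synchronized (lock) { /* acquired only after we release it */ }
            }, "blocker");
            blocker.start();
            Thread.sleep(300); // give the blocker time to hit the lock

            result = "waiter=" + waiter.getState() + " blocker=" + blocker.getState();
            lock.notifyAll(); // wake the waiter before we release the lock
        }
        waiter.join();
        blocker.join();
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        // typically prints: waiter=WAITING blocker=BLOCKED
        System.out.println(probeStates());
    }
}
```

The sleeps are a crude way to sequence the threads for the demo; in real monitoring code we only read getState, we never steer it.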

Step 2: Get the Stack Trace of All Threads

Next, at the Java layer, we further obtained the stack traces of all threads using Thread.getAllStackTraces(), hoping to know which thread specifically caused the main thread to be BLOCKED.

It should be noted that on Android 7.0, getAllStackTraces does not return the stack trace of the main thread. From the collected lag logs, we found a thread related to AssetManager:

"BackgroundHandler"  RUNNABLE
  at  android.content.res.AssetManager.list
  at  com.sample.business.init.listZipFiles
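As a sketch of how such a log can be produced at the Java layer, the snippet below walks Thread.getAllStackTraces() and prints each thread's name, state, and top stack frames. The class name and the frame limit are arbitrary choices, and remember that on Android 7.0 the main thread may be missing from the result:

```java
import java.util.Map;

public class StackDump {
    // Build a compact dump of every thread's state plus its top frames,
    // similar to the lag log shown above. maxFrames is an arbitrary cap
    // to keep the reported data small.
    public static String dumpAllThreads(int maxFrames) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\"  ")
              .append(t.getState()).append('\n');
            StackTraceElement[] frames = e.getValue();
            for (int i = 0; i < Math.min(maxFrames, frames.length); i++) {
                sb.append("  at ").append(frames[i]).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads(3));
    }
}
```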

Examining AssetManager.list confirmed that it takes the same synchronized lock, and the list function has to traverse an entire directory, which can take a long time.

public String[] list(String path) throws IOException {
  synchronized (this) {
    ensureValidLocked();
    return nativeList(mObject, path);
  }
}

Moreover, "BackgroundHandler" is a low-priority background thread. This is exactly the phenomenon mentioned in our previous article: the main thread ends up waiting for a low-priority background thread.

2. SIGQUIT Signal Implementation

The Java implementation seems to work well: it helped us find the cause of the lag. However, the information in an ANR log is more comprehensive. Can we use the ANR log directly?

For example, the log below is indeed very detailed, including the status, CPU time slices, priority, stack trace, and lock information of every thread. utm stands for utime, measured in clock ticks; HZ is the kernel clock frequency; and the formula to convert utime to milliseconds is "time * 1000 / HZ". In the example, utm=218 means 218 * 1000 / 100 = 2180 milliseconds.

// Thread name; priority; thread ID; thread status
"main" prio=5 tid=1 Suspended
  // Thread group; thread suspend count; thread debug suspend count; 
  | group="main" sCount=1 dsCount=0 obj=0x74746000 self=0xf4827400
  // Thread native ID; process priority; scheduler priority;
  | sysTid=28661 nice=-4 cgrp=default sched=0/0 handle=0xf72cbbec
  // Native thread status; scheduler status; user time utime; system time stime; scheduled CPU
  | state=D schedstat=( 3137222937 94427228 5819 ) utm=218 stm=95 core=2 HZ=100
  // Stack-related information
  | stack=0xff717000-0xff719000 stackSize=8MB
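The utime conversion above can be captured in a one-line helper; the class name is an arbitrary choice:

```java
public class CpuTime {
    // Convert utime/stime (scheduler clock ticks) to milliseconds using the
    // formula from the log above: time * 1000 / HZ.
    public static long ticksToMillis(long ticks, long hz) {
        return ticks * 1000 / hz;
    }

    public static void main(String[] args) {
        // utm=218 with HZ=100, as in the sample log above
        System.out.println(ticksToMillis(218, 100) + " ms"); // prints: 2180 ms
    }
}
```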

Question 1: Native Thread Status

If you’re observant, you may notice that the status of the “main” thread in the ANR log is “Suspended.” So why isn’t there a Suspended state among the six Java thread states?

In fact, Suspended is a native thread status. How should we understand this? In Android, the execution of a Java thread is delegated to a standard Linux pthread. The threads running in Android fall into two kinds: those attached to the virtual machine and those that are not, and only attached threads are managed by the virtual machine. The status of a Java thread is therefore just a mapping of the native thread state.

Different versions of Android have different states for native threads. For example, Android 9.0 defines 27 thread states, which can more clearly distinguish the current situation of the thread. For information about Java thread states and native thread state transitions, you can refer to thread_state.h and Thread_nativeGetStatus.

We can see that the native thread states are indeed more abundant. For example, TIMED_WAITING is split into TimedWaiting and Sleeping, and WAITING is further divided into more than ten scenarios. This is very helpful for us to analyze specific scenarios.

Question 2: Obtaining ANR Logs

Although ANR logs provide rich information, the question is how to obtain one ourselves when the app freezes.

We can use the system’s ANR generation mechanism with the following steps:

Step 1: When a freeze occurs on the main thread, actively send a SIGQUIT signal to our own process.

Step 2: Wait for the /data/anr/traces.txt file to be generated.

Step 3: Report the generated file.
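The three steps can be sketched roughly as follows. Sending SIGQUIT is Android-only (on a device it could be done with android.os.Process.sendSignal, SIGQUIT being signal 3), so that part is left as a comment; the polling wait of Step 2 is written in plain Java so it runs anywhere, and the poll interval and class name are arbitrary choices:

```java
import java.io.File;

public class AnrTraceCollector {
    // Step 1 (Android only): send SIGQUIT to our own process so the runtime's
    // Signal Catcher thread writes an ANR trace. On a device this would be:
    //   android.os.Process.sendSignal(android.os.Process.myPid(), 3 /* SIGQUIT */);

    // Step 2: wait for the trace file (e.g. /data/anr/traces.txt) to appear,
    // polling until a timeout. Returns true once a non-empty file exists.
    public static boolean waitForFile(File file, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (file.exists() && file.length() > 0) {
                return true; // Step 3 would then read and report this file
            }
            Thread.sleep(100);
        }
        return false;
    }
}
```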

With ANR logs, we can see directly that the main thread's lock is held by the "BackgroundHandler" thread. With getAllStackTraces, by contrast, we could only speculate about one thread at a time.

  // Stack-related information
  at android.content.res.AssetManager.open(AssetManager.java:311)
  - waiting to lock <0x41ddc798> (android.content.res.AssetManager) held by tid=66 (BackgroundHandler)
  at android.content.res.AssetManager.open(AssetManager.java:289)
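A small parser can pull the blocking thread straight out of such "waiting to lock ... held by ..." lines. The class name and output format below are arbitrary choices, assuming the line format shown above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LockOwnerParser {
    // Matches the owner suffix of an ANR lock-contention line, e.g.:
    //   - waiting to lock <0x41ddc798> (android.content.res.AssetManager)
    //     held by tid=66 (BackgroundHandler)
    private static final Pattern HELD_BY =
            Pattern.compile("held by tid=(\\d+) \\(([^)]+)\\)");

    // Returns "tid:threadName", or null when the line carries no owner info.
    public static String findOwner(String line) {
        Matcher m = HELD_BY.matcher(line);
        return m.find() ? m.group(1) + ":" + m.group(2) : null;
    }
}
```

Feeding it the line above yields the owning tid and thread name, which is exactly the link from the blocked main thread to "BackgroundHandler".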

Thread deadlock and hotlock analysis is an interesting topic, and in many cases, it is also quite challenging. For example, we can only obtain locks used in the Java code, and some types of lock holding may not be reflected in the stack. If you are interested in this topic and want to delve deeper, you can read these two articles: Java Thread Dump Analysis and HandyQ Android Thread Deadlock Monitoring and Automated Analysis Practice.

3. Hook Implementation

Using the SIGQUIT signal to obtain ANR logs and thus get various information about all threads seems like a great solution. However, it has several issues:

  • Feasibility: As mentioned in the crash-analysis chapter, on many newer system versions we no longer have permission to read the /data/anr/traces.txt file.

  • Performance: Collecting the stacks and other information for all threads is very time-consuming, which is not ideal in a freeze scenario: it may make the user's freeze even worse.

So, is there a method to obtain ANR logs without affecting the user experience?

Let’s recall a method we discussed in the crash analysis to obtain the stacks of all threads. It is implemented through the following steps.

It essentially simulates the system's process of printing an ANR log, but because some advanced tricks are used along the way, it may cause crashes in production.

For compatibility, we implement it by forking a child process, so that even if the child process crashes, the main process keeps running. This brings another great benefit: collecting all thread stacks becomes completely non-blocking for the main process.

However, forking changes the process ID, so any information the code obtains through /proc/self becomes wrong (it would read the child process's data, and the child has only a single thread): state, schedstat, utm, stm, core, and so on. This is not a big problem, though, because all of it can be re-read by specifying /proc/[parent process ID] instead.
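As an illustration of recovering this data from the parent process, the sketch below parses the state, utm, and stm fields out of a /proc/[pid]/stat line (field numbering per the proc man page: field 3 is state, 14 is utime, 15 is stime). The class name is an arbitrary choice, and the sample line uses placeholder values for fields we do not need:

```java
public class ProcStatParser {
    // Parse the fields we need from a /proc/[pid]/stat line. The comm field
    // (in parentheses) may itself contain spaces, so we split only after the
    // last ')'. In the fork-based scheme, this line would be read from
    // /proc/[parent pid]/stat rather than /proc/self/stat.
    public static String parse(String statLine) {
        int close = statLine.lastIndexOf(')');
        String[] rest = statLine.substring(close + 2).split(" ");
        String state = rest[0];   // field 3: process state (R, S, D, ...)
        String utime = rest[11];  // field 14: user-mode CPU time, clock ticks
        String stime = rest[12];  // field 15: kernel-mode CPU time, clock ticks
        return "state=" + state + " utm=" + utime + " stm=" + stime;
    }

    public static void main(String[] args) {
        // Shortened illustrative line; real lines have ~50 fields.
        String sample = "28661 (m ain) D 1 2 3 4 5 6 7 8 9 10 218 95";
        System.out.println(parse(sample)); // prints: state=D utm=218 stm=95
    }
}
```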

"main" prio=7 tid=1 Native
  | group="" sCount=0 dsCount=0 obj=0x74e99000 self=0xb8811080
  | sysTid=23023 nice=-4 cgrp=default sched=0/0 handle=0xb6fccbec
  | state=? schedstat=( 0 0 0 ) utm=0 stm=0 core=0 HZ=100
  | stack=0xbe4dd000-0xbe4df000 stackSize=8MB
  | held mutexes=

Overall, we have implemented a "non-destructive" way to obtain every Java thread's stack and detailed information through hooking. To reduce the amount of reported data, we only bring out this "big gun" when the main thread is in the WAITING, TIMED_WAITING, or BLOCKED state.

4. Scene information

Now, let's take another look. Is this self-constructed "ANR log" already a complete set of lag scene information? It still seems to lack the usual header information, such as process CPU usage and GC-related details.

As mentioned at the beginning of the 6th article, just like crashes, lags also require “scene information”. Is it possible to make the lag’s “scene information” richer than the system ANR log? We can further add the following information:

  • CPU usage and scheduling information. Referring to the exercise after the 5th article, we can obtain system CPU usage, load, CPU usage of each thread, and I/O scheduling information.

  • Memory-related information. We can add information about the total memory of the system, available memory, and memory of each application process. If Debug.startAllocCounting or atrace is enabled, we can also add GC-related information.

  • I/O and network-related. We can also collect detailed information about all I/O and network operations during the lag, which will be further explained later.
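As a tiny example of the memory item, here is what can be collected portably from any JVM; on Android one would additionally read /proc/meminfo for system totals and use the Debug APIs mentioned above for GC details. The class name is an arbitrary choice:

```java
public class MemoryInfoCollector {
    // Snapshot of basic process-heap numbers, suitable for attaching to a
    // lag log header. System-wide and per-process native figures would come
    // from /proc/meminfo and /proc/[pid]/status on Android.
    public static String snapshot() {
        Runtime rt = Runtime.getRuntime();
        long usedKb = (rt.totalMemory() - rt.freeMemory()) / 1024;
        long maxKb = rt.maxMemory() / 1024;
        return "heapUsedKb=" + usedKb + " heapMaxKb=" + maxKb;
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```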

Since Android 8.0, the Android virtual machine has finally supported the JVM's JVMTI mechanism, and many modules in the Profiler, such as memory collection, have switched to this mechanism for their implementation. Later, I will invite the "Learning Committee" Pengfei to talk to you about the JVMTI mechanism and its applications. It provides a wealth of information, including memory allocation, thread creation, class loading, and GC, and has a large number of application scenarios.

Finally, we can also use some ideas from crash analysis, such as adding user operation paths and other information. This way, we can obtain a more comprehensive lag log than the system ANR, which will be more helpful for solving some difficult lag problems.

Analysis of Lagging Issues #

After capturing a lagging issue on the client side, the data needs to be uploaded to the backend for unified analysis. What kind of processing can we do with the data? What metrics should we focus on?

1. Lagging Rate

If we define a lagging problem as when the main thread is lagging for more than 3 seconds, similar to crashes, we first evaluate the impact of the lagging problem, which is the UV (Unique Visitor) lagging rate.

UV Lagging Rate = UVs with lagging / UVs with lagging data collection enabled

Because lagging data is generally reported by sampling, the sampling rule should be similar to the one used for memory: sample by user, so that once a user is selected for data collection, data is collected from them continuously throughout the day.

The UV lagging rate can evaluate the scope of a lagging problem, but it is of limited use when optimizing lag on low-end devices. If we want to evaluate the severity of the lagging issue, we can use the PV (Page View) lagging rate.

PV Lagging Rate = PVs with lagging / PVs with lagging data collection enabled

It is worth noting that for users selected for PV lagging rate collection, each time the app is launched, data needs to be reported as the denominator.
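Both rates are simple ratios over the sampled population; a trivial helper (all names and numbers here are illustrative only) makes the definitions explicit:

```java
public class LagRate {
    // UV/PV lagging rate as defined above. The denominator counts only
    // users (UV) or launches (PV) that had lag collection enabled.
    public static double rate(long laggedCount, long collectedCount) {
        return collectedCount == 0 ? 0.0 : (double) laggedCount / collectedCount;
    }

    public static void main(String[] args) {
        // e.g. 120 lagged users out of 10,000 sampled users
        System.out.printf("UV lagging rate = %.2f%%%n", 100 * rate(120, 10_000));
    }
}
```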

2. Lagging Tree

When a lagging issue occurs, we add CPU usage and load-related information to the lagging logs. Although a sampling strategy is used, the daily volume of logs still reaches hundreds of thousands. With such a large volume of logs, it is difficult to identify key points by simply using stack aggregation.

Is it possible for us to implement a flame chart for lagging, where we can see the overall information of the lagging in one chart?

Here, I highly recommend the lagging-tree approach. For a lag lasting more than 3 seconds, whether it ends up being 4 seconds or 10 seconds depends on the phone's performance and the environment at the time. So we decided to abandon specific durations and aggregate only by the proportion of occurrences of the same stack. This way, from a single tree we can see which stacks cause the most lag and what branches sit beneath them.

Our resources are limited, so we usually prioritize solving top lagging problems. By using the aggregation method of lagging trees, we can see the situations of various branches of the top lagging problems from a holistic perspective, which helps us quickly find the key lagging points.
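A minimal sketch of such a lagging-tree aggregation might look like the following: stacks are inserted root-first, only occurrence counts are kept (durations are deliberately ignored, as described above), and each branch is printed with its share of all reports. All names here are illustrative, not the production implementation:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LagTree {
    // Each node records how many lag reports passed through this frame.
    public final Map<String, LagTree> children = new LinkedHashMap<>();
    public int count;

    // Insert one lag report's stack, ordered from root frame to leaf frame.
    public void add(List<String> rootFirstStack) {
        count++;
        LagTree node = this;
        for (String frame : rootFirstStack) {
            node = node.children.computeIfAbsent(frame, f -> new LagTree());
            node.count++;
        }
    }

    // Render each branch with its proportion of all reports, largest first.
    public void print(String frame, int total, String indent, StringBuilder out) {
        out.append(String.format("%s%s %.0f%%%n", indent, frame, 100.0 * count / total));
        children.entrySet().stream()
                .sorted((a, b) -> b.getValue().count - a.getValue().count)
                .forEach(e -> e.getValue().print(e.getKey(), total, indent + "  ", out));
    }

    public static void main(String[] args) {
        LagTree root = new LagTree();
        root.add(List.of("Activity.onCreate", "AssetManager.open"));
        root.add(List.of("Activity.onCreate", "AssetManager.open"));
        root.add(List.of("Activity.onCreate", "AssetManager.list"));
        root.add(List.of("View.draw"));
        StringBuilder out = new StringBuilder();
        root.print("ALL", root.count, "", out);
        System.out.print(out);
    }
}
```

Sorting by count at every level is what surfaces the "top" branches first, which matches the idea of prioritizing the biggest lagging problems with limited resources.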

Summary #

Today, we started with a simple lagging problem and evolved step by step to come up with three solutions to solve this problem. Among them, the solution implemented in Java is the first choice for most students. Although it is simple and stable, it still has issues such as incomplete information and poor performance.

Many students may think that once the problem is solved, everything is fine. But I don’t think so. We should continue to ask ourselves, if a similar problem occurs again, can we use the same method to solve it? How much impact will this solution have on users, and is there still room for optimization?

Only in this way can we come up with Solution 2 and Solution 3 mentioned in the article, and solutions will continue to evolve and improve. Only in this way can we make rapid progress in the pursuit of excellence.

Homework #

Thread waiting, deadlock, and hotlock are very common in applications. Today’s homework is to share whether you have encountered these problems in your product and how you resolved them. Please share your learning and experience in the comments.

I found that many students are interested in monitoring the creation of threads in the comments. Today’s Sample is about how to monitor the creation of threads. Before practicing, here are some reference links for you:

I will explain the specific implementation principles of, and differences between, PLT Hook and Inline Hook later. For now we can set those details aside and use open-source implementations directly. Through this sample, I hope you can learn to analyze the source code and find reasonable hook functions and the specific shared-object libraries they live in. I believe that once you are familiar with these methods, you will be pleasantly surprised to find they are not difficult to implement.

Feel free to click “Share with friends” to share today’s content with your friends and invite them to study together. Don’t forget to submit today’s homework in the comments. I have prepared rich “Study and Encouragement Packages” for students who complete the homework seriously. Looking forward to progressing together with you.