21 UI Optimization (Part 2): How to Optimize UI Rendering #

Confucius said, “Review the old to learn the new.” Before we dive into learning how to optimize UI rendering, let’s first review what we learned in “jank optimization.” In terms of jank optimization, we learned about four local tools for troubleshooting jank, as well as various methods for monitoring jank and frame rate online. Why do we need to review jank optimization? That’s because UI rendering can also cause jank, and some students may be confused about the difference between jank optimization and UI optimization.

When the VSYNC signal arrives in the Android system, if the UI thread is blocked by a time-consuming task and is unable to render the UI for a long time, jank will occur. However, this scenario is not our focus today. The core of UI optimization is to solve the jank caused by rendering performance itself, which can be considered as a subset of jank optimization.

Designers and product managers want the application to deliver a smooth experience with rich graphic elements and ever more dazzling animations. However, the Android system may not be able to finish such complex rendering work in time, which leads to dropped frames. That is why we do UI optimization: we hold ourselves to a higher standard and aim for the 60 fps that smooth visuals require. Note that even though users may not notice obvious jank at 40 fps, there is still room to optimize further.

So now let’s take a look at how we can achieve 60 fps in UI rendering. What are the methods that can help us optimize UI rendering performance?

UI Rendering Measurement #

From the previous sessions, you should already be familiar with some UI testing tools and problem localization methods.

Testing tools: Profile GPU Rendering and Show GPU Overdraw. You can refer to Inspect GPU Rendering Speed and Overdraw for specific usage.
Problem localization tools: Systrace and Tracer for OpenGL ES. You can refer to Slow rendering for specific usage.

Starting from Android Studio 3.1, Android recommends using the Graphics API Debugger (GAPID) instead of Tracer for OpenGL ES. GAPID is an upgraded version that is not only cross-platform but also more powerful, supporting Vulkan and playback.

With the above tools, we can preliminarily determine whether the performance of app UI rendering meets the requirements, such as whether frame drops frequently occur, which stage of rendering the frame drops mainly occur in, and whether there is overdraw, etc.

Although these graphical interface tools are very useful, they are difficult to use in automated testing scenarios. So, what are some measurement methods that can be used for automated measurement of UI rendering performance?

1. gfxinfo

gfxinfo can output performance information for each animation stage as well as for each frame. The command is as follows:

adb shell dumpsys gfxinfo package_name

In addition to rendering performance, gfxinfo also reports rendering-related memory and View hierarchy information. Starting with Android 6.0, the gfxinfo command accepts a framestats parameter, which outputs the time spent in each drawing stage for the most recent 120 frames.

adb shell dumpsys gfxinfo package_name framestats

With this command, we can achieve automated statistics of the app’s frame rate. Furthermore, we can implement a customized “Profile GPU Rendering” tool that automatically analyzes which stage has the fastest growth in time consumption when frame drops occur and provides corresponding suggestions.
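The frame-rate statistics described above can be sketched in Python. This is only a minimal sketch under stated assumptions: it assumes the framestats output wraps its rows in ---PROFILEDATA--- markers, that the first row inside names the columns (including Flags, IntendedVsync, and FrameCompleted), and that timestamps are in nanoseconds. The helper names and the sample text are made up for illustration.

```python
import subprocess

FRAME_BUDGET_MS = 1000 / 60  # about 16.67 ms per frame at 60 fps


def dump_framestats(package):
    """Run the adb command from the text and return its raw output."""
    result = subprocess.run(
        ["adb", "shell", "dumpsys", "gfxinfo", package, "framestats"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_framestats(dump):
    """Return the total time in ms of every valid frame in the dump."""
    durations = []
    header = None
    in_block = False
    for raw in dump.splitlines():
        line = raw.strip()
        if line == "---PROFILEDATA---":
            in_block = not in_block  # markers open and close each block
            header = None
            continue
        if not in_block or not line:
            continue
        fields = [f.strip() for f in line.rstrip(",").split(",")]
        if header is None:
            header = fields  # first row inside a block names the columns
            continue
        row = dict(zip(header, fields))
        if int(row["Flags"]) != 0:
            continue  # assumption: non-zero flags mark frames to ignore
        start = int(row["IntendedVsync"])
        end = int(row["FrameCompleted"])
        durations.append((end - start) / 1e6)  # ns -> ms
    return durations


def count_janky(durations, budget_ms=FRAME_BUDGET_MS):
    """Count frames whose total time exceeds the frame budget."""
    return sum(1 for d in durations if d > budget_ms)


# Shortened, made-up sample: real output has more columns and rows.
SAMPLE = """\
---PROFILEDATA---
Flags,IntendedVsync,HandleInputStart,FrameCompleted,
0,1000000000,1000500000,1010000000,
0,1016666667,1017000000,1050000000,
1,1033333334,1034000000,1200000000,
---PROFILEDATA---
"""

frames = parse_framestats(SAMPLE)
print(len(frames), "valid frames,", count_janky(frames), "over budget")
```

In a real test run, the text passed to parse_framestats would come from dump_framestats with your own package name. Comparing per-stage columns across frames, rather than only the total, would be the next step toward a customized analysis tool.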

2. SurfaceFlinger

In addition to time consumption, we also care about the memory occupied by rendering. In the last session, I mentioned that since Android 4.1, each Surface has three Graphic Buffers. So, how can we view the memory occupied by Graphic Buffers, and how does the system manage this memory?

You can obtain system-related information about SurfaceFlinger using the following command:

adb shell dumpsys SurfaceFlinger

Taking the Toutiao app as an example, the application uses three Graphic Buffers. The buffer with index 02, which is currently used for display (state ACQUIRED), has a size of 1080 x 1920. This also makes the triple buffering mechanism easier to understand: the three Graphic Buffers are indeed used in rotation.

+ Layer 0x793c9d0c00 (com.ss.***.news/com.**.MainActivity)
   //Index            //State           //Object        //Size
  >[02:0x794080f600] state=ACQUIRED, 0x794081bba0 [1080x1920:1088,  1]
   [00:0x793e76ca00] state=FREE    , 0x793c8a2640 [1080x1920:1088,  1]
   [01:0x793e76c800] state=FREE    , 0x793c9ebf60 [1080x1920:1088,  1]

Continuing to scroll down, you can see the memory occupied by these three Buffers:

Allocated buffers:
0x793c8a2640: 8160.00 KiB | 1080 (1088) x 1920 | 1 | 0x20000900 
0x793c9ebf60: 8160.00 KiB | 1080 (1088) x 1920 | 1 | 0x20000900 
0x794081bba0: 8160.00 KiB | 1080 (1088) x 1920 | 1 | 0x20000900

This memory is not trivial, especially as phone screen resolutions keep increasing and apps often have additional Surfaces, for example when using SurfaceView or TextureView.
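The reported sizes can be checked with a little arithmetic. The Python sketch below parses the "Allocated buffers" lines shown above and recomputes each size. It assumes that the number in parentheses is the row stride in pixels and that each pixel takes 4 bytes (a 32-bit format such as RGBA_8888). Both are assumptions about this dump, not documented guarantees, and the helper names are hypothetical.

```python
import re

# Lines copied from the "Allocated buffers" dump above.
ALLOCATED = """\
0x793c8a2640: 8160.00 KiB | 1080 (1088) x 1920 | 1 | 0x20000900
0x793c9ebf60: 8160.00 KiB | 1080 (1088) x 1920 | 1 | 0x20000900
0x794081bba0: 8160.00 KiB | 1080 (1088) x 1920 | 1 | 0x20000900
"""

LINE = re.compile(
    r"^(0x[0-9a-f]+):\s+([\d.]+) KiB \|\s*(\d+) \(\s*(\d+)\) x\s*(\d+)"
)

BYTES_PER_PIXEL = 4  # assumption: 32-bit pixels


def parse_buffers(text):
    """Extract handle, reported size, width, stride and height per buffer."""
    buffers = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            handle, kib, width, stride, height = match.groups()
            buffers.append({
                "handle": handle,
                "kib": float(kib),
                "width": int(width),
                "stride": int(stride),
                "height": int(height),
            })
    return buffers


def expected_kib(stride, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size of one buffer in KiB: stride x height x bytes per pixel."""
    return stride * height * bytes_per_pixel / 1024


buffers = parse_buffers(ALLOCATED)
total_kib = sum(b["kib"] for b in buffers)
print(f"{len(buffers)} buffers, {total_kib / 1024:.2f} MiB in total")
```

Under these assumptions, 1088 x 1920 x 4 bytes is exactly 8160 KiB, which matches the dump, and the three buffers of this single Surface add up to roughly 24 MiB.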

So, how does the system manage this memory? When the app goes to the background, the system reclaims it, and it no longer counts toward the app’s memory usage.

+ Layer 0x793c9d0c00 (com.ss.***.news/com.**.MainActivity)
   [00:0x0] state=FREE    
   [01:0x0] state=FREE    
   [02:0x0] state=FREE

So, how can we quickly judge whether the UI implementation meets the design drafts? How can we achieve more efficient UI automation testing? You can think about these questions first; we will discuss them in detail in the later session on “Efficient Testing”.

Common Techniques for UI Optimization #

Let’s review the stages of UI rendering again. Our goal is to achieve 60 fps, which means that all rendering operations for a frame must be completed within about 16.7 ms (1000 ms / 60 frames).
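To make the frame budget concrete, here is a small Python sketch. It uses a deliberately simple model, which is an assumption: work on a frame starts exactly at a VSYNC, and the frame is shown at the first VSYNC after it finishes. The function names are hypothetical.

```python
import math


def frame_budget_ms(fps):
    """Time available to produce one frame at the given refresh rate."""
    return 1000.0 / fps


def dropped_frames(render_ms, fps=60):
    """VSYNC intervals missed by a frame that took render_ms to produce."""
    intervals_used = math.ceil(render_ms / frame_budget_ms(fps))
    return max(intervals_used - 1, 0)


print(round(frame_budget_ms(60), 2))  # budget in ms at 60 fps
print(dropped_frames(10))             # finishes within one interval
print(dropped_frames(20))             # spills into a second interval
print(dropped_frames(40))             # spills into a third interval
```

A frame that takes 20 ms is only slightly over budget, yet it still occupies two VSYNC intervals, so the user sees one dropped frame.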

UI optimization is about analyzing the time consumption of each stage of rendering, identifying bottlenecks, and then optimizing them. Let’s take a look at some commonly used techniques for UI optimization.

1. Utilize Hardware Acceleration

After our previous discussion, I believe you will agree that hardware-accelerated rendering performs much better than software rendering. Therefore, the first technique for UI optimization is to ensure that rendering utilizes hardware acceleration as much as possible.

When can we not use hardware acceleration? Hardware acceleration does not support every Canvas API; you can refer to the drawing-support documentation for the API compatibility list. If an unsupported API is used, the system has to fall back to software rendering on the CPU, which is why effects such as gradients, blur, and rounded corners render relatively slowly.

SVG is a typical example: many of its drawing commands are not supported by hardware acceleration. However, we can use a trick and convert these SVG files into Bitmaps in advance, so that the system can make better use of hardware acceleration. Similarly, for other scenarios involving rounded corners, gradients, and so on, we can also switch to Bitmaps.

The key to implementing this trick lies in how to pre-generate Bitmaps and how to manage the memory of these Bitmaps. You can refer to popular image libraries for implementations.

2. View Creation Optimization

When observing the rendering pipeline, you may notice that a crucial step is missing, which is the time consumed by View creation. Don’t forget that View creation also occurs in the UI thread, and for some complex interfaces, the time consumed in this part should not be overlooked.

Before optimizing, let’s break down the time spent on View creation: random I/O for reading the XML files, XML parsing, and object instantiation (where the framework relies heavily on reflection).

Now let’s take a look at the optimization techniques for this stage.

Create Views Programmatically

Using XML to write UI is undoubtedly convenient and allows us to see the interface in real-time in Android Studio. However, if we want to achieve extreme optimization for an interface, we can create it programmatically.

But this approach is a disaster for development efficiency. Therefore, we can use some open-source tools that convert XML into Java code, such as X2C. However, it’s worth noting that not all cases can be directly converted.

Therefore, we need to balance performance and development efficiency. I suggest using this approach only in scenarios with high performance requirements but infrequent modifications.

Asynchronous Creation

Can we create Views in a separate thread to achieve preloading of UI? Those who have tried this will find that the system throws the following exception:

java.lang.RuntimeException: Can't create handler inside thread that has not called Looper.prepare()
  at android.os.Handler.<init>(Handler.java:121)

In fact, we can implement this by using another clever approach. When creating Views in a thread, we can replace the MessageQueue of the thread’s Looper with that of the UI thread’s Looper.

However, it is important to note that after creating Views, we need to restore the thread’s Looper to its original state.

View Reusing

Normally, Views are destroyed simultaneously with the destruction of an Activity. ListView and RecyclerView greatly improve rendering performance through View caching and reusing. Therefore, we can refer to their mechanisms and implement a View caching mechanism that can be used across different Activities or Fragments.

But here, we need to ensure that every View entering the cache is reset to a clean state and retains nothing from its previous use. WeChat once ran into a View cache problem that mixed up chat records between different users.
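The bookkeeping behind such a cache can be modeled in a few lines. The sketch below uses Python only to illustrate the idea; FakeView and ViewPool are hypothetical stand-ins, not Android classes, and a real implementation would hold actual View objects and reset far more state (listeners, visibility, animations, and so on).

```python
from collections import defaultdict


class FakeView:
    """Hypothetical stand-in for a View that carries per-use state."""

    def __init__(self, view_type):
        self.view_type = view_type
        self.text = ""
        self.tag = None

    def reset(self):
        # Clear everything a previous owner may have set.
        self.text = ""
        self.tag = None


class ViewPool:
    """Cache of views keyed by type, shared across screens."""

    def __init__(self, max_per_type=5):
        self._max_per_type = max_per_type
        self._pool = defaultdict(list)
        self.created = 0

    def obtain(self, view_type):
        cached = self._pool[view_type]
        if cached:
            return cached.pop()  # cache hit: reuse an existing view
        self.created += 1
        return FakeView(view_type)  # cache miss: pay the creation cost

    def release(self, view):
        view.reset()  # never cache a view that still holds stale state
        cached = self._pool[view.view_type]
        if len(cached) < self._max_per_type:
            cached.append(view)


pool = ViewPool()
bubble = pool.obtain("chat_bubble")
bubble.text = "message for user A"
pool.release(bubble)

reused = pool.obtain("chat_bubble")
print(reused is bubble, repr(reused.text), pool.created)
```

Resetting in release rather than in obtain means a stale view can never sit in the cache, which is exactly the class of bug described above.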

3. measure/layout Optimization

The measure and layout stages in the rendering process also need to be executed on the CPU in the main thread. There are many articles available online regarding optimization for this part. Here are some common techniques:

Reduce UI layout hierarchy: For example, flatten the hierarchy as much as possible and use optimizations like <ViewStub> and <merge>.
Optimize layout overhead: Avoid using RelativeLayout or weighted LinearLayout, as both have significant layout overhead. I recommend using ConstraintLayout instead.
Background optimization: Try not to set backgrounds repeatedly. If the interface has its own custom background, the theme background is unnecessary; but because the theme background is set on the DecorView, it is still drawn, which causes overdraw and wastes performance.

Regarding measure and layout, can we do the layout in advance on a background thread, as we did with View creation? This could greatly improve the performance of the first display.

TextView is a powerful and important control in the system, but that power comes at a high computational cost. At Google I/O 2018, PrecomputedText was announced and integrated into Jetpack. It provides an interface for performing text measure and layout asynchronously, without blocking the main thread.

Advanced Techniques for UI Optimization #

Can we use the same approach for other controls? Let’s look at what new frameworks have done over the past two years. I will introduce Facebook’s open-source library Litho and Google’s open-source Flutter.

1. Litho: Asynchronous Layout

Litho is a declarative Android UI rendering framework developed by Facebook, based on another Facebook open-source layout engine Yoga.

Litho itself is very powerful and has done many excellent optimizations. Let me briefly introduce how it optimizes the UI.

Asynchronous Layout

In general, the drawing of all Android controls follows the pipeline of measure -> layout -> draw, and all of this happens on the main thread.

Like PrecomputedText mentioned earlier, Litho moves measure and layout to the background thread, leaving only draw, which must be completed on the main thread. This greatly reduces the workload on the UI thread. Its rendering pipeline is as follows:

Interface Flattening

As mentioned earlier, reducing the hierarchy of UI is a very common optimization method. You may wonder if there is a way to directly reduce the hierarchy of UI without modifying the code. Litho provides us with a solution. Since Litho uses its own layout engine (Yoga), it can detect unnecessary layers and reduce ViewGroups during the layout phase to achieve UI flattening. For example, in the image below, the upper part is the traditional way we write this interface, and the lower part is the interface written by Litho, which only has one layer.

Optimizing RecyclerView

Litho also optimizes how UI components in a RecyclerView are cached and recycled. The native RecyclerView or ListView caches and recycles by viewType, so if a RecyclerView/ListView has too many view types, the cache becomes ineffective. Litho instead recycles primitive components such as text, image, and video independently, which can increase the cache hit rate, reduce memory usage, and improve the scrolling frame rate.
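Why finer-grained recycling raises the hit rate can be shown with a toy model. The Python sketch below is not Litho's implementation; it is a simplified simulation under assumed conditions: one feed item is visible at a time, and everything it used returns to the pool before the next item is bound. The feed contents and function names are made up.

```python
from collections import Counter

# Each feed item is described by the primitive components it contains.
FEED = [
    ("text", "image"),
    ("text", "video"),
    ("image", "video"),
    ("text", "image", "video"),
]


def simulate(feed, keys_of_item):
    """Return (requests, hits) for a pool keyed by keys_of_item(item)."""
    pool = Counter()
    requests = hits = 0
    for item in feed:
        keys = keys_of_item(item)
        for key in keys:
            requests += 1
            if pool[key] > 0:
                pool[key] -= 1  # reuse a recycled instance
                hits += 1
        # The item scrolls off screen: everything it used is recycled.
        for key in keys:
            pool[key] += 1
    return requests, hits


# viewType-style pooling: the whole item layout is the cache key.
by_view_type = simulate(FEED, lambda item: [item])
# Primitive-level pooling: each text/image/video is its own cache key.
by_primitive = simulate(FEED, lambda item: list(item))

print("by viewType :", by_view_type)
print("by primitive:", by_primitive)
```

In this model every item has a distinct layout, so the viewType pool never gets a hit, while the primitive pool reuses six of the nine components it is asked for.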

Although Litho is powerful, it also has disadvantages. To achieve asynchronous measure/layout, it uses a React-like unidirectional data flow design, which increases the complexity of UI development to some extent. In addition, Litho’s UI code is written in Java/Kotlin and cannot be previewed in Android Studio.

If you don’t plan to fully migrate to Litho, I suggest using Litho’s RecyclerCollectionComponent and Sections to optimize the performance of your RecyclerView.

2. Flutter: Custom Layout + Rendering Engine

As shown in the figure below, although Litho breaks through some of the system limitations to some extent by using its own layout engine, Yoga, it still follows the system’s rendering mechanism after draw.

So can we go deeper into the lower level and take over the system’s rendering as well? Flutter is such a framework. It is also a popular framework recently, and I will briefly introduce it here.

Flutter is a mobile application development framework launched and open-sourced by Google. Developers can use the Dart language to develop apps, and a set of code can run on both iOS and Android platforms.

Let’s take an overall look at the architecture of Flutter. On Android, Flutter does not rely on the system’s rendering engine at all. Instead, it directly integrates the Skia engine into the app, making the Flutter app more like a game app. It also uses the Dart virtual machine directly, which can be considered as a solution that goes beyond Android. Therefore, Flutter can easily achieve cross-platform development.

Developing Flutter applications simplifies the thread model. The framework abstracts different runners, including UI Runner, GPU Runner, I/O Runner, and Platform Runner. On the Android platform, when each engine instance starts, it creates a new thread for UI Runner, GPU Runner, and I/O Runner. All engine instances share the same Platform Runner and thread.

Since we are mainly discussing UI rendering in this article, let’s focus on analyzing Flutter’s rendering steps. For more specific information, you can read Flutter Internals.

First, the UI Runner executes the root isolate (which can be simplified as the main function). Let me explain the concept of isolate briefly. Isolate is an implementation of concurrent code in the Dart virtual machine. The Dart virtual machine implements the Actor concurrency model, which is similar to Erlang’s concurrency model. If you’re not familiar with Actor, you can think of isolate as a “thread” in the Dart virtual machine. The root isolate notifies the engine when there is a frame to be rendered.
After receiving the notification, the Flutter engine tells the system that we want to synchronize with the VSYNC.
Once the VSYNC signal arrives, the UI Runner performs layout on the UI Widgets and generates a Layer Tree.
The Layer Tree is then passed to the GPU Runner for composition and rasterization.
The GPU Runner uses the Skia library to draw relevant graphics.

Flutter Rendering Steps

Flutter also adopts a similar approach to Litho and React, with immutable properties and one-way data flow. This has become a standard in modern UI rendering engines, as it allows separation of views from data.

Overall, Flutter captures the essence of various excellent front-end frameworks and is “blessed” with a powerful Dart virtual machine and Skia rendering engine. It can be considered an excellent framework. Many applications such as Xianyu and Toutiao have already used Flutter for certain features. Combined with Google’s latest Fuchsia operating system, could it be a disruptive development framework for Android? We will discuss Flutter in detail later in this column.

3. RenderThread and RenderScript

In Android 5.0, the system introduced RenderThread. For ViewPropertyAnimator and CircularReveal animations, we can use RenderThread for asynchronous rendering of animations. When the main thread is blocked, regular animations experience noticeable frame drops and stuttering, but animations rendered through RenderThread are unaffected even if the main thread is blocked.

Nowadays, more and more applications incorporate advanced image or video editing features, such as Gaussian blur, zooming, and sharpening of images. Take the common “Scan” feature (QR code scanning) as an example: it involves many image transformations such as scaling, cropping, thresholding, and denoising.

Image transformations are computationally heavy, and as we learned in the previous article, the GPU is a better choice in these scenarios. So how can we squeeze more performance out of the system’s GPU?

We can use RenderScript, which is an API provided by the Android operating system. RenderScript is based on the concept of heterogeneous computing and is specifically designed for computationally intensive tasks. RenderScript provides three fundamental tools: a hardware-independent general-purpose compute API, a compute API similar to CUDA, OpenCL, and GLSL, and a C99-like script language. It allows developers to implement complex and high-performance applications with less code.

How can we apply RenderThread and RenderScript to our projects? You can refer to the following practical solutions:

Summary #

Reviewing all the means of UI optimization, we can identify the following trends:

1. Optimizing within the system framework. Layout optimization, creating Views in code, View caching, and so on all fall under this approach. The aim is to reduce or eliminate the time-consuming stages of the rendering pipeline.

2. Utilizing new system features. Hardware acceleration, the RenderThread, RenderScript, and others fall under this approach. By using new features provided by the system, maximum performance can be achieved.

3. Breaking through system limitations. Due to the fragmentation of the Android system, many good features may not be supported by older versions of the OS. Additionally, the system needs to support all scenarios, and in some specific scenarios, it may not be able to achieve the optimal solution. In these cases, we hope to break through the limitations imposed by the system. For example, Litho breaks through the limitations of layout, and Flutter goes even further by taking control of rendering as well.

Looking back at all the UI optimizations in the past, in the first stage of optimization, we could achieve very good results while still being bound by the limitations of the system. However, as we go further, it becomes easier to encounter bottlenecks. At this point, we need to delve deeper into the underlying layers and have greater control over the entire architecture. We need to create our own “wheel”.

Another aspect to consider in UI optimization is efficiency. Currently, Android Studio is not very friendly to designers. For example, it does not support Sketch and After Effects plugins. Lottie is a very good example that significantly improves the efficiency of animation development.

“Designers and product managers, grow up and learn to write UI yourselves.” In the future, we hope that UI interfaces and adaptation can be automated, or we can simply hand them back to designers and product managers.

Homework #

In your work experience, what UI optimization work have you done and do you have any “tricks” to share with other classmates? What are your thoughts on Litho and Flutter? Feel free to leave comments to discuss with me and other classmates.

There are two homework assignments for today. Try using Litho and Flutter frameworks.

Implement an information flow interface using Litho.
Write a Hello World app using Flutter and analyze the installation package size.
