
04 Memory Optimization Part2 Where to Start with Memory Optimization #

After grasping the background knowledge related to memory, the next step is to start optimizing it. However, before we actually begin, we need to evaluate how much memory affects application performance. One way is to measure the proportion of crashes and Out of Memory (OOM) errors among all abnormal exits. Low-memory devices are also more prone to memory-related exceptions and lag, so another useful signal is the proportion of our users running the application on devices with less than 2GB of memory.

Therefore, it is crucial to set our goals before optimization. For example, optimizing for a 512MB device calls for a completely different approach than optimizing for devices with 2GB or more of memory. And if we are targeting users in Southeast Asia or Africa, the bar for memory optimization is even higher.

After all the preparations, let’s now take a look at the methods for memory optimization.

Memory Optimization Discussion #

Where should we start with memory optimization? I usually start with device grading, Bitmap optimization, and memory leaks.

1. Device Grading

You may have encountered the situation where an app runs smoothly on a smartphone with 4GB of memory but struggles on one with 1GB. Performance may also vary depending on whether the system is idle or busy.

Memory optimization should first consider the device environment. In a previous article, I mentioned a common misconception: “the less memory used, the better.” In reality, we can allocate and recycle memory differently based on the performance of different devices.

Of course, this requires a well-designed architecture, and the following points should be considered during the design:

  • Device grading: Use strategies like device-year-class to grade devices. For low-end devices, complex animations or certain features can be turned off, and RGB_565 format images and smaller caches can be used. In the real world, not every user’s device is as high-end as our test devices. During development, we need to consider whether certain features should be enabled for low-end devices at all, and whether they can be downgraded when system resources are scarce.

Here’s an example. device-year-class estimates the year class a device belongs to from information such as memory size, CPU core count, and clock frequency. The snippet below enables complex animations for devices from 2013 onward, simple animations for 2010–2012 devices, and no animations for older low-end devices.

if (year >= 2013) {
    // Do advanced animation
} else if (year >= 2010) {
    // Do simple animation
} else {
    // Phone too slow, don't do any animations
}
  • Cache management: We need a unified cache management mechanism that can use memory appropriately and return it when needed. We can use the onTrimMemory() callback to release memory at different levels of memory pressure. For large projects with dozens or even hundreds of modules, unified cache management also makes it much easier to monitor the cache size of each module.
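As a minimal sketch of such a mechanism: the class CacheCenter and the ReleasableCache interface below are hypothetical names, not from any real library, and the trim-level constants are hard-coded copies of Android's ComponentCallbacks2 values so the sketch stays platform-independent. On Android, the onTrimMemory method would be called from Application.onTrimMemory(level).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheCenter {
    // Values mirror android.content.ComponentCallbacks2 constants.
    public static final int TRIM_MEMORY_RUNNING_MODERATE = 5;
    public static final int TRIM_MEMORY_RUNNING_LOW = 10;
    public static final int TRIM_MEMORY_UI_HIDDEN = 20;

    public interface ReleasableCache {
        int sizeBytes();          // lets us monitor each module's cache size
        void trimToHalf();        // mild pressure: shrink the cache
        void clear();             // severe pressure: drop everything
    }

    private final Map<String, ReleasableCache> caches = new ConcurrentHashMap<>();

    public void register(String module, ReleasableCache cache) {
        caches.put(module, cache);
    }

    // Total cache footprint across all registered modules.
    public int totalSizeBytes() {
        int total = 0;
        for (ReleasableCache c : caches.values()) total += c.sizeBytes();
        return total;
    }

    // Forward Application.onTrimMemory(level) here on Android.
    public void onTrimMemory(int level) {
        for (ReleasableCache c : caches.values()) {
            if (level >= TRIM_MEMORY_UI_HIDDEN) {
                c.clear();
            } else if (level >= TRIM_MEMORY_RUNNING_LOW) {
                c.trimToHalf();
            }
        }
    }
}
```

Each module registers its cache once, and the center can both report per-module sizes and trim everything from one place.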

  • Process model: An empty process occupies about 10MB of memory, and some applications start dozens of processes, or even go from dual-process to quad-process keep-alive schemes. Reducing the number of processes at startup, cutting down on resident processes, and being prudent about keep-alive tricks are therefore crucial for memory optimization on low-end devices.

  • Installation package size: The size of the code, resources, images, and native libraries in the installation package has a significant impact on the memory they occupy. An 80MB app is unlikely to run smoothly on a smartphone with 512MB of memory. In this case, we should consider launching a lightweight version specifically for low-end device users, like Facebook Lite or the lightweight edition of Jinri Toutiao (“Today’s Headlines”).

What is the relationship between the size of code, images, resources, and native libraries in the installation package and memory? You can refer to the table below.

2. Bitmap Optimization

Bitmap memory usually accounts for a large portion of the total memory used by an application, so memory optimization can never avoid the “eternal theme” of image memory.

Even if we place all the Bitmaps in Native memory, it does not mean that the image memory issue is completely solved. This approach only improves the utilization of system memory and mitigates some problems caused by garbage collection.

Now let’s take a look at how to optimize image memory. I will introduce two methods.

Method 1: Unified Image Library. The premise of optimizing image memory is to consolidate all image usage behind one library so that we can apply global control strategies. For example, low-end devices can use the RGB_565 format and more aggressive downsampling, which can be achieved with libraries such as Glide or Fresco, or through a custom implementation. We also need to consolidate all direct uses of Bitmap.createBitmap and the BitmapFactory interfaces.
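As a sketch of the kind of global policy such a unified library can apply: the class name DecodePolicy and the low-end/alpha flags below are illustrative assumptions, not part of Glide or Fresco. On Android, the chosen config and sample size would be fed into BitmapFactory.Options (inPreferredConfig, inSampleSize) before decoding.

```java
public class DecodePolicy {

    // Low-end devices get RGB_565 (2 bytes/pixel) when the image has no
    // alpha channel; everything else uses ARGB_8888 (4 bytes/pixel).
    public static String chooseConfig(boolean isLowEndDevice, boolean hasAlpha) {
        if (isLowEndDevice && !hasAlpha) return "RGB_565";
        return "ARGB_8888";
    }

    // Standard power-of-two downsampling: pick the largest sample size that
    // keeps both decoded dimensions at or above the requested size.
    public static int calcInSampleSize(int srcW, int srcH, int reqW, int reqH) {
        int sample = 1;
        while ((srcW / (sample * 2)) >= reqW && (srcH / (sample * 2)) >= reqH) {
            sample *= 2;
        }
        return sample;
    }

    // Estimated decoded size in bytes, useful when budgeting cache memory.
    public static long estimateBytes(int srcW, int srcH, int sample, String config) {
        int bytesPerPixel = "RGB_565".equals(config) ? 2 : 4;
        return (long) (srcW / sample) * (srcH / sample) * bytesPerPixel;
    }
}
```

For a 1600x1200 source shown in a 400x300 view on a low-end device, this policy decodes at sample size 4 in RGB_565, cutting the decoded footprint from roughly 7.7MB to 240KB.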

Method 2: Unified Monitoring

After unifying the image library, monitoring the usage of Bitmap becomes much easier. Here are three main points to consider:

  • Large image monitoring: We need to watch for images that occupy too much memory, for example when a bitmap’s dimensions are much larger than the view it fills, or even larger than the screen. During development, if improper image usage is detected, a dialog should immediately pop up showing the activity and the stack trace where the image is used, so developers can quickly locate and fix the problem. In gray-release and production environments, the exception information can be reported to the server, letting us calculate the proportion of images that exceed the screen size, which we call the “over-sized rate”.

  • Duplicate image monitoring: Duplicate images are Bitmaps whose pixel data is identical but which exist as multiple distinct objects. This kind of monitoring does not require a large sample and is generally used internally. I previously implemented an Hprof memory analysis tool that automatically outputs duplicate Bitmaps and their reference chains. The image below shows a simple example with two images of identical content; deduplicating them saves 1MB of memory.

  • Total image memory: By consolidating image usage, we can also calculate the total memory occupied by all images in the application. This allows us to analyze the memory usage of images based on different systems, screen resolutions, and other dimensions in the production environment. During an Out of Memory (OOM) crash, we can also write the total memory occupied by images and the memory of the top N images to the crash log, which helps us troubleshoot problems.
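The monitoring math behind the three points above can be sketched in a few lines. BitmapRecord and the "more than 2x the view" threshold are assumptions for illustration; a real implementation would populate these records from the unified image library's decode entry points.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BitmapMonitor {
    public static class BitmapRecord {
        public final String url;
        public final int width, height, bytes;
        public BitmapRecord(String url, int width, int height, int bytes) {
            this.url = url; this.width = width; this.height = height; this.bytes = bytes;
        }
    }

    // "Over-sized" when either dimension exceeds twice the target view's.
    public static boolean isOverSized(BitmapRecord b, int viewW, int viewH) {
        return b.width > viewW * 2 || b.height > viewH * 2;
    }

    // Total memory occupied by all tracked images in the application.
    public static long totalBytes(List<BitmapRecord> all) {
        long total = 0;
        for (BitmapRecord b : all) total += b.bytes;
        return total;
    }

    // The top-N largest images, e.g. for writing into an OOM crash log.
    public static List<BitmapRecord> topN(List<BitmapRecord> all, int n) {
        List<BitmapRecord> sorted = new ArrayList<>(all);
        sorted.sort(Comparator.comparingInt((BitmapRecord b) -> b.bytes).reversed());
        return sorted.subList(0, Math.min(n, sorted.size()));
    }
}
```

On the server side, the over-sized rate is then simply the count of over-sized reports divided by total image reports.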

After discussing device grading and Bitmap optimization, we found that both architecture and monitoring need to be considered. A good architecture can reduce or even avoid mistakes, while good monitoring helps us detect problems in a timely manner.

3. Memory Leaks

Simply put, a memory leak occurs when memory that is no longer needed cannot be reclaimed because it is still being referenced. Identifying and resolving memory leaks is an important part of memory optimization.

Memory leaks can be divided into two main cases: one where the same object leaks repeatedly, and one where a new object leaks each time, accumulating hundreds or even thousands of useless objects.

Many memory leaks are caused by unreasonable framework design, with various singletons flying around, and the lifecycle of controllers in an MVC architecture being much longer than that of views. A good framework design can reduce or even avoid such mistakes. However, this is not an easy task, so continuous monitoring of memory leaks is also necessary.

  • Java memory leaks: Establish an automated detection solution similar to LeakCanary, at least for detecting leaks in activities and fragments. During development, we hope that when a leak occurs, a dialog box will pop up to make it easier for developers to discover and resolve the issue. Online monitoring of memory leaks is not easy. We can optimize the generated Hprof memory snapshot file by trimming most of the byte arrays corresponding to images, reducing the file size. For example, a 100MB file is generally reduced to about 30MB. After compressing it with 7zip, the final size is less than 10MB, increasing the success rate of file uploads.

  • OOM monitoring: Meituan has an Android memory leak automated analysis component called Probe. It generates an Hprof memory snapshot when an OOM occurs and then performs further analysis on the file in a separate process. However, using this tool online still carries significant risks. Generating memory snapshots when crashes occur may cause secondary crashes, and in some cases, generating an Hprof snapshot on certain phones may take several minutes, greatly affecting the user experience. In addition, some OOMs are caused by insufficient virtual memory, which needs to be analyzed on a case-by-case basis.

  • Native memory leak monitoring: In the WeMobileDev article “WeChat Android Terminal Memory Optimization Practices,” WeChat described attempts beyond Malloc Debug and Malloc Hook, both of which still seem unstable. For cases where the shared library (.so) cannot be recompiled, PLT Hook was used to intercept the memory allocation functions of target libraries. PLT Hook is one type of native hook, which will be discussed later. The intercepted functions are redirected to our own implementation, which records information such as the allocated address, size, and the path of the source .so. Periodically, allocations are matched against deallocations, and any unmatched records are output.

  • For cases where the .so can be recompiled, all functions are instrumented with GCC’s “-finstrument-functions” option, with the instrumentation simulating the push and pop of a call stack. The memory allocation and deallocation functions are intercepted via the linker’s “--wrap” option and redirected to our own implementation, which records the allocated address, size, source .so, and the instrumented call stack at that moment. Periodically, allocations are matched against deallocations, and any unmatched records are output.

For memory leak troubleshooting during development, Android Profiler and MAT tools can be used in conjunction. To ensure timely detection of problems, it is crucial to establish a comprehensive monitoring system.

To be honest, apart from Java leak detection, the current OOM monitoring and native memory leak monitoring have only reached the level of experimental, automated lab testing. WeChat’s native monitoring solution has also run into compatibility issues, and deploying it in gray-release and production environments involves many details. Native memory leak detection is comparatively easier on iOS, but Google has been steadily improving the performance and usability of native memory leak detection, so we can expect significant improvements in future Android versions.

Memory Monitoring #

As mentioned earlier, memory leak monitoring carries performance costs, so it is typically only enabled for internal users and a small fraction of external users. In production, we need other, more efficient means of monitoring memory issues.

1. Collection Method

While the app is in the foreground, we can collect PSS (Proportional Set Size), Java heap usage, and total image memory every 5 minutes. I suggest sampling only a portion of users, and it is important to sample by user, not by occurrence: once a user is selected, data should be collected from that user continuously throughout the day.
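Per-user sampling can be sketched as a stable function of the user id and the date, so that a selected user reports all day and a fresh cohort is drawn each day. The class name UserSampler and the string-hash bucketing are illustrative choices, not a prescribed scheme.

```java
public class UserSampler {
    // percent is the sampling rate, e.g. 5 means roughly 5% of users.
    // Because the bucket depends only on (userId, date), the decision is
    // stable for the whole day: a sampled user stays sampled until midnight.
    public static boolean isSampled(String userId, String date, int percent) {
        int h = (userId + "#" + date).hashCode();
        int bucket = Math.floorMod(h, 100);   // stable bucket in [0, 100)
        return bucket < percent;
    }
}
```

In production, a real implementation might also salt the hash per experiment so different metrics sample independent cohorts.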

2. Calculation Metrics

Using the data mentioned above, we can calculate several memory metrics.

Memory Exception Rate: This metric reflects abnormal memory usage; inappropriate memory use or memory leaks will push it up. The PSS value can be obtained through Debug.MemoryInfo (for example, via getTotalPss()).

Memory UV Exception Rate = UV with PSS exceeding 400MB / Sampled UV

Trigger Rate: This metric can reflect the usage of Java memory. If it exceeds 85% of the maximum heap limit, garbage collection (GC) will occur more frequently, which may result in out-of-memory (OOM) errors and lagging.

Memory UV Trigger Rate = UV with Java heap usage exceeding 85% of the maximum limit / Sampled UV

Whether it triggers or not can be calculated using the method described below.

Runtime runtime = Runtime.getRuntime();
long javaMax = runtime.maxMemory();
long javaTotal = runtime.totalMemory();
long javaUsed = javaTotal - runtime.freeMemory();
// Triggered when Java memory usage exceeds 85% of the maximum heap limit
float proportion = (float) javaUsed / javaMax;
boolean isTriggered = proportion > 0.85f;

In general, only data is reported from the client side, and all calculations are processed on the server side, which allows for flexibility. On the server side, we can also calculate average PSS, average Java memory, and average image memory, which can reflect the average memory usage. By comparing these metrics between different versions, we can monitor whether there are any new memory-related issues.
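The server-side math for the two formulas above is straightforward. MemoryMetrics is a hypothetical name, and each list entry stands for one sampled user (UV) with their peak value for the day; the 400MB and 85% thresholds follow the formulas in this section.

```java
import java.util.List;

public class MemoryMetrics {

    // maxPssMb: each sampled user's peak PSS for the day, in MB.
    // Returns: UV with PSS exceeding 400MB / sampled UV.
    public static double exceptionRate(List<Integer> maxPssMb) {
        long exceeded = maxPssMb.stream().filter(p -> p > 400).count();
        return (double) exceeded / maxPssMb.size();
    }

    // heapRatios: each sampled user's peak (javaUsed / javaMax) for the day.
    // Returns: UV with Java heap usage exceeding 85% of the limit / sampled UV.
    public static double triggerRate(List<Double> heapRatios) {
        long triggered = heapRatios.stream().filter(r -> r > 0.85).count();
        return (double) triggered / heapRatios.size();
    }
}
```

Keeping this logic server-side means thresholds can be adjusted without shipping a new client.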

Since foreground time is also reported, we can observe memory usage trends over time, for example to check whether our application truly follows the principle of “allocate when needed, release promptly.” If needed, we can also compare memory usage across different scenarios.

3. GC Monitoring

In the laboratory or internal trial environment, we can monitor Java memory allocation and garbage collection using Debug.startAllocCounting. It is important to note that this option has some impact on performance and has been marked as deprecated by Android.

Through monitoring, we can obtain information such as the number and size of memory allocations, as well as the number of GC invocations.

Debug.startAllocCounting();
// ... run the scenario to be measured ...
long allocCount = Debug.getGlobalAllocCount();
long allocSize = Debug.getGlobalAllocSize();
long gcCount = Debug.getGlobalGcInvocationCount();
Debug.stopAllocCounting();

The information above may not be enough to pinpoint issues, but starting from Android 6.0, the system provides more accurate GC information.

// Number of GC runs
Debug.getRuntimeStat("art.gc.gc-count");
// Total time spent on GC in milliseconds
Debug.getRuntimeStat("art.gc.gc-time");
// Number of blocking GC runs
Debug.getRuntimeStat("art.gc.blocking-gc-count");
// Total time spent on blocking GC in milliseconds
Debug.getRuntimeStat("art.gc.blocking-gc-time");

Pay special attention to the number and time spent on blocking GC, as it may pause the application thread and cause lagging. We can also collect statistics with more granularity based on different application scenarios, such as app startup, entering the Moments page, entering the chat page, etc.
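Per-scenario statistics boil down to taking a snapshot of the four counters before and after the scenario and reporting the difference. GcSnapshot is a hypothetical holder class; on Android its fields would be filled by parsing the Debug.getRuntimeStat("art.gc.…") strings with Long.parseLong at the start and end of, say, app startup or entering the chat page.

```java
public class GcSnapshot {
    public final long gcCount, gcTimeMs, blockingGcCount, blockingGcTimeMs;

    public GcSnapshot(long gcCount, long gcTimeMs,
                      long blockingGcCount, long blockingGcTimeMs) {
        this.gcCount = gcCount;
        this.gcTimeMs = gcTimeMs;
        this.blockingGcCount = blockingGcCount;
        this.blockingGcTimeMs = blockingGcTimeMs;
    }

    // GC activity attributable to the scenario between this snapshot
    // (taken at scenario start) and a later one (taken at scenario end).
    public GcSnapshot delta(GcSnapshot later) {
        return new GcSnapshot(
                later.gcCount - gcCount,
                later.gcTimeMs - gcTimeMs,
                later.blockingGcCount - blockingGcCount,
                later.blockingGcTimeMs - blockingGcTimeMs);
    }
}
```

Reporting the delta rather than raw counters keeps scenarios comparable across sessions of different lengths.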

Summary #

Before we start optimizing, we need to ask ourselves a few questions: what are our optimization goals, and at what level of memory usage do exceptions and stuttering appear? Only after clarifying the application’s current state and the optimization goals can we proceed to the next steps.

When discussing memory optimization ideas, the aim is to give users different experiences based on their device and its current condition. Here, I mainly covered methods for Bitmap memory optimization and for memory leak detection and monitoring. Finally, I described how to monitor memory exceptions online; metrics such as the memory exception rate and the trigger rate are very helpful to us.

Currently, our analysis of Native leaks is not yet perfect. However, when doing optimization work, I particularly like to approach the problem with an evolutionary mindset. Even Google, when the timing is not right, would make compromises and trade-offs. For us personally, when the timing is right or when our capabilities are sufficient, we need to promptly address these “technical debts”.

Homework #

After reading the memory optimization methods I shared, I believe you must have many good ideas and methods. Today’s homework is to share your “killer skill” for memory optimization. Please share your learning, practice, gains, and insights in the comment section.

In the article, I mentioned Hprof file trimming and duplicate image monitoring, which many applications do not currently implement. These two functions are also part of the memory monitoring in the WeChat APM framework Matrix. Matrix is the last project I worked on at WeChat over a year ago, and I put a lot of effort into it. Recently, I heard that it is finally going to be open sourced.

So today, let’s practice together and try to use the HAHA library to quickly determine if there are duplicate images in the memory. Then, output the PNG, stack trace, and other information of these duplicate images. The final implementation can be submitted as a Pull Request to the Sample repository.

Feel free to click on “Invite a Friend to Read” to share today’s content with your friends and invite them to learn together. Finally, don’t forget to submit your homework in the comment section. I have also prepared a generous “Study Encouragement Gift” for students who complete the homework seriously. I look forward to learning and improving with you.