12 Storage Optimization (Part 1): Common Data Storage Methods #

With the foundational I/O optimization knowledge covered in the previous columns, I believe you now understand file systems and disk mechanisms as well as the advantages and disadvantages of different I/O methods, and you should be able to monitor I/O operations in production.

Rome wasn’t built in a day. As you absorb and master this foundational knowledge, you probably also want to know how to apply it to write better code.

Today, I will combine some features of the Android system to discuss the advantages and disadvantages of common storage methods in the development process. I hope this can help you make better choices in your daily work.

Basics of Storage in Android #

Before discussing specific storage methods, we should have some basic knowledge about storage-related concepts in the Android system.

1. Android Partitions

Most of the knowledge discussed in I/O optimization is focused on the Linux system. For Android, we should first understand the architecture and purpose of Android partitions. In the familiar Windows world, we usually install the system on the C drive, and then have several partitions for storing applications and data.

The various partitions in Android can be viewed through the /proc/partitions file or the df command. The following image shows the result of running the df command on a Nexus 6 device.

What is a partition? Simply put, a partition is a way to divide the storage of a device into non-overlapping sections, each of which can be formatted separately for different purposes. This allows the system to perform different operations on individual partitions. For example, during system recovery, we don’t want to affect user data stored in the /data partition even if there is a sudden power failure.

From the table above, you can see that each partition is independent and can use different file systems. The most important partitions are:

  • /system partition: This is where the Google-provided Android components are stored. It is mounted read-only, mainly for stability and security reasons, so that even in the event of a sudden power failure the content of the /system partition remains intact and cannot be tampered with.

  • /data partition: This is where all user data is stored. It is mainly used for data isolation, which means that during system upgrades and recovery, the entire /system partition is erased but the user data in the /data partition remains unaffected. Performing a factory reset will only erase the data in the /data partition.

  • /vendor partition: This is where manufacturers store their specific system modifications. Especially after the introduction of the “Treble” project in Android 8.0, manufacturers can update only their /vendor partition during OTA updates, allowing them to update devices to the latest Android version more easily, quickly, and at a lower cost.

2. Storage Security in Android

In addition to data partitioning, storage security is also an important part of the Android system. Storage security primarily focuses on permission control.

First, permission control:

Each application in Android runs within its own application sandbox. Before Android 4.3, these sandboxes relied on standard Linux protection mechanisms, defining each sandbox by assigning the application a unique Linux UID. In simple terms, we need to ensure that WeChat cannot access Taobao’s data, and that neither can access protected system files without permission.

Starting from Android 4.3, the SELinux (Security Enhanced Linux) mechanism was introduced to further define the boundaries of Android application sandboxes. What’s special about SELinux? Its purpose is to prevent users with root permissions from doing whatever they want. To do anything in the SELinux system, you must first be granted permission in the dedicated security policy configuration file.

Second, data encryption:

In addition to permission control, users are also concerned about the privacy of their data in case of loss or theft. Encryption may be a good choice as it can protect data on lost or stolen devices.

Android has two device encryption methods: full-disk encryption and file-based encryption. Full-disk encryption was introduced in Android 4.4 and enabled by default in Android 5.0. It encrypts all user data in the /data partition as it is written and decrypts it as it is read, which has some impact on performance, although newer chips provide direct hardware support for it.

File-based encryption was introduced in Android 7.0 to address the limitations of full-disk encryption. In this mode, different files are encrypted with different keys that can be unlocked independently, and the keys protecting a user’s credential-encrypted files are derived from the user’s passcode, so those files cannot be accessed until the user unlocks the device.

One might wonder, what is the difference between these two device encryption methods in Android and encryption within an application? Do we still need to individually encrypt sensitive files stored by applications?

Device encryption is transparent to applications: the data we read has already been decrypted for us. For application-specific sensitive data, we therefore still need to apply our own encryption, using common algorithms such as RSA, AES, or ChaCha20, before storing it.
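For illustration, here is a minimal sketch of encrypting sensitive data with AES-GCM from the standard javax.crypto API before writing it to app storage. The class and method names are my own and not from today’s Sample, and in a real app the key should be protected by the Android Keystore rather than generated and held in memory like this.

import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Illustrative helper for app-level encryption of sensitive data.
public class SensitiveDataCipher {
    private static final int GCM_TAG_BITS = 128;
    private static final int IV_BYTES = 12;

    // Returns IV + ciphertext in a single blob so it can be stored as one value.
    public static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        byte[] out = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
        return out;
    }

    // For demonstration only; real keys belong in the Android Keystore.
    public static SecretKey newKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        return generator.generateKey();
    }
}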

Common Data Storage Methods #

Android provides us with many different options for persistent storage. Before we go into detail about these options, let’s first ask ourselves, what is storage?

Everyone may have their own answer, but in my opinion, storage is the process of transforming specific data structures into formats that can be recorded and restored. These data formats can be binary, XML, JSON, Protocol Buffer, and other formats.

For flash memory, everything ultimately boils down to binary. XML, JSON, and other formats simply provide a set of common binary encoding and decoding specifications. Since there are so many storage options, what key factors should we consider when choosing a data storage method?

1. Key Factors

When choosing a data storage method, I generally consider the following points. Let me summarize them for you.

Key factors: correctness, time overhead, space overhead, security, development cost, and compatibility.

Which of these factors is the most important? It depends on the scenario: a storage method cannot be evaluated in isolation, and it is impossible to make all six of these factors perfect at the same time.

Let me explain this further. If correctness is the top priority, we may need to adopt redundancy and dual-writing schemes, and tolerate the additional impact on time overhead. Similarly, if security is of great concern, the cost of encryption and decryption cannot be neglected. If we want to optimize for startup scenarios, we would prefer a solution that has advantages in terms of initialization time and reading time.

2. Storage Options

In general, we need to choose the appropriate data storage method based on the application scenario. What storage options does Android provide for application developers? You can refer to the official Data Storage Options documentation. Overall, there are several methods:

  • SharedPreferences

  • ContentProvider

  • Files

  • Databases

Today, I will first talk about the SharedPreferences and ContentProvider storage methods. Files and databases will be discussed in the next two articles on “Storage Optimization”.

First, let’s talk about the use of SharedPreferences.

SharedPreferences is a commonly used storage method in Android. It can be used to store collections of small key-value pairs.
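For reference, here is a minimal usage sketch; the file name "settings" and the keys are illustrative, not from today’s Sample.

import android.content.Context;
import android.content.SharedPreferences;

public class SettingsStore {
    // "settings" and the keys below are illustrative names.
    public static void saveNightMode(Context context, boolean enabled) {
        SharedPreferences sp = context.getSharedPreferences("settings", Context.MODE_PRIVATE);
        // apply() writes the whole file back asynchronously; commit() would block until the write finishes.
        sp.edit().putBoolean("night_mode", enabled).apply();
    }

    public static boolean isNightMode(Context context) {
        SharedPreferences sp = context.getSharedPreferences("settings", Context.MODE_PRIVATE);
        return sp.getBoolean("night_mode", false);
    }
}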

Although SharedPreferences is very easy to use, it has also drawn plenty of criticism. It suffers from several performance problems, and its main “sins” are easy to list.

  • Not safe across processes. Since it does not use cross-process locks, even with MODE_MULTI_PROCESS, SharedPreferences may lose all data when reading and writing across processes frequently. According to online statistics, the corruption rate of SharedPreferences is about one in ten thousand.

  • Slow loading. The SharedPreferences file is loaded on a background thread, but that thread’s priority is not raised; if the main thread reads data before loading finishes, it has to wait for the loading thread to complete. This is the classic problem of the main thread waiting on a lock held by a low-priority thread: reading a 100KB SharedPreferences file, for example, may block for roughly 50-100ms. I recommend preloading the SP files used during startup on an asynchronous thread (see the preloading sketch after this list).

  • Full writes. Whether you call commit() or apply(), the entire file is written back even if only one entry was modified. And when we write to the same file multiple times, SharedPreferences does not merge the modifications into a single write, which is another important reason for its poor performance.

  • Lagging. Because apply() persists to disk asynchronously, a crash or other abnormal exit may lose data. So at certain moments, such as when the application handles a system broadcast or an Activity reaches onPause, the framework forces all pending SharedPreferences writes to complete; if they have not finished, the main thread is blocked waiting for them. This easily leads to lagging and even ANRs. According to online data, lagging caused by SharedPreferences generally accounts for more than 5% of the total.
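To make the preloading suggestion above concrete, here is a rough sketch that warms up the SharedPreferences files used during startup on a background thread. The file names are placeholders; you would substitute the files your startup path actually reads.

import android.content.Context;

public class SpPreloader {
    // Illustrative file names; replace with the SP files read during your startup path.
    private static final String[] STARTUP_SP_FILES = {"config", "flags"};

    public static void preload(final Context context) {
        new Thread(new Runnable() {
            @Override
            public void run() {
                for (String name : STARTUP_SP_FILES) {
                    // getSharedPreferences() starts loading the file; getAll() blocks this
                    // background thread until the in-memory map is ready, so later reads
                    // on the main thread do not have to wait.
                    context.getSharedPreferences(name, Context.MODE_PRIVATE).getAll();
                }
            }
        }, "sp-preload").start();
    }
}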

Now, if you are not familiar with the SharedPreferences mechanism, you can refer to “Thoroughly Understanding SharedPreferences”.

To be honest, the intended use case for the SharedPreferences provided by the system is to store very simple and lightweight data. We should not use it to store overly complex data, such as HTML, JSON, and so on. Also, the performance of SharedPreferences file storage is related to file size. Each SP file should not be too large, and we should not save unrelated configuration items in the same file. Instead, we should consider isolating frequently modified entries.

We can also replace the default implementation of SharedPreferences by overriding the getSharedPreferences method of the Application class, for example to reduce lagging, merge multiple apply operations, or support cross-process use. How do we replace it exactly? Today’s Sample provides a simple replacement implementation.

public class MyApplication extends Application {
    @Override
    public SharedPreferences getSharedPreferences(String name, int mode) {
        // Delegate to the custom implementation provided in today's Sample
        // instead of the framework's default SharedPreferences.
        return SharedPreferencesImpl.getSharedPreferences(name, mode);
    }
}

Although the system’s SharedPreferences has received some incremental fixes over time, they still do not completely solve its problems. As a result, almost every large company ends up building its own replacement storage solution. For example, WeChat recently open-sourced MMKV.

Here is a comparison of MMKV and SharedPreferences against the six key factors.

You can refer to MMKV’s implementation-principle write-up and performance test report, which contain some very good ideas: using file locks to ensure safety across processes, using mmap so that written data survives an unexpected process exit, choosing Protocol Buffers over XML for better performance and smaller storage, supporting incremental updates, and so on.

Based on the earlier I/O optimization analysis, mmap is indeed well suited to frequently modified configuration data. Users of MMKV do not need to understand the difference between apply() and commit(), nor worry about losing data, and because the whole file does not have to be rewritten on every change, overall performance improves significantly.
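For comparison with the SharedPreferences snippet earlier, here is roughly how MMKV is used. This is a sketch based on MMKV’s public API; check the project’s README for the exact, current method names.

import android.app.Application;
import com.tencent.mmkv.MMKV;

public class MmkvSampleApplication extends Application {
    @Override
    public void onCreate() {
        super.onCreate();
        // One-time initialization; MMKV keeps its files under the returned root directory.
        MMKV.initialize(this);

        MMKV kv = MMKV.defaultMMKV();
        // Writes go into an mmap'ed region, so there is no apply()/commit() distinction.
        kv.encode("night_mode", true);
        boolean nightMode = kv.decodeBool("night_mode", false);

        // For data shared between processes, request a multi-process instance.
        MMKV shared = MMKV.mmkvWithID("shared_config", MMKV.MULTI_PROCESS_MODE);
        shared.encode("user_name", "sample");
    }
}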

Second, let’s talk about the use of ContentProvider.

Why didn’t the Android system design SharedPreferences to be cross-process safe? That’s because the Android system prefers us to choose ContentProvider as the storage method in this scenario. ContentProvider, as one of the four major components of Android, provides a mechanism for sharing data between different processes or even different applications.

In the Android system, modules such as the photo gallery, calendar, audio, video, and contacts all expose their data through ContentProvider. Its usage is very simple; you can refer to the official documentation.
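As a quick illustration of the consumer side, here is a sketch that reads contact display names through ContentResolver. It assumes the READ_CONTACTS permission has already been granted.

import android.content.Context;
import android.database.Cursor;
import android.provider.ContactsContract;
import android.util.Log;

public class ContactsReader {
    // Requires the READ_CONTACTS runtime permission to be granted beforehand.
    public static void printContactNames(Context context) {
        Cursor cursor = context.getContentResolver().query(
                ContactsContract.Contacts.CONTENT_URI,
                new String[]{ContactsContract.Contacts.DISPLAY_NAME},
                null, null, null);
        if (cursor == null) {
            return;
        }
        try {
            int nameIndex = cursor.getColumnIndex(ContactsContract.Contacts.DISPLAY_NAME);
            while (cursor.moveToNext()) {
                // Each row is served out of the CursorWindow shared with the provider process.
                Log.d("ContactsReader", "contact: " + cursor.getString(nameIndex));
            }
        } finally {
            cursor.close();
        }
    }
}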

Of course, there are a few points to note during usage.

  • Startup performance

By default, a ContentProvider is created before Application’s onCreate(), and this happens on the main thread. There should not be any time-consuming operations in the constructor, static initializers, or onCreate() of our custom ContentProvider class, otherwise it will slow down startup.

Many students may not know that ContentProvider also has a multi-process mode, enabled with the android:multiprocess attribute in AndroidManifest. In that case the calling process creates an instance of the Provider directly inside its own process, so no cross-process invocation is needed. Note, however, that this also means there can be multiple instances of the Provider.

  • Stability

When ContentProvider transfers data across processes, it uses Android’s Binder together with the anonymous shared memory mechanism. In simple terms, the file descriptor of the anonymous shared memory held inside the CursorWindow object is passed through Binder. The result data itself is therefore not copied across processes; instead, both processes operate on the same block of anonymous shared memory via that file descriptor, which is how different processes end up accessing the same data.

As I mentioned earlier in the I/O optimization articles, the mmap-based anonymous shared memory mechanism also comes at a cost; when the amount of data being transmitted is very small, it may not be worth it. For this case ContentProvider provides a call() function, which transfers data directly through Binder.

Android’s Binder transmission has a size limit, generally 1-2MB. The parameters of ContentProvider’s interface calls and of the call() function do not go through the anonymous shared memory mechanism. For example, if you need to insert a large batch of data, the array of inserted values is passed through Binder directly; if this array is too large, you may hit a data-size exception (typically a TransactionTooLargeException).
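To make the call() path concrete, here is a rough sketch of a provider answering a small request entirely through Binder; the authority, method name, and key are illustrative. Everything in the returned Bundle travels through the Binder transaction buffer, so it must stay well under the size limit mentioned above.

import android.content.ContentProvider;
import android.content.ContentValues;
import android.database.Cursor;
import android.net.Uri;
import android.os.Bundle;

public class ConfigProvider extends ContentProvider {
    // Illustrative authority and method name.
    public static final Uri URI = Uri.parse("content://com.sample.config");
    public static final String METHOD_GET_FLAG = "get_flag";

    @Override
    public Bundle call(String method, String arg, Bundle extras) {
        if (METHOD_GET_FLAG.equals(method)) {
            // Small payloads like this go straight through Binder; no CursorWindow or
            // anonymous shared memory is involved.
            Bundle result = new Bundle();
            result.putBoolean(arg, true);
            return result;
        }
        return null;
    }

    // Required overrides, stubbed out because this sketch only uses call().
    @Override public boolean onCreate() { return true; }
    @Override public Cursor query(Uri uri, String[] projection, String selection, String[] selectionArgs, String sortOrder) { return null; }
    @Override public String getType(Uri uri) { return null; }
    @Override public Uri insert(Uri uri, ContentValues values) { return null; }
    @Override public int delete(Uri uri, String selection, String[] selectionArgs) { return 0; }
    @Override public int update(Uri uri, ContentValues values, String selection, String[] selectionArgs) { return 0; }
}

On the caller side, the result would be fetched with context.getContentResolver().call(ConfigProvider.URI, ConfigProvider.METHOD_GET_FLAG, "night_mode", null).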

  • Security

Although ContentProvider provides a good security mechanism for sharing data between applications, if a ContentProvider is exported and supports executing SQL statements, you need to be careful about SQL injection. In addition, if the parameter we accept is a file path and we return that file’s content, we must validate the path’s legitimacy; otherwise the private data of the entire application may be exposed. The same mistake is also common when passing parameters in Intents.
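As one concrete guard against injection, bind user input through selectionArgs instead of concatenating it into the selection string. The URI and column name below are illustrative.

import android.content.Context;
import android.database.Cursor;
import android.net.Uri;

public class SafeQuerySample {
    // Illustrative URI and column name.
    private static final Uri USERS_URI = Uri.parse("content://com.sample.provider/users");

    public static Cursor queryByName(Context context, String userInput) {
        // Dangerous: building "name = '" + userInput + "'" lets crafted input rewrite the SQL.
        // Safe: keep the input out of the SQL text and pass it as a bound argument.
        return context.getContentResolver().query(
                USERS_URI,
                null,
                "name = ?",
                new String[]{userInput},
                null);
    }
}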

Finally, let me summarize the pros and cons of ContentProvider against the six key factors.

Overall, ContentProvider is relatively heavyweight to use and is best suited to transferring large amounts of data between processes or applications.

Summary #

Although SharedPreferences and ContentProvider are commonly used storage methods in our daily work, they do have their own pitfalls. Therefore, it is crucial for us to fully understand their advantages and disadvantages, so that we can use and optimize them better in our work.

Choosing the appropriate storage method for a specific scenario is a prerequisite for storage optimization. You should learn to evaluate a storage method based on six key factors: correctness, time overhead, space overhead, security, development cost, and compatibility.

The same principle applies when designing a storage solution. We cannot achieve the best performance in all aspects, so we need to make trade-offs and choices. There is no globally optimal solution in the world of storage. What we are looking for is a locally optimal solution. At this point, it is important to clarify our own requirements, be willing to sacrifice some key indicators, and focus on achieving the best performance for the factors that matter the most in our specific scenario.

Homework #

Below is the performance testing report provided by MMKV. You can see that compared to the system’s SharedPreferences, the main difference lies in the write speed.

There is no substitute for hands-on practice. Today, try testing and comparing the performance difference between MMKV and the system’s SharedPreferences yourself. Please share your test results and analysis in the comments section to exchange ideas with your classmates.

Today’s exercise Sample replaces the default implementation of the system’s getSharedPreferences method by overriding it in the Application class. Although this is not the best approach, its main advantage is that it has low code intrusion and requires minimal code modification.

Feel free to click “Please read for friends” and share today’s content with your friends to invite them to learn together. Finally, don’t forget to submit today’s homework in the comments section. I have also prepared a generous “study encouragement package” for students who complete the homework diligently. Looking forward to progressing together with you.