
39 New Horizon in Mobile Development The Trend of Intelligent Computing #

Hello, I’m Zhang Shaowen. Today’s author, Huang Zhen, is an expert in the field of algorithms. I have worked with him on AI projects for mobile devices in the past, and I am deeply impressed by both his algorithmic expertise and engineering capabilities. Today, I am extremely fortunate to have him here to share his insights on edge AI computing in the context of mobile development.

If mobile development was the hottest trend of the past few years, the spotlight has clearly shifted to AI now. The clearest sign is the growing number of college graduates in China flocking to AI-related fields. As mobile developers, how do we secure our own place in the AI wave?

In my personal opinion, even though practical applications of AI on mobile devices are still limited, we should dive in anyway. We can start by using frameworks like TensorFlow to build simple demos and gradually deepen our understanding of deep learning.

Many companies have open-sourced their own mobile deep learning frameworks: Google with TensorFlow Lite, Facebook with Caffe2, Tencent with NCNN and FeatherCNN, Xiaomi with MACE, and Baidu with Paddle-Mobile. Mobile devices will undoubtedly be a very important battlefield for deep learning, and mobile developers need to leverage their platform advantages. We should have a say in the optimization of these mobile deep learning frameworks, striving to combine algorithmic thinking with strong engineering capability. In the project I collaborated on with Huang Zhen, for example, we led the performance optimization of the entire framework, making heavy use of ARM NEON instructions and assembly code.

Hello everyone, I’m Huang Zhen, and I currently work as an algorithm engineer at a large internet company. I have always been interested in edge AI computing, and I have also worked on machine learning projects for mobile devices in the past, during which I had the opportunity to interact with many Android developers. While collaborating with them, I also pondered over the points where AI and Android development can be combined, and I witnessed the machine learning development capabilities demonstrated by many Android developers in the team. Therefore, I gladly accepted Shaowen’s invitation to share my understanding of machine learning on mobile devices in this Android development column, hoping that it can be helpful to you.

Currently, there are two major technology trends. One is that with the development of 5G networks, the Internet of Things (IoT) is making “everything connected” possible. In the IoT era, cloud computing and edge computing will coexist, with edge computing playing an important role. Edge computing is the counterpart of cloud computing: data is processed and analyzed at the edge nodes of the network (the end devices). Edge computing matters for several reasons:

  • In the IoT era, billions of end devices will be collecting data continuously, and the resulting computational workload will be more than cloud computing alone can handle.
  • The computing power of end devices keeps improving, so not every computation needs to be done in the cloud.
  • Sending data to the cloud for computation and returning the results to the device inevitably introduces latency, which hurts the user experience and is unacceptable in some scenarios.
  • As users pay more attention to data security and privacy, there will be growing demand for processing data on the device itself.

The other trend is the rapid advance of artificial intelligence. In recent years, AI technologies, and deep learning in particular, have made breakthrough progress, attracting broad public attention and reshaping the commercial landscape. The core value of AI lies in using intelligent algorithms to automate work that previously required human labor, so that the scale of machines breaks through the bottleneck of human productivity and greatly improves production efficiency. In business, technology that scales this way will release tremendous energy, create huge value, and even reshape competition, expanding the outer boundaries of entire industries.

Edge devices possess real computing power, and many application scenarios call for intelligent computation. When the two are combined, we get the concept of edge AI computing.

Development Techniques for Mobile Machine Learning #

Currently, the application of mobile machine learning technology mainly focuses on image processing, natural language processing, and speech processing. In specific applications, this includes but is not limited to object detection in video images, language translation, voice assistants, and beauty filters. There is a huge demand for these applications in fields such as autonomous driving, education, healthcare, smart homes, and the Internet of Things.

1. Computing Frameworks

Because deep learning algorithms involve many specialized operators, major companies have developed mobile deep learning computing frameworks to improve development and model-deployment efficiency. Examples include Google’s TensorFlow Lite and Facebook’s Caffe2. It is well worth Android developers’ time to familiarize themselves with these frameworks. To build interest and gain an intuitive understanding, pick a mature framework from a major company, such as TensorFlow Lite, and develop a demo of object detection on a mobile phone.

In practical projects, TensorFlow Lite and Caffe2 are usually comparatively slow; comparison charts for the various computing frameworks are easy to find online. If you want to truly get started, I recommend the NCNN framework. NCNN is a computing framework open-sourced by Tencent, with a clear code structure, a small footprint, and high runtime efficiency. I strongly recommend that Android developers interested in this field read its source code a few times. Doing so helps you understand not only the commonly used deep learning algorithms but also how a mobile machine learning computing framework is built.

While reading the NCNN source code, focus on its three basic data structures: Mat, Layer, and Net. Mat stores matrix values; in a neural network, every input, output, and weight is stored in a Mat. A Layer represents an operation, so every Layer must implement a forward function; all operators, such as convolution and LSTM, derive from Layer. Net combines all the data nodes and operations to represent the entire network.
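To make these three structures concrete, here is a minimal Python sketch of the same design: a value container, an operator base class with a forward function, and a network that chains operators. The class bodies are hypothetical simplifications for illustration; NCNN itself implements Mat, Layer, and Net in C++.

```python
# Hypothetical Python analogue of NCNN's three core structures.
# Values live in Mats, every operator is a Layer with a forward
# function, and a Net chains the operators together.

class Mat:
    """Stores matrix values: every input, output, and weight is a Mat."""
    def __init__(self, data):
        self.data = list(data)

class Layer:
    """Base class for all operators; each must implement forward()."""
    def forward(self, bottom):
        raise NotImplementedError

class Scale(Layer):
    """Toy operator standing in for real ones like convolution or LSTM."""
    def __init__(self, factor):
        self.weight = Mat([factor])          # weights are Mats too
    def forward(self, bottom):
        f = self.weight.data[0]
        return Mat(f * v for v in bottom.data)

class ReLU(Layer):
    def forward(self, bottom):
        return Mat(max(0.0, v) for v in bottom.data)

class Net:
    """Combines data nodes and operations into the whole network."""
    def __init__(self, layers):
        self.layers = layers
    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

net = Net([Scale(2.0), ReLU()])
out = net.forward(Mat([-1.0, 3.0]))
print(out.data)  # [0.0, 6.0]
```

Running a Net is just repeatedly calling each Layer’s forward function on the Mat produced by the previous one, which is exactly the forward computation path you will trace through NCNN’s C++ code.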

While reading the NCNN source code, it is also a good idea to explore introductory materials on convolutional neural network algorithms, as it will aid in understanding.

2. Computing Performance Optimization

In practical projects, if a particular path becomes a bottleneck in terms of time overhead, it is usually possible to implement that node using the NDK (Native Development Kit). However, in most cases, mobile machine learning computing frameworks are already implemented using the NDK. In this case, the direction for improvement is using ARM NEON instructions and assembly optimization.

NEON is a 128-bit SIMD (Single Instruction, Multiple Data) architecture extension for ARM Cortex-A series processors. The key to performance optimization with ARM NEON instructions is that a single instruction operates on multiple data elements simultaneously. The following diagram illustrates the concept:

Each operand is a 128-bit register holding four 32-bit values of the same type. With a single instruction like the one below, four 32-bit values are computed in parallel, which is where the performance gain comes from.

`VADDQ.S32 Q0, Q1, Q2`
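To make the semantics concrete, here is a plain Python simulation of what that instruction does: one operation that adds the four signed 32-bit lanes of two 128-bit registers, with two’s-complement wraparound. This is a sketch of the semantics only; no actual SIMD is executed.

```python
# Simulate VADDQ.S32: one "instruction" adds four signed 32-bit lanes
# of two 128-bit Q registers in parallel.

INT32_MIN, INT32_MOD = -2**31, 2**32

def wrap_s32(v):
    """Wrap to the signed 32-bit two's-complement range, as hardware does."""
    return (v - INT32_MIN) % INT32_MOD + INT32_MIN

def vaddq_s32(q1, q2):
    assert len(q1) == len(q2) == 4   # a Q register holds four S32 lanes
    return [wrap_s32(a + b) for a, b in zip(q1, q2)]

q1 = [1, 2, 3, 2**31 - 1]
q2 = [10, 20, 30, 1]
print(vaddq_s32(q1, q2))  # [11, 22, 33, -2147483648]
```

The last lane overflows and wraps around, just as the hardware instruction would; the point is that all four additions happen in a single instruction rather than in a four-iteration loop.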

We once implemented the PixelShuffle operation in deep learning algorithms using the NDK, but later achieved a 40-fold efficiency improvement by optimizing it using ARM NEON instructions and assembly.

To further improve computing performance, Int8 quantization can be employed. Most deep learning computation uses the Float32 type; each Float32 value is 32 bits, so a 128-bit register holds four of them, whereas the same register holds sixteen Int8 values. Combined with the single instruction, multiple data idea above, Int8 lets one instruction operate on sixteen values at once, greatly improving parallelism.
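As a concrete sketch, the snippet below applies a simple symmetric linear quantization scheme to a few made-up weights and measures the round-trip error. This is illustrative only; production frameworks calibrate scales per layer or per channel.

```python
# Sketch of symmetric linear quantization: Float32 -> Int8 with a single
# scale chosen from the largest weight magnitude.

def quantize(values, scale):
    """round(v / scale), clamped to the Int8 range [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    return [q * scale for q in qvalues]

weights = [0.51, -1.27, 0.003, 0.92]        # made-up Float32 weights
scale = max(abs(v) for v in weights) / 127  # symmetric scale
q = quantize(weights, scale)
recovered = dequantize(q, scale)

# Each value now fits in 8 bits, at the cost of a rounding error
# bounded by scale / 2 per value.
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
print(q, round(max_err, 5))
```

The printed error shows the precision trade-off the text mentions: sixteen-way parallelism is bought with a bounded, but nonzero, loss of accuracy.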

However, quantizing Float32 values to Int8 representation inevitably affects data precision, and the quantization process itself introduces time overhead. These are important considerations. Interested students can read this paper for more information on quantization.

If the device has a GPU, OpenCL can also be used for GPU acceleration, such as in Xiaomi’s open-source mobile machine learning framework, MACE.

Algorithm Techniques for Mobile Machine Learning #

For students just beginning to study algorithms, I always advise against imagining algorithms as overly complex or overly mathematical; that only makes them intimidating. Mathematics is an expression of logical thinking, and we use it to help us understand algorithms. We need to understand an algorithm intuitively, including an intuitive sense of why it achieves its effect; only then have we truly mastered it. Conversely, if you merely memorize an algorithm’s mathematical derivation without forming that intuition, you will not be able to apply it flexibly.

1. Algorithm Design

Deep learning for image processing has a wide range of applications and is relatively intuitive and interesting. I recommend starting with deep learning for image processing. The most basic knowledge for deep learning image processing is convolutional neural networks, so you can start by learning about convolutional neural networks. There are many introductions to convolutional neural networks available online, so I won’t go into detail here.

The key to understanding convolutional neural networks is to understand their learning mechanism: why they are able to learn. To see this, we first need to clarify two notions, “forward propagation” and “backward propagation”. Once the network’s structure and activation functions are fixed, “training” (or “learning”) is simply the continual adjustment of each neuron’s weights so that the network’s overall output converges toward the desired value (the target value). Forward propagation starts from the input and produces the network’s output, which is then compared with the target value to obtain the output error. That error is propagated backward through the network, decomposed into an error for each node, and each node’s weights are adjusted according to its own error. In short, the purpose of forward propagation is to obtain the output, and the purpose of backward propagation is to adjust the weights by propagating the error back. Iterating between the two drives the output toward agreement with the target value.

Once you understand convolutional neural networks, you can implement a handwritten-character-recognition example. After mastering that, move on to object detection algorithms in deep learning, such as YOLO and Faster R-CNN. Finally, try implementing them with a framework such as TensorFlow: run the training data and tune the parameters, learning by doing. Throughout, pay special attention to the design ideas and problem-solving approaches behind the algorithm models.

2. Performance Optimization

Performance optimization here means improving the accuracy and related metrics of the algorithm model. There are several common ways to achieve this:

  • Optimize training data
  • Optimize algorithm design
  • Optimize model training methods

Optimize Training Data

Since the algorithm model learns from training data, it cannot learn patterns outside the training data. Therefore, when selecting training data, special care must be taken to ensure that the training data contains patterns that occur in the actual scenarios. Carefully selecting or labeling training data can effectively improve the performance of the model. Data annotation is so important for performance that there are startups dedicated to data annotation that have raised a significant amount of funding.

Optimize Algorithm Design

Choose better algorithm models based on the problem at hand, such as using deep learning models instead of traditional machine learning models, using models with higher feature expression capabilities, etc. For example, use residual networks or DenseNet to enhance the feature expression capability of the network.

Optimize Model Training Methods

Optimizing model training methods includes selecting the appropriate loss function, whether to use regularization, whether to use Dropout structures, and which gradient descent algorithm to use.
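As a small illustration of how one of these choices changes training, the sketch below compares a plain gradient step with an L2-regularized one on a single weight. The setup and the lam value are made up for illustration.

```python
# One SGD step on a single weight, with optional L2 regularization.
# With loss = error^2 / 2 + lam/2 * w^2, the gradient gains a lam * w
# term that shrinks the weight toward zero (weight decay).

def sgd_step(w, x, target, lr=0.1, lam=0.0):
    error = w * x - target
    grad = error * x + lam * w   # regularization adds lam * w
    return w - lr * grad

w_plain = w_l2 = 1.0
for _ in range(200):
    w_plain = sgd_step(w_plain, x=1.0, target=2.0, lam=0.0)
    w_l2 = sgd_step(w_l2, x=1.0, target=2.0, lam=0.5)

print(round(w_plain, 3), round(w_l2, 3))  # 2.0 1.333
```

The plain run converges to the exact fit, while the regularized run settles on a smaller weight: it trades a little training error for a model that is less prone to overfitting, which is precisely the kind of trade-off these training-method choices control.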

3. Computational Efficiency Optimization

Although we have done a lot of work on the framework side to improve computational performance, if we can reduce computational complexity on the algorithm side, the overall computational real-time performance will also improve. From the model perspective, there are two ways to reduce computational complexity: designing lightweight network models and compressing models.

Designing Lightweight Network Models

Both academia and industry have designed lightweight convolutional neural networks that significantly reduce computational complexity while maintaining model accuracy, thereby reducing computational costs. A typical example of this approach is Google’s MobileNet, which, as the name suggests, is designed for use on mobile devices. MobileNet splits standard convolutional neural network operations into Depthwise convolution and Pointwise convolution. It first uses Depthwise convolution to perform convolution operations on each channel of the input separately, and then uses Pointwise convolution to fuse information between channels.
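The saving is easy to quantify by counting multiplications. The sketch below uses hypothetical layer shapes (a 112x112 feature map, 32 input and 64 output channels, a 3x3 kernel, stride 1, "same" padding); the reduction works out to roughly a factor of 1/c_out + 1/k².

```python
# Multiplication counts for one conv layer: standard convolution vs.
# MobileNet-style depthwise + pointwise convolution.

def standard_conv_mults(h, w, c_in, c_out, k):
    # every output pixel of every output channel sees c_in * k * k inputs
    return h * w * c_out * c_in * k * k

def depthwise_separable_mults(h, w, c_in, c_out, k):
    depthwise = h * w * c_in * k * k   # per-channel k x k convolution
    pointwise = h * w * c_out * c_in   # 1 x 1 cross-channel fusion
    return depthwise + pointwise

h, w, c_in, c_out, k = 112, 112, 32, 64, 3   # hypothetical shapes
std = standard_conv_mults(h, w, c_in, c_out, k)
sep = depthwise_separable_mults(h, w, c_in, c_out, k)
print(f"standard: {std:,}  separable: {sep:,}  ratio: {std / sep:.1f}x")
```

For these shapes the separable form needs roughly an eighth of the multiplications, which is why the architecture suits mobile devices.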

Model Compression

Model compression includes structured sparsity and distillation.

In logistic regression, we introduce regularization to drive some feature coefficients toward zero. In convolutional neural networks, we can likewise introduce regularization to achieve structured sparsity, for example driving all coefficients of certain channels’ convolution kernels toward zero so that those kernels can be pruned, eliminating unnecessary computation.
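A toy sketch of the pruning step: once regularization has driven a kernel’s coefficients toward zero, its L1 norm falls below a threshold and the whole output channel can be dropped. The weights and threshold here are made up for illustration; real pruning must also rewire the adjacent layers.

```python
# Structured pruning sketch: drop output channels whose kernel
# coefficients have a near-zero L1 norm.

def prune_channels(kernels, threshold=0.05):
    """kernels: one list of coefficients per output channel."""
    return [k for k in kernels if sum(abs(c) for c in k) >= threshold]

kernels = [
    [0.4, -0.2, 0.1],      # clearly useful channel
    [0.001, -0.002, 0.0],  # regularized toward zero -> prunable
    [0.3, 0.05, -0.1],
]
pruned = prune_channels(kernels)
print(len(kernels), "->", len(pruned))  # 3 -> 2
```

Because an entire channel disappears rather than scattered individual weights, the saved computation is realized directly, with no need for sparse-matrix tricks at inference time.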

The distillation method is a form of transfer learning: design a simpler network and train it so that it approximates the representation capability of the target network, achieving the “distillation” effect.
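The idea can be sketched with a toy one-dimensional “teacher”: the student is trained against the teacher’s outputs rather than ground-truth labels. Real distillation typically matches softened class probabilities between two networks; everything below is illustrative.

```python
# Distillation sketch: a small linear "student" learns to imitate a
# fixed "teacher" function instead of labeled data.

def teacher(x):
    return 2.0 * x + 1.0   # stands in for the large target network

def train_student(lr=0.05, steps=500):
    w, b = 0.0, 0.0
    data = [-1.0, 0.0, 1.0, 2.0]           # unlabeled inputs
    for _ in range(steps):
        for x in data:
            err = (w * x + b) - teacher(x)  # match the teacher's output
            w -= lr * err * x
            b -= lr * err
    return w, b

w, b = train_student()
print(round(w, 3), round(b, 3))  # approaches 2.0 1.0
```

The student ends up reproducing the teacher’s behavior with a much smaller model, which is the whole point of distillation on resource-constrained devices.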

Opportunities for Android Developers in Machine Learning #

In mobile machine learning, computational frameworks and algorithms play important roles. The former focuses on the performance of model computation to reduce time costs, while the latter is responsible for model precision. Additionally, algorithms can be designed to reduce computational complexity and achieve a reduction in time costs.

It is important to note that in mobile machine learning, model training is typically done on the server side; end devices currently do not usually train models. Instead, the trained model is loaded onto the device, which performs forward computation to obtain the model’s results.

But after discussing industry trends and the basics of machine learning, how can mobile developers enter this hot field? Mobile machine learning belongs to edge intelligent computing, and mobile development, particularly Android development, is territory Android developers already know well. This is their opportunity to move into edge intelligent computing: leverage existing technical expertise to establish a strong foothold in end-device programming for edge computing and a solid base in the technology landscape to come, while gradually learning deep learning algorithms, eventually stepping into edge AI computing and creating greater technical value.

In most cases, Android developers will be at a disadvantage in algorithms compared with specialists in deep learning, so it is crucial that they not abandon their expertise in end-device development. For most Android developers, the best positioning in the future division of technical labor is “specialist in Android development + conversant with deep learning algorithms”; that combination is where they can create the most value.

As for the learning path, I suggest that Android developers start by studying the fundamentals of convolutional neural networks (structure, training, and forward computation). Then, they can read and learn the NCNN open-source framework to master optimization methods for computational performance and improve their development skills. Simultaneously, they can gradually study algorithm techniques, focusing on various common deep learning algorithm models, and pay particular attention to lightweight neural network algorithms that have emerged in recent years. In conclusion, Android developers need to focus on improving both development and algorithm techniques to enhance computational real-time performance, while also learning deep learning algorithm models.

Based on the above description, I have outlined a comprehensive diagram of mobile machine learning technology for your reference. The parts enclosed by the red circles in the image below are the content I recommend that Android developers focus on mastering.

Feel free to click on “Please Share with Friends” to share today’s content with your friends and invite them to learn together. I have also prepared a generous “Study Boost Gift” for students who think critically and share actively. I look forward to learning and progressing together with you.