56 How-tos General Methods for Optimizing Performance Issues #

Hello, I’m Ni Pengfei.

In the previous section, I guided you through the general steps for analyzing performance issues. Let’s review them briefly.

We can analyze the root cause of performance issues from two angles: system resource bottlenecks and application bottlenecks.

From the perspective of system resource bottlenecks, the USE method is the most effective approach. It examines hardware and software resources such as the CPU, memory, disk and file system I/O, network, and kernel resource constraints in terms of utilization, saturation, and errors. I also reviewed the analysis methods for these resources in the earlier sections of this column.

From the perspective of application bottlenecks, the sources of performance issues can be divided into three categories: resource bottlenecks, dependency service bottlenecks, and application bottlenecks.

  • The analysis approach for resource bottlenecks is the same as that for system resource bottlenecks.

  • For dependency service bottlenecks, a distributed (full-link) tracing system can be used to locate them quickly.

  • As for application bottlenecks themselves, they can be analyzed and located through system calls, hot functions, or application-specific metrics and logs (a quick sketch of the first two follows this list).
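For example, here is a minimal sketch of locating an application bottleneck through system calls and hot functions, assuming a hypothetical process with PID 12345; the tools are the same strace and perf we used earlier in the column.

```bash
# Summarize which system calls the process makes and how long they take
# (let it run for a while, then stop with Ctrl+C to see the summary).
strace -c -p 12345

# Sample the process's on-CPU hot functions in real time.
perf top -p 12345
```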

Of course, although the system and the application are two different perspectives, they often complement and influence each other during actual operation.

  • The system is the operating environment of the application, and system bottlenecks will cause a decline in application performance.

  • Conversely, poor application design can also create bottlenecks in system resources.

The purpose of performance analysis is to combine knowledge of how the application and the operating system work, so as to identify the true root causes of a problem.

Once we identify the source of performance issues, the entire optimization work is essentially halfway done because these bottlenecks indicate the direction for optimization. However, when it comes to performance optimization, what are some common methods?

Today, I will show you the general methods for performance optimization. Similar to performance analysis in the previous section, we can also approach performance optimization from the perspectives of the system and the application program.

System Optimization #

Let’s start with system optimization. In the previous section, I introduced the USE method for analyzing the bottlenecks of system hardware and software resources. Therefore, the corresponding optimization methods should also start from these resource bottlenecks.

In fact, beyond the core content on analyzing resource bottlenecks, the first four modules of this column already cover optimization methods for these common resources.

Next, I will review their optimization methods from four aspects: CPU performance, memory performance, disk and file system I/O performance, and network performance.

CPU Optimization #

Let’s first look at the optimization methods for CPU performance. In the article Several Ideas for CPU Performance Optimization, I mentioned that the core of CPU performance optimization lies in eliminating unnecessary work, making full use of the CPU cache, and reducing the impact of process scheduling on performance.

From these perspectives, you can probably already think of many optimization methods. Here, I will highlight the three most typical ones.

  • The first method is to bind processes to one or more CPUs to fully utilize the locality of CPU cache and reduce the mutual interference between processes.

  • The second method is to enable multi-CPU load balancing for interrupt handlers, so that when a large number of interrupts occur, the advantages of multiple CPUs can be fully utilized to share the load.

  • The third method is to set resource limits for processes using mechanisms such as cgroups, to prevent individual processes from consuming too much CPU. At the same time, core applications should be given a higher priority to reduce the impact of low-priority tasks. The first and third methods are sketched below.
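As a concrete illustration, here is a minimal sketch of the first and third methods, assuming a cgroup-v2 system and a hypothetical application with PID 12345; adjust the PID, CPU list, and limits for your own environment.

```bash
# Method 1: bind the process to CPUs 0 and 1 to improve cache locality
# and reduce interference from other processes.
taskset -pc 0,1 12345

# Method 3: limit the process to at most half of one CPU with cgroups v2.
mkdir -p /sys/fs/cgroup/myapp
echo "50000 100000" > /sys/fs/cgroup/myapp/cpu.max   # quota / period, in microseconds
echo 12345 > /sys/fs/cgroup/myapp/cgroup.procs

# Give a core application a higher scheduling priority (lower nice value).
renice -n -5 -p 12345
```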

Memory Optimization #

After discussing CPU performance optimization, let’s look at how to optimize memory performance. In the article How to Quickly Identify System Memory Issues, I summarized common memory problems for you, such as insufficient available memory, memory leaks, excessive swapping, excessive page faults, and oversized caches. Memory performance optimization, therefore, means solving these memory usage issues.

In my opinion, you can optimize memory performance through the following methods.

  • The first method is to disable swap unless it is truly necessary. This avoids the extra I/O caused by swapping, which slows down memory access.

  • The second method is to set memory limits for processes using mechanisms such as cgroups, so that an individual process cannot consume so much memory that it affects other processes. For core applications, you should also lower their oom_score (via oom_score_adj) to avoid having them killed by the OOM killer.

  • The third method is to use techniques such as huge pages and memory pools to reduce dynamic memory allocation and, in turn, page faults. All three methods are sketched below.
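Here is a minimal sketch of these three methods, again assuming a cgroup-v2 system and a hypothetical application with PID 12345; the limits and huge page count are only illustrative.

```bash
# Method 1: disable swap, or at least reduce the kernel's tendency to swap.
swapoff -a
echo 10 > /proc/sys/vm/swappiness

# Method 2: cap the process at 1 GB of memory with cgroups v2, and lower its
# oom_score_adj so the OOM killer prefers other victims.
mkdir -p /sys/fs/cgroup/myapp
echo 1G > /sys/fs/cgroup/myapp/memory.max
echo 12345 > /sys/fs/cgroup/myapp/cgroup.procs
echo -500 > /proc/12345/oom_score_adj

# Method 3: reserve 2 MB huge pages for applications that can use them.
echo 512 > /proc/sys/vm/nr_hugepages
```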

Disk and File System I/O Optimization #

Next, let’s look at the optimization methods for disk and file system I/O, the third type of system resource. In the article Several Ideas for Disk I/O Performance Optimization, I already summarized common optimization ideas for you. Among them, three methods are the most typical.

  • The first, and simplest, method is to replace HDDs with SSDs, or to use RAID, to improve I/O performance.

  • The second method is to choose the I/O scheduler that best matches the characteristics of the disk and the application’s I/O pattern. For example, the noop scheduler is usually used for SSDs and virtual machine disks, while the deadline scheduler is recommended for database applications.

  • The third method is to optimize the caches and buffers of the file system and disk, for example by tuning the flush frequency and dirty page limits, as well as the kernel’s tendency to reclaim the directory entry and inode caches.

In addition to these, isolating different applications’ data on different disks, tuning file system configuration options, adjusting disk read-ahead, and increasing the disk queue length are also commonly used optimization strategies. A few of these knobs are sketched below.
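Here is a minimal sketch of the second and third methods above, plus the read-ahead and queue length tweaks, assuming a disk named sdb; device names, available schedulers, and sensible values vary with the kernel version and hardware.

```bash
# Method 2: pick an I/O scheduler that matches the workload.
cat /sys/block/sdb/queue/scheduler           # list the available schedulers
echo noop > /sys/block/sdb/queue/scheduler   # e.g. noop (or "none" on multi-queue kernels)

# Method 3: tune dirty page writeback and cache-reclaim behavior.
echo 10  > /proc/sys/vm/dirty_ratio              # max dirty pages, as % of RAM
echo 5   > /proc/sys/vm/dirty_background_ratio   # when background flushing starts
echo 100 > /proc/sys/vm/vfs_cache_pressure       # dentry/inode cache reclaim tendency

# Extra knobs: read-ahead size and request queue length.
echo 4096 > /sys/block/sdb/queue/read_ahead_kb
echo 1024 > /sys/block/sdb/queue/nr_requests
```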

Network Optimization #

The last area is network performance optimization. In the article Several Ideas for Network Performance Optimization, I also summarized common optimization ideas for you. These methods follow the Linux network protocol stack and optimize the working principles of each protocol layer. Here, I will again highlight several typical network optimization methods.

First of all, from the perspective of kernel resources and network protocols, we can optimize kernel options, such as:

  • You can increase kernel resource quotas such as socket buffers, connection tracking tables, maximum half-open connections, maximum file descriptors, local port ranges, etc.

  • You can also shorten exception-handling parameters such as the TIME_WAIT timeout, the SYN+ACK retransmission count, and the keepalive probe interval.

  • You can also enable port reuse and reverse path filtering (rp_filter), and adjust the MTU size, to reduce the burden on the kernel.

These are the most common measures for optimizing kernel options. The sketch below shows what a few of them look like as sysctl settings.
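Here is a minimal sketch of a few such kernel options; the names are real sysctls, but the values are only illustrative and should be derived from your own workload and memory budget.

```bash
# Increase kernel resource quotas.
sysctl -w net.core.somaxconn=4096                      # listen backlog
sysctl -w net.ipv4.tcp_max_syn_backlog=8192            # maximum half-open connections
sysctl -w net.ipv4.ip_local_port_range="10000 65535"   # local port range
sysctl -w fs.file-max=2097152                          # system-wide file descriptor limit

# Shorten exception-handling parameters.
sysctl -w net.ipv4.tcp_fin_timeout=15          # timeout for FIN-WAIT-2 sockets
sysctl -w net.ipv4.tcp_synack_retries=2        # SYN+ACK retransmission count
sysctl -w net.ipv4.tcp_keepalive_time=600      # seconds before keepalive probes start

# Reduce the kernel's burden.
sysctl -w net.ipv4.tcp_tw_reuse=1              # reuse TIME_WAIT ports for new connections
```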

Secondly, from the perspective of network interfaces, we can consider optimizing the functions of network interfaces. For example:

  • You can offload work that would otherwise run on the CPU to the NIC, by enabling features such as GRO, GSO, RSS, and VXLAN offload.

  • You can also enable the multi-queue function of network interfaces so that each queue can use a different interrupt number and be scheduled to execute on different CPUs.

  • You can also increase the buffer size and queue length of the network interface to improve network throughput. A few of these tweaks are sketched below.
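Here is a minimal sketch of such NIC-level tweaks with ethtool, assuming an interface named eth0; the supported features, queue counts, and ring sizes depend on the driver and can be checked first with ethtool -k, -l, and -g.

```bash
# Enable offload features such as GRO and GSO.
ethtool -K eth0 gro on gso on

# Use multiple RX/TX queues so interrupts can be spread across CPUs.
ethtool -L eth0 combined 8

# Enlarge the RX/TX ring buffers to improve throughput under load.
ethtool -G eth0 rx 4096 tx 4096
```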

Finally, in extreme performance scenarios (such as C10M), the kernel’s network protocol stack may be the main performance bottleneck. Therefore, bypassing the kernel protocol stack is generally considered.

  • You can use DPDK (Data Plane Development Kit) to skip the kernel protocol stack and let user-space processes handle network requests by polling. Combined with mechanisms such as huge pages, CPU binding, memory alignment, and pipelined concurrency, this greatly improves packet processing efficiency.

  • You can also use the kernel’s built-in XDP (eXpress Data Path) to process network packets before they enter the kernel protocol stack, which likewise delivers excellent performance. Attaching an XDP program is sketched below.
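For reference, here is a minimal sketch of attaching an XDP program with iproute2, assuming you already have a compiled BPF object (a hypothetical xdp_prog.o) and an interface named eth0.

```bash
# Attach the pre-compiled XDP program to the interface.
ip link set dev eth0 xdp obj xdp_prog.o sec xdp

# Inspect the attachment, and detach it when no longer needed.
ip -details link show dev eth0
ip link set dev eth0 xdp off
```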

Application Optimization #

After discussing the optimization of system software and hardware resources, let’s now look at the optimization strategies for application programs.

Although system software and hardware resources are the foundation on which applications run, you should know that the best place for performance optimization is still the application itself. Why do I say that? Let me give you two examples.

The first example is high system CPU usage (sys%). Sometimes, even though the visible symptom is high system CPU usage, analysis may reveal that unreasonable system calls made by the application are the real culprit. In that case, optimizing the application’s system call logic is obviously simpler and more effective than optimizing the kernel.

Another example is a database with high CPU usage and slow I/O response, which is also a common performance problem. Generally speaking, this is not because the database itself performs poorly, but because the application uses poorly designed table structures or SQL queries. Here, optimizing the table structures or SQL statements on the application side clearly brings greater benefits than tuning the database itself.

Therefore, when observing performance indicators, you should first check the application’s response time, throughput, and error rate, as these are the metrics that performance optimization ultimately aims to improve. Focusing on them, you can come up with many optimization methods. Here are a few that I recommend.

  • First, from the perspective of CPU usage, simplifying code, optimizing algorithms, asynchronous processing, and compiler optimization are common methods to reduce CPU usage. This allows you to utilize limited CPU resources to handle more requests.

  • Second, from the perspective of data access, using caching, copy-on-write, increasing I/O sizes, etc., are common methods to reduce disk I/O. This enables faster data processing.

  • Third, from the perspective of memory management, using huge pages, memory pools, and similar techniques lets you preallocate memory and reduce dynamic allocation, thereby improving memory access performance.

  • Fourth, from the perspective of networking, using I/O multiplexing, replacing short connections with long connections, DNS caching, etc., can optimize network I/O and reduce the number of network requests, thereby reducing performance issues caused by network latency.

  • Fifth, from the perspective of process work models, asynchronous processing, multi-threading, or multi-processing can fully utilize the processing capabilities of each CPU, thereby improving the throughput of application programs.

In addition to these methods, you can also use various techniques such as message queues, content delivery networks (CDN), and load balancing to optimize the architecture of application programs. By distributing the tasks that were originally handled by a single machine to multiple servers for parallel processing, you often achieve better overall performance.

Summary #

Today, I took you through the common methods of performance optimization from the perspectives of system and application programs.

From the system perspective, various hardware and software resources such as CPU, memory, disk and file system I/O, network, and kernel data structures provide an operating environment for application programs and are the key objects of our performance optimization. You can refer to the optimization sections in the first four modules of our column to optimize these resources.

From the application program perspective, common methods of performance optimization include reducing CPU usage, minimizing data access and network I/O, using caching, asynchronous processing, and multi-process and multi-threading. In addition to these single-machine optimization methods, adjusting the architecture of the application program or utilizing horizontal scaling to distribute tasks to multiple servers for parallel processing are also common optimization approaches.

Although there are many methods of performance optimization, I still want to emphasize that premature optimization should be avoided. Performance optimization often increases complexity, which not only reduces maintainability but also makes it harder to adapt to new and changing requirements.

Therefore, performance optimization is best carried out gradually and dynamically. Rather than pursuing perfection in one step, first make sure the current performance requirements are met. Then, when you find that performance falls short or a bottleneck appears, choose the most significant issue to optimize based on the results of performance analysis.

Reflection #

Finally, I would like to invite you to talk about how you optimize when encountering performance issues. Do you have any memorable experiences you can share with me? You can summarize your own thoughts based on my discussion.

Feel free to discuss with me in the comments section. You are also welcome to share this article with your colleagues and friends. Let’s practice in real scenarios and improve through communication.