57 How-tos: Quick Reference for Linux Performance Tools #

Hello, I’m Ni Pengfei.

In the previous section, we reviewed common approaches to performance optimization. Let’s briefly recap.

We can approach performance optimization from two perspectives: the system and the application.

  • From the system’s perspective, the focus is on optimizing the CPU, memory, network, disk I/O, and kernel software resources, among others.

  • From the application’s perspective, the main goal is to simplify the code, reduce CPU usage, minimize network requests and disk I/O, and improve the application’s throughput by utilizing caching, asynchronous processing, multiprocessing, and multithreading.

Performance optimization is best done incrementally, refined step by step as requirements evolve. Rather than trying to solve everything in one shot, first make sure the current performance requirements are met, because optimization usually increases complexity and reduces maintainability.

If you find that optimizing a single machine brings excessive complexity, don’t insist on squeezing every last bit of performance out of it. Instead, consider improving performance through horizontal scaling at the software architecture level.

To do a good job, one must first sharpen one’s tools. In performance analysis and optimization, the right tools can greatly speed up the whole process. How many of the common performance tools do you remember? Today, I’ll walk you through them so that you can quickly find the right one when you need it.

Performance Tool Quick Reference #

Before we dive into the tools, let me ask you a question: in what situations do we need to look up and choose a performance tool? Think about it before reading on.

In my view, we only wish for a quick-reference chart of performance tools when we want to examine a performance metric but don’t know which tool to use. If we already know which tool is available, we are more likely to consult its manual to learn about its functions, usage, and caveats.

When it comes to reading tool manuals, “man” is probably the most familiar command, and I have mentioned it many times in this column. Besides “man”, there is another command for querying manuals: “info”.

“info” can be thought of as a more detailed version of “man”, with more powerful features such as jumping between linked nodes. In comparison, the output of “man” is more concise, while that of “info” is more detailed. So we usually use “man” to look up how a tool is used, and turn to the info documentation only when the man page is hard to understand.

Of course, as I said, consulting a manual presupposes that you already know which tool to use. If you don’t, you need to search for available tools based on the metrics you want to examine. Among them:

  • Some tools can be used directly without installing anything, such as the kernel’s “/proc” file system.

  • Others require additional software packages, such as “sar”, “pidstat”, and “iostat” (all provided by the sysstat package).
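Because “/proc” needs no installation, a few lines of any scripting language can read metrics from it. The following is a minimal Python sketch, assuming a Linux host; the function name parse_loadavg is hypothetical. It parses /proc/loadavg, whose single line holds the 1-, 5- and 15-minute load averages, the runnable/existing task counts, and the most recently created PID.

```python
from pathlib import Path


def parse_loadavg(text: str) -> dict:
    """Parse the single line of /proc/loadavg into named fields."""
    load1, load5, load15, entities, last_pid = text.split()
    running, total = entities.split("/")  # e.g. "1/80": runnable / existing tasks
    return {
        "load1": float(load1),
        "load5": float(load5),
        "load15": float(load15),
        "running": int(running),
        "total": int(total),
        "last_pid": int(last_pid),
    }


if __name__ == "__main__":
    path = Path("/proc/loadavg")
    if path.exists():  # /proc is only present on Linux
        print(parse_loadavg(path.read_text()))
    else:
        print("/proc/loadavg not found; this sketch assumes Linux")
```

These are the same load averages that uptime and top display, obtained without installing any package.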

Therefore, when choosing performance tools, we need to consider both the performance metric and the analysis environment: for example, whether the target environment allows installing packages, or whether a tool requires a newer kernel version.

With these principles of tool selection in mind, let’s look at Linux performance tools. First, I recommend the following chart, Brendan Gregg’s Linux performance tools chart. I have mentioned it many times in this column, and you have probably consulted it already.

Linux Performance Tools (Image from brendangregg.com)

This chart is organized by Linux kernel subsystem and summarizes the tools available for analyzing each one. Although it is one of the best references for performance analysis, it is not detailed enough.

For example, when you need to check a specific performance metric, the corresponding subsystem in the chart may list several tools. Not all of them necessarily apply, so you still have to consult each tool’s manual and compare them before making a choice.

So, is there a better way to understand these tools? My suggestion is to start from the performance metrics and group the tools by the metrics they report. The most common approach is to classify them by the metrics of the CPU, memory, disk I/O, and network.

Next, I will organize the common Linux performance tools by CPU, memory, disk I/O, and network, focusing on the metrics, so that it is clear which tools can observe each metric. All of these tools have appeared in the cases throughout this column. For easy reference, I have organized them into tables and noted the usage scenarios for each tool.

CPU Performance Tools #

First, from the CPU perspective, the main performance indicators are CPU utilization, context switches, and the CPU cache hit rate. The following figure lists the common CPU performance indicators.

CPU Performance Indicators

Based on these indicators, CPU utilization can be further divided into system and process dimensions. Here is the quick reference table for CPU performance tools. Note that because each indicator may correspond to multiple tools, I have summarized the characteristics and caveats of these tools in the description of each indicator. These are also the points you should pay special attention to.

CPU Performance Tools Quick Reference
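To see how CPU utilization and the context-switch rate are derived, here is a minimal Python sketch, assuming Linux and the /proc/stat layout documented in proc(5). It samples the aggregate "cpu" line and the "ctxt" counter twice and works on the deltas, because both are cumulative since boot. The function names are hypothetical. Counting idle + iowait as idle time and skipping the guest fields (already included in user and nice) are choices of this sketch; top and mpstat may account slightly differently.

```python
import time
from pathlib import Path


def parse_stat(text: str) -> tuple[int, int, int]:
    """Return (busy_ticks, total_ticks, context_switches) from /proc/stat text."""
    busy = total = ctxt = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "cpu":  # aggregate line: user nice system idle iowait irq softirq steal ...
            ticks = [int(x) for x in fields[1:9]]  # skip guest/guest_nice: already in user/nice
            idle = ticks[3] + ticks[4]  # idle + iowait
            total = sum(ticks)
            busy = total - idle
        elif fields[0] == "ctxt":  # context switches since boot
            ctxt = int(fields[1])
    return busy, total, ctxt


def cpu_metrics(before, after, interval_s: float) -> tuple[float, float]:
    """Compute (CPU utilization %, context switches per second) from two samples."""
    d_busy = after[0] - before[0]
    d_total = after[1] - before[1]
    util = 100.0 * d_busy / d_total if d_total else 0.0
    cs_per_s = (after[2] - before[2]) / interval_s
    return util, cs_per_s


if __name__ == "__main__":
    stat = Path("/proc/stat")
    if stat.exists():
        first = parse_stat(stat.read_text())
        time.sleep(1.0)
        second = parse_stat(stat.read_text())
        util, cs = cpu_metrics(first, second, 1.0)
        print(f"CPU utilization: {util:.1f}%  context switches/s: {cs:.0f}")
```

The system-wide rate corresponds to the cs column of vmstat. For the process dimension, pidstat -w reports per-process switches; the kernel exposes these per process as voluntary_ctxt_switches and nonvoluntary_ctxt_switches in /proc/[pid]/status.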

Memory Performance Tools #

Next, let’s look at memory. Here, the main performance indicators are the allocation and usage of system memory, the allocation and usage of process memory, and swap usage. The following diagram lists common memory performance indicators.

From these indicators, we get the quick reference table for memory performance tools below. As with the CPU performance tools, I have organized the characteristics and caveats of the common tools for you.

Note: The source code link for pcstat in the last row is https://github.com/tobert/pcstat
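As an illustration of where the system-level memory indicators come from, this minimal Python sketch (assuming Linux 3.14 or later, where /proc/meminfo has a MemAvailable field) derives used memory, buffer/page-cache size, and swap usage. The function names, and treating MemTotal - MemAvailable as "used", are assumptions of the sketch; free may compute its columns somewhat differently depending on the procps version.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo field to its numeric value (mostly kB)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])  # unit "kB" (HugePages_* are plain counts)
    return info


def memory_summary(info: dict[str, int]) -> dict[str, int]:
    """Derive a few headline memory indicators, all in kB."""
    return {
        "mem_used_kb": info["MemTotal"] - info["MemAvailable"],
        "buff_cache_kb": info["Buffers"] + info["Cached"],
        "swap_used_kb": info["SwapTotal"] - info["SwapFree"],
    }


if __name__ == "__main__":
    path = Path("/proc/meminfo")
    if path.exists():
        print(memory_summary(parse_meminfo(path.read_text())))
```

Process memory, the other dimension above, lives in /proc/[pid]/status (for example VmRSS), the same quantity shown as RSS by pidstat -r and RES by top.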

Disk I/O Performance Tools #

Next, from the perspective of file systems and disk I/O, the main performance indicators are file system usage, the usage of the cache and buffers, and the utilization, throughput, and latency of disk I/O. The following image lists the common I/O performance indicators.

Starting from these indicators, we get the following quick reference table for file system and disk I/O performance tools. As with the CPU and memory tools, I have summarized the features and caveats of each tool.
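The disk indicators named above (utilization, throughput, and latency) are derived from cumulative per-device counters in /proc/diskstats. Below is a minimal Python sketch, assuming the field layout described in the kernel’s iostats documentation and a hypothetical device name "sda". It approximates iostat -x style metrics (IOPS, throughput, average await, %util); iostat’s exact formulas may differ.

```python
import time
from pathlib import Path

SECTOR_BYTES = 512  # /proc/diskstats always counts in 512-byte sectors


def parse_diskstats(text: str) -> dict[str, dict[str, int]]:
    """Extract cumulative per-device counters from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        # f[3..6]: reads, reads merged, sectors read, ms reading
        # f[7..10]: writes, writes merged, sectors written, ms writing
        # f[12]: ms spent doing I/O (time the device was busy)
        stats[f[2]] = {
            "ios": int(f[3]) + int(f[7]),
            "bytes": (int(f[5]) + int(f[9])) * SECTOR_BYTES,
            "io_ms": int(f[6]) + int(f[10]),
            "busy_ms": int(f[12]),
        }
    return stats


def disk_metrics(before: dict[str, int], after: dict[str, int], interval_s: float) -> dict[str, float]:
    """Turn two counter samples for one device into rates."""
    d = {k: after[k] - before[k] for k in before}
    return {
        "iops": d["ios"] / interval_s,
        "bytes_per_s": d["bytes"] / interval_s,
        "await_ms": d["io_ms"] / d["ios"] if d["ios"] else 0.0,
        "util_pct": 100.0 * d["busy_ms"] / (interval_s * 1000.0),
    }


if __name__ == "__main__":
    path = Path("/proc/diskstats")
    device = "sda"  # hypothetical device name; adjust to your system
    if path.exists():
        first = parse_diskstats(path.read_text())
        time.sleep(1.0)
        second = parse_diskstats(path.read_text())
        if device in first and device in second:
            print(disk_metrics(first[device], second[device], 1.0))
```

Keep in mind that %util only measures the fraction of time the device had I/O in flight, so on devices that serve requests in parallel (SSDs, RAID arrays) 100% does not necessarily mean saturation.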

Network Performance Tools #

Finally, from the network perspective, the main performance indicators are throughput, response time, number of connections, packet loss, and so on. Following the structure of the TCP/IP protocol stack, we can refine these into specific indicators for each protocol layer. The figure below lists the main indicators for the link, network, transport, and application layers.

Network Layers Indicators

Based on these indicators, we get the following quick reference table for network performance tools. As before, I have summarized the characteristics and caveats of each tool for you.

Network Performance Tools Reference
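For the link-layer throughput and packet-drop indicators, the kernel exposes cumulative per-interface counters in /proc/net/dev, comparable to what sar -n DEV reports. The minimal Python sketch below, assuming the standard layout of 8 receive columns followed by 8 transmit columns, samples that file twice and converts the counters into per-second rates. The function names are hypothetical.

```python
import time
from pathlib import Path

# Column positions after the "iface:" prefix: 8 receive fields, then 8 transmit fields.
COLUMNS = {"rx_bytes": 0, "rx_packets": 1, "rx_drop": 3,
           "tx_bytes": 8, "tx_packets": 9, "tx_drop": 11}


def parse_net_dev(text: str) -> dict[str, dict[str, int]]:
    """Parse /proc/net/dev into cumulative counters per interface."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)  # large counters may touch the colon, so split on it
        values = [int(x) for x in data.split()]
        stats[name.strip()] = {key: values[i] for key, i in COLUMNS.items()}
    return stats


def net_rates(before: dict[str, int], after: dict[str, int], interval_s: float) -> dict[str, float]:
    """Per-second rates for one interface between two samples."""
    return {key: (after[key] - before[key]) / interval_s for key in before}


if __name__ == "__main__":
    path = Path("/proc/net/dev")
    if path.exists():
        first = parse_net_dev(path.read_text())
        time.sleep(1.0)
        second = parse_net_dev(path.read_text())
        for iface in sorted(first.keys() & second.keys()):
            print(iface, net_rates(first[iface], second[iface], 1.0))
```

Transport- and application-layer indicators such as connection counts come from other sources (for example ss, or the Tcp section of /proc/net/snmp) rather than from this file.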

Benchmarking Tools #

Besides performance analysis, we often need to benchmark system performance. For instance:

  • In the file system and disk I/O modules, we use the fio tool to test the performance of disk I/O.

  • In the network module, we use tools such as iperf and pktgen to test network performance.

  • And in the many Nginx-based cases, we use tools such as ab and wrk to test Nginx’s performance.

Beyond the tools used in this column, there are many other benchmarking tools for the various Linux subsystems. The following image is a Linux benchmarking tools map compiled by Brendan Gregg. You can save it for reference when needed.

Benchmarking Tools Map (Image source: brendangregg.com)

Conclusion #

Today, we summarized the common performance tools and compiled quick reference tables organized by performance indicator, covering the CPU, memory, file system and disk I/O, network, and benchmarking.

When analyzing performance issues, there are generally two steps:

  • The first step is to determine which performance indicators to analyze, based on the performance bottleneck and how the system and application work.

  • The second step is to select the most appropriate tools for these indicators, then learn and use them to quickly observe the performance data you need.
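The two steps above can be sketched in code: map an indicator to candidate tools, then keep only those actually installed in the current environment (one of the selection constraints discussed earlier). The Python below is a minimal illustration; the METRIC_TOOLS mapping is a hypothetical, partial subset of tools used in this column, not a reproduction of the quick reference tables.

```python
import shutil
from typing import Callable

# Hypothetical, partial indicator -> tool mapping for illustration only.
METRIC_TOOLS: dict[str, list[str]] = {
    "cpu_utilization": ["top", "vmstat", "mpstat -P ALL", "pidstat -u"],
    "context_switches": ["vmstat", "pidstat -w"],
    "memory_usage": ["free", "vmstat", "pidstat -r"],
    "disk_io": ["iostat -x", "pidstat -d", "iotop"],
    "network_throughput": ["sar -n DEV", "ip -s link"],
    "connections": ["ss", "netstat"],
}


def installed(command: str) -> bool:
    """True if the command's executable is found on PATH."""
    return shutil.which(command.split()[0]) is not None


def candidate_tools(metric: str, is_available: Callable[[str], bool] = installed) -> list[str]:
    """Step 2: list tools for an indicator, keeping only those usable here."""
    return [tool for tool in METRIC_TOOLS.get(metric, []) if is_available(tool)]


if __name__ == "__main__":
    for metric in METRIC_TOOLS:
        print(f"{metric}: {candidate_tools(metric) or 'no tool installed; fall back to /proc'}")
```

Passing the availability check as a parameter keeps the selection logic testable and lets you swap in other constraints, such as a minimum kernel version.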

Although Linux has many performance indicators and tools, once you understand what each indicator means, you will naturally see how the tools relate to them. With this approach, choosing the right tool is not hard.

However, as we have emphasized throughout this column, performance tools are not the whole of performance analysis and optimization.

  • On the one hand, the core of performance analysis and optimization lies in understanding how the system and application work; performance tools only help you complete this process faster.

  • On the other hand, a well-built monitoring system can provide most of the baseline data needed for performance analysis. From this data you can often roughly locate the bottleneck without running various tools by hand.

Reflection #

Finally, I would like to invite you to discuss the performance tools you have used. How do you usually choose them, and how do you decide which tool to use when troubleshooting and analyzing a performance problem? Feel free to summarize your own approach in light of what we covered today.

Feel free to discuss with me in the comments section, and also feel free to share this article with your colleagues and friends. Let’s practice in real scenarios and progress through communication.