03 System Design Goals I - How to Improve System Performance #

When it comes to internet system design, you will often hear the term “three highs,” which refers to “high concurrency,” “high performance,” and “high availability.” They are the eternal themes of internet architecture design. In the previous two lessons, I walked you through the meaning, purpose, and principles of layered design in high concurrency systems. Now I want to give you an overall picture of the goals of high concurrency system design, and on that basis move into today’s topic: how to improve system performance.

Three Goals of High Concurrency System Design: High Performance, High Availability, and Scalability #

High concurrency means using design techniques to enable a system to handle more concurrent user requests, that is, to absorb larger traffic loads. It is the background and prerequisite of all architectural design: without high concurrency, discussing performance and availability is meaningless. Obviously, achieving millisecond response times and five nines (99.999%) availability at one request per second is in a completely different league, in both design difficulty and solution complexity, from achieving them at ten thousand requests per second.

Performance and availability, on the other hand, are the factors that we must consider when implementing high concurrency system design.

Performance reflects the user experience of a system. Imagine two systems each handling ten thousand requests per second. One system has millisecond response times, while the other has response times in seconds. The user experience provided by these two systems will certainly be different.

Availability represents the proportion of time during which a system can serve users normally. Let’s make another comparison: again, two systems each handle ten thousand requests per second, but one runs all year round without interruptions or faults, while the other goes down for maintenance every so often. As a user, which system would you choose? The answer is obvious.

Another well-known term is “scalability”, which is also a factor to consider in high concurrency system design. Why? Let me provide a specific example.

Traffic can be categorized as normal traffic and peak traffic. Peak traffic may be several times or even tens of times higher than normal traffic, and handling it usually requires extra preparation in terms of architecture and solutions. For example, Taobao spends half a year preparing for Double Eleven, and even the seemingly flawless Weibo system can become unavailable in the face of hot events like “celebrity divorces”. A system with good scalability can expand its capacity in a short time and absorb peak traffic more smoothly.

High performance, high availability, and scalability are the three goals we pursue when designing high concurrency systems. In the next three lessons, I will explain how to design high-performance, highly available, and scalable systems under high concurrency and heavy traffic.

After understanding these concepts, let’s formally move on to today’s topic: how to improve system performance?

Performance Optimization Principles #

As the martial arts saying goes, “the only unbeatable skill is speed.” Performance is key to the success of a system design, and achieving high performance is also a test of a programmer’s individual ability. However, before we look at the methods for achieving high performance, let’s first clarify the principles of performance optimization.

First, performance optimization must not be blind; it must be problem-oriented. Optimizing prematurely, without a concrete problem, adds complexity to the system and wastes developers’ time, and because some optimizations involve trade-offs against business requirements, it may even harm the business.

Second, performance optimization follows the “80/20 rule”: you can solve 80% of the performance problems with 20% of the effort. Therefore, during optimization we must focus on the dominant issues and prioritize the main performance bottlenecks.

Third, performance optimization must be backed by data. During the optimization process, you should always know how much your changes have reduced response time and how much they have improved throughput.

Finally, performance optimization is a continuous process. High-concurrency systems usually have fairly complex business logic, so their performance problems typically have multiple causes. We therefore need to set a clear goal for optimization, for example: sustain a throughput of 10,000 requests per second with a response time of 10ms, and then keep searching for bottlenecks and refining the optimization plan until that goal is reached.

With the guidance of these four principles, mastering the methods of troubleshooting common performance issues and optimization techniques will definitely make you more proficient in designing high-concurrency systems.

Performance Metrics #

The third principle mentioned in performance optimization states that we need to have measurement standards for performance. By having data, we can clearly identify existing performance issues and evaluate the effectiveness of performance optimization. Therefore, it is crucial to have clear performance metrics.

Generally speaking, the metric for measuring performance is the response time of the system interface. However, the response time of a single request is meaningless. You need to know what the performance is like over a period of time. Therefore, we need to collect the response time data for this period of time and calculate certain statistical indicators to represent the performance during that time. The following are some common statistical indicators:

  • Average

As the name suggests, the average is obtained by summing up the response time data of all requests during a period of time and dividing it by the total number of requests. The average can reflect the performance of that period to some extent. However, its sensitivity is poor. If there are a few slow requests during that time, the average may not reflect the situation accurately.

For example, let’s say we have 10,000 requests in 30 seconds, and each request has a response time of 1ms. In this case, the average response time for that period is also 1ms. Now, if 100 of these requests have a response time of 100ms instead, the average response time becomes (100 * 100 + 9900 * 1) / 10000 = 1.99ms. As you can see, although the average rose by less than 1ms, the reality is that the response time of 1% of the requests (100/10000) increased a hundredfold. Therefore, the average can only serve as a rough reference when measuring performance.

  • Maximum

This one is easier to understand. It is the longest response time among all requests during that period of time. However, its problem lies in being too sensitive.

Using the example above, if just one of the 10,000 requests takes 100ms, the maximum response time for the period becomes 100ms. Judging by the maximum alone, you would conclude that performance had degraded a hundredfold, which is clearly inaccurate when only one request out of ten thousand was slow.

  • Percentile

Percentiles come in many forms, such as the 90th percentile, 95th percentile, and 75th percentile. Let’s take the 90th percentile as an example. We sort the response time of all requests during that time period in ascending order. If there are 100 requests in total, then the response time at the 90th position would be the 90th percentile value. Percentiles exclude the impact of occasional extremely slow requests on the data, and they can accurately reflect the performance of that period. The larger the percentile value, the more sensitive it is to slow requests.


In my opinion, percentiles are the most suitable statistical indicators for measuring response time over a period of time, and they are also the most commonly used in practice. Apart from that, the average can also be used as a reference value.
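
To make these indicators concrete, here is a minimal Java sketch (illustrative only, not part of the original lesson) that replays the example above: a window of 10,000 requests in which 100 take 100ms and 9,900 take 1ms, with the average, maximum, and several percentiles computed over that window.

```java
import java.util.Arrays;

// A minimal sketch of the statistics discussed above: average, maximum,
// and percentiles, computed over one window of collected response times.
public class ResponseTimeStats {

    // Returns the p-th percentile (e.g. 90, 99, 99.9) of the samples, in ms.
    static long percentile(long[] samplesMillis, double p) {
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);                                  // ascending order
        int index = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(index, 0)];
    }

    public static void main(String[] args) {
        // The example from the text: 10,000 requests in the window,
        // 100 of them take 100ms and the remaining 9,900 take 1ms.
        long[] samples = new long[10_000];
        Arrays.fill(samples, 1L);
        Arrays.fill(samples, 0, 100, 100L);

        double average = Arrays.stream(samples).average().orElse(0);
        long max = Arrays.stream(samples).max().orElse(0);

        System.out.printf("average = %.2f ms%n", average);                   // 1.99 ms
        System.out.println("max     = " + max + " ms");                      // 100 ms
        System.out.println("p90     = " + percentile(samples, 90) + " ms");  // 1 ms
        System.out.println("p99     = " + percentile(samples, 99) + " ms");  // 1 ms
        System.out.println("p99.9   = " + percentile(samples, 99.9) + " ms");// 100 ms
    }
}
```

Note how the average (1.99ms) and the maximum (100ms) mislead in opposite directions, while the percentiles let you choose how sensitive to the slow tail you want to be: p90 and p99 stay at 1ms, and p99.9 exposes the slow requests.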

I mentioned earlier that it is meaningless to discuss performance without considering concurrency. We usually use throughput or the number of simultaneous online users to measure concurrency and traffic, with throughput being the more common of the two. You should also know that throughput and response time are, in general, inversely related.

This is easy to understand. When the response time is 1s, the throughput is 1 request per second. When the response time is reduced to 10ms, the throughput increases to 100 requests per second. Therefore, when measuring performance, we generally consider both throughput and response time. For example, when setting the goal for performance optimization, we usually state it like this: With a request volume of 10,000 per second, the 99th percentile response time should be below 10ms.

Now that you understand the performance metrics, let’s take a look at the strategies for achieving high performance as concurrency increases.

Performance Optimization for High Concurrency #

Suppose you have a system with only one processing core, where each task has a response time of 10ms and the throughput is 100 requests per second. How do we optimize performance to improve the system’s ability to handle concurrency? There are two main approaches: increasing the number of processing cores in the system, and reducing the response time of individual tasks.

1. Increasing the number of processing cores

Increasing the number of processing cores means increasing the system’s parallel processing capability, and it is the simplest way to optimize performance. Continuing the example above, you could increase the number of cores to two and add a second process, so that each process runs on its own core. In theory, this doubles the system’s throughput. In this case, throughput is no longer simply the reciprocal of response time; instead, throughput = number of concurrent processes / response time. For example, two processes that each finish a task in 10ms give a throughput of 2 / 0.01s = 200 requests per second.

Amdahl’s law, proposed by Gene Amdahl in 1967, describes the relationship between the number of concurrent processes and response time. It gives the speedup of parallel computing under a fixed load, which can be expressed by the following formula:

speedup = (Ws + Wp) / (Ws + Wp / s)

Here, Ws represents the amount of serial computation in the task, Wp represents the amount of parallel computation in the task, and s represents the number of parallel processes. From this formula, we can derive another formula:

speedup = 1 / ((1 - p) + p / s)

Here, s still represents the number of parallel processes, and p represents the proportion of the task that can be parallelized. When p is 1, meaning the task is fully parallel, the speedup equals the number of parallel processes. When p is 0, meaning the task is fully serial, the speedup is 1, i.e., there is no improvement. When s approaches infinity, the speedup approaches 1/(1 - p); as you can see, the larger the parallel proportion p, the higher this upper bound, and as p approaches 1 the speedup tends to infinity.

The derivation process of these formulas is somewhat complex, but you only need to remember their conclusions.
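
To get a feel for these conclusions without re-deriving them, the short sketch below (illustrative only, not from the original lesson) evaluates speedup = 1 / ((1 - p) + p / s) for a few values of p and s, and prints the limiting value 1 / (1 - p).

```java
// A small illustration of Amdahl's law: speedup = 1 / ((1 - p) + p / s),
// where p is the parallelizable fraction of a task and s is the number
// of parallel processes.
public class AmdahlSpeedup {

    static double speedup(double p, int s) {
        return 1.0 / ((1.0 - p) + p / s);
    }

    public static void main(String[] args) {
        double[] parallelFractions = {0.5, 0.9, 0.99, 1.0};
        int[] processCounts = {2, 8, 64, 1024};

        for (double p : parallelFractions) {
            for (int s : processCounts) {
                System.out.printf("p=%.2f, s=%4d -> speedup = %.1f%n", p, s, speedup(p, s));
            }
            // Upper bound as s grows without limit: 1 / (1 - p).
            System.out.printf("p=%.2f, s->inf -> speedup <= %s%n%n",
                    p, p < 1.0 ? String.format("%.1f", 1.0 / (1.0 - p)) : "unbounded");
        }
    }
}
```

Running it shows, for instance, that with p = 0.9 the speedup can never exceed 10 no matter how many processes you add, which is exactly why the serial portion of a task dominates at scale.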

It may seem that we have found a silver bullet: can we keep increasing the number of processing cores without limit to improve performance and handle ever higher concurrency? Unfortunately not. As the number of concurrent processes increases, contention for system resources among the parallel tasks becomes more severe, and beyond a certain critical point, adding more concurrent processes actually reduces the system’s performance. This is the “inflection point” model in performance testing.

[Figure: the inflection point model, showing throughput and response time versus the number of concurrent users across the light load, heavy load, and inflection point (overload) regions]

From the graph, you can see that while the number of concurrent users is in the light load region, response time stays stable and throughput grows roughly linearly with the number of concurrent users. When the number of concurrent users reaches the heavy load region, system resource utilization hits its limit: throughput starts to level off and decline, and response time rises slightly. If you keep increasing the pressure beyond that, the system passes the inflection point and becomes overloaded, throughput drops, and response time increases dramatically.

Therefore, when evaluating system performance, we usually need to conduct stress testing in order to find the “inflection point” of the system, understand its capacity, and identify performance bottlenecks for continuous optimization.
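
The sketch below is a toy ramp-up test, not a real load-testing tool, and the simulated request and its timings are hypothetical. Each request does about 4ms of parallelizable work plus 1ms inside a shared lock, so throughput climbs as workers are added and then flattens once the serialized section saturates, which is the kind of curve you are probing for when you look for the inflection point.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// A toy ramp-up test: measure throughput of a simulated request while
// increasing the number of concurrent workers.
public class RampUpLoadTest {

    private static final Object SHARED = new Object();

    // Simulated request: ~4ms of parallel work plus ~1ms holding a shared lock,
    // so adding workers eventually stops helping.
    static void handleRequest() throws InterruptedException {
        Thread.sleep(4);
        synchronized (SHARED) {
            Thread.sleep(1);
        }
    }

    static double measureThroughput(int workers, long durationMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicLong completed = new AtomicLong();
        long deadline = System.currentTimeMillis() + durationMillis;

        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                while (System.currentTimeMillis() < deadline) {
                    try {
                        handleRequest();
                        completed.incrementAndGet();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(durationMillis + 1000, TimeUnit.MILLISECONDS);
        return completed.get() * 1000.0 / durationMillis;
    }

    public static void main(String[] args) throws Exception {
        for (int workers : new int[]{1, 2, 4, 8, 16, 32, 64}) {
            System.out.printf("workers=%2d -> %.0f requests/s%n",
                    workers, measureThroughput(workers, 2000));
        }
    }
}
```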

After discussing the improvement of parallel capabilities, let’s take a look at another way to optimize performance: reducing the response time for individual tasks.

2. Reducing the response time for individual tasks

To reduce the response time of tasks, you first need to determine whether your system is CPU-bound or IO-bound, as the methods for optimizing different types of systems vary.

CPU-bound systems spend most of their time on heavy CPU computation, so using a more efficient algorithm or reducing the amount of computation is the key optimization for them. For example, if the system’s main job is to compute hash values, switching to a higher-performance hash algorithm can greatly improve performance. Profiling tools such as Linux’s perf and eBPF can be used to identify the methods or modules that consume the most CPU time.
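
As a rough illustration of the “cheaper algorithm” idea (with the big assumption that the business only needs a non-cryptographic checksum, which is often not the case), the sketch below compares the CPU cost of SHA-256 against CRC32 over the same payload, using only standard JDK classes.

```java
import java.security.MessageDigest;
import java.util.zip.CRC32;

// If a plain checksum is enough for the business, replacing a cryptographic
// hash (SHA-256) with CRC32 removes a large amount of CPU work per request.
public class HashCost {

    public static void main(String[] args) throws Exception {
        byte[] payload = new byte[4096];   // a 4KB request body (all zeros here)
        int iterations = 100_000;

        // Baseline: cryptographic hash (SHA-256).
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sha256.digest(payload);        // digest() also resets the instance
        }
        long shaNanos = System.nanoTime() - start;

        // Cheaper alternative: CRC32 checksum.
        CRC32 crc = new CRC32();
        start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            crc.reset();
            crc.update(payload);
            crc.getValue();
        }
        long crcNanos = System.nanoTime() - start;

        System.out.printf("SHA-256: %.1f ms, CRC32: %.1f ms%n",
                shaNanos / 1e6, crcNanos / 1e6);
    }
}
```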

IO-bound systems are systems that spend most of their time waiting for IO to complete, where IO here means disk IO and network IO. Most of the systems we are familiar with fall into this category, such as databases, caches, and web services. The performance bottleneck of such a system may lie inside the system itself or in the other systems it depends on, and there are two main approaches to locating it.

The first approach is to use tools. Linux provides a rich set of tools covering the network protocol stack, network interfaces, disks, file systems, memory, and so on, which can meet your optimization needs. Each of these tools has many usage patterns, which you can pick up gradually as you troubleshoot problems. In addition, some programming languages have analysis tools tailored to their own characteristics; the Java ecosystem, for example, has its own memory analysis tools.

The other approach is to use monitoring to locate performance problems. In the monitoring system we can record per-step timing statistics for each task, to determine which step consumes the most time. This will be covered in detail in the evolution part of the course, so I will not elaborate here.
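
As a minimal sketch of such per-step timing (hypothetical code, not the monitoring system described later in the course; the step names are made up), the helper below records how long each step of a request takes so that the dominant step stands out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// A tiny helper that times each named step of a task; in a real system the
// recorded durations would be reported to the monitoring system.
public class StepTimer {

    private final Map<String, Long> stepNanos = new LinkedHashMap<>();

    // Runs one step of the task and records how long it took.
    <T> T time(String step, Supplier<T> action) {
        long start = System.nanoTime();
        try {
            return action.get();
        } finally {
            stepNanos.put(step, System.nanoTime() - start);
        }
    }

    void report() {
        stepNanos.forEach((step, nanos) ->
                System.out.printf("%-10s %.2f ms%n", step, nanos / 1e6));
    }

    public static void main(String[] args) {
        StepTimer timer = new StepTimer();

        // Hypothetical steps of handling one request.
        String user = timer.time("loadUser", () -> "user-42");       // e.g. a cache/DB read
        String feed = timer.time("loadFeed", () -> user + ":feed");  // e.g. a remote call
        timer.time("render", () -> feed + ":html");                  // e.g. CPU-bound rendering

        timer.report();   // the step with the largest share is the bottleneck
    }
}
```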

Once we have identified the bottleneck, how do we optimize it? The solution depends on the specific problem. For example, if database access is slow, you need to check for table locks, full table scans, missing or inappropriate indexes, and expensive JOIN operations, and consider whether a cache is needed. If it is a network problem, you need to check whether the network parameters can be tuned, look at packet captures for large numbers of timeouts and retransmissions, check whether the network card is dropping many packets, and so on.

In short, different ailments call for different remedies: we need to devise different optimization solutions for different performance problems.

Course Summary #

Today, I took you through the principles of performance, measurement indicators, and the basic approach to optimizing performance under high concurrency. Performance optimization is a vast topic, and one lecture is not enough, so I will go into more detail on certain aspects in future courses. For example, I will explain how to optimize system read performance using caching and how to optimize write performance using message queues.

Sometimes, when you encounter performance issues, you may feel helpless. From today’s course, you can gain some insights. Here are a few key points:

  • Prioritize data. Before launching a new system, make sure to have a performance monitoring system in place.
  • Master some performance optimization tools and methods. This requires continuous accumulation of experience in your work.
  • Fundamental knowledge of computer science is crucial. For example, understanding networking and operating systems will help you identify key performance issues and navigate the optimization process with ease.