24 | Performance Pressure Testing: Incomplete Pressure Testing Cuts Results in Half #

Hello, I am Xu Changlong.

Previously, we discussed many ideas and designs for high-concurrency system transformation.

High-concurrency systems are complex, which makes optimizing them challenging. Many local optimizations fail to improve the overall performance of the system, and some even backfire, destabilizing the service.

This makes performance testing all the more important. Through performance testing we can confirm the performance of an individual interface, a single server, a service cluster, or even an entire data center, which helps us pinpoint where the bottlenecks in the system are. Based on the results, we can also form a clearer picture of how many simultaneous users the system can handle, which gives us a factual basis for setting traffic limits.

In this lesson, we will specifically discuss the key factors to consider in performance testing.

Load Testing and Its Connection with Architecture #

In load testing, one common pitfall is to trust QPS (Queries Per Second) numbers blindly and assume that “a high-concurrency interface equals a stable system,” while neglecting the shape of the system’s business architecture.

Therefore, before discussing load testing itself, we first need to understand how performance relates to business architecture, so that we can look at load-testing results with a clearer perspective.

Parallel Optimization #

As mentioned earlier, we cannot blindly trust QPS results when optimizing; we need to analyze them comprehensively. To help you understand this point, let’s look at an example.

In common business scenarios, a service calls multiple dependent services one after another to process data, producing a “serial blocking wait”: each call must finish before the next one starts. When a service depends on too many other services, the interface’s response time and QPS deteriorate.

You can refer to the following diagram to understand this process:

To improve performance, some businesses optimize the dependent resources by making parallel requests to these resources, which can improve the interface’s response speed. Take a look at the implementation shown in the following diagram:

As shown in the above diagram, when a business requests dependent interfaces, it no longer waits sequentially for their processing. Instead, it concurrently sends requests to obtain all results, processes the business logic in parallel, and finally merges the results to return to the client. This design significantly improves the interface’s response speed, especially for services that depend on multiple resources.
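To make the fan-out concrete, here is a minimal Python sketch of this pattern, assuming three downstream dependencies (the `fetch_*` functions are hypothetical stand-ins for real RPC or HTTP calls):

```python
import asyncio

# Hypothetical downstream dependencies -- stand-ins for real RPC/HTTP calls.
async def fetch_user(uid: int) -> dict:
    await asyncio.sleep(0.05)  # simulate ~50 ms of network latency
    return {"uid": uid, "name": "demo"}

async def fetch_orders(uid: int) -> list:
    await asyncio.sleep(0.05)
    return [{"order_id": 1}]

async def fetch_coupons(uid: int) -> list:
    await asyncio.sleep(0.05)
    return [{"coupon_id": 9}]

async def handle_request(uid: int) -> dict:
    # Serially these three calls would take ~150 ms; fired together they
    # take ~50 ms -- but the three downstream services now receive their
    # requests at the same instant.
    user, orders, coupons = await asyncio.gather(
        fetch_user(uid), fetch_orders(uid), fetch_coupons(uid)
    )
    return {"user": user, "orders": orders, "coupons": coupons}

print(asyncio.run(handle_request(42)))
```

The comment inside `handle_request` is exactly the trade-off discussed next: the client sees one fast response, while the internal network absorbs three simultaneous requests.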

However, this optimization has a side effect: it increases the pressure on the internal dependent services, which now receive bursts of concurrent requests. If we use this technique extensively, requests are amplified inside the internal network. For example, 10,000 QPS on the external network may become 100,000 QPS internally, and excessive internal pressure like this can destabilize the website as a whole.

Therefore, making parallel requests to dependent resources is not a panacea; we must mind the capacity of the dependent services. The technique suits read-heavy systems (with a high read-to-write ratio) best. For many complex internal services, especially transactional ones that require consistency, high concurrency can cause lock contention and timeouts that leave them unable to respond.

So, for business systems with many dependencies, like the one in the previous example, what load testing approach is more reasonable? My suggestion is to load test the internal services first. Once the upper limit of stable QPS for the internal network is known, we can derive the recommended QPS limit for the external network from it. For example, if the internal services are stable up to 100,000 QPS and each external request fans out to ten internal calls, the recommended external limit is roughly 10,000 QPS.

Temporary Cache Service #

Another situation that requires special attention in load testing is temporary cache optimization, which we mentioned earlier in Lesson 2.

Temporary caching is usually implemented as shown in the following diagram:

From the diagram, you can see that when an interface request depends on some data, it queries the cache first. If the data is in the cache, it is returned directly; otherwise, the service falls back to the data source. This speeds up our service’s responses.
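As a minimal sketch of this read path, assuming an in-memory dict standing in for the cache service and a hypothetical `query_source` as the slow data source:

```python
import time

cache: dict = {}   # stands in for Redis or another cache service
TTL = 60           # cache entries live for 60 seconds

def query_source(key: str) -> str:
    time.sleep(0.1)                      # simulate a slow backend query
    return f"value-for-{key}"

def get(key: str) -> str:
    hit = cache.get(key)
    if hit and hit[1] > time.time():     # entry exists and is still fresh
        return hit[0]
    value = query_source(key)            # cache miss: hit the data source
    cache[key] = (value, time.time() + TTL)
    return value

get("sku:1001")  # first call is a miss, ~100 ms
get("sku:1001")  # repeat call within the TTL is a hit, near-instant
```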

When load testing services optimized with temporary caching, you may observe that repeated requests with the same parameters respond quickly and even achieve high QPS. However, this does not reflect the true performance of the service; the hidden instability is still there. Why is that?

Because temporary caching targets interfaces that are accessed frequently and repeatedly, the first request for any given parameters is still slow after the optimization. If an interface responds slowly and requests with the same parameters are infrequent, the cached data expires before it is ever reused, and the cache is effectively useless.

Therefore, this structure is not suitable for low-frequency access scenarios, and during load testing we also need to pay attention to how such interfaces perform when the cache does not help, as will happen in production.

Sharded Architecture #

Next, let’s take a look at sharded architecture. The following diagram shows an architecture that mitigates load pressure through sharding. We mentioned this in Lesson 18:

In a sharded architecture, services distribute requests to the corresponding shards based on an identifying ID, with the expectation that the load will be balanced across shards. In practice, however, the actual behavior may not match that expectation.

Let me share a pitfall I ran into. In an online training business, we used the class ID as the sharding key. But when 100,000 people were interacting online, a single shard served almost all external requests while the other shards saw minimal traffic.

This happened for two main reasons. First, there were only a few class IDs, so the key space was small; if the hash algorithm does not disperse such keys well, most of the data lands on a single shard. Second, different hash algorithms disperse data to different degrees, so keys with particular patterns may hash poorly under one algorithm and well under another. We therefore need to test and choose the hash algorithm carefully.
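The effect is easy to demonstrate. In the sketch below (with hypothetical class IDs), a naive sharding function that takes the numeric ID modulo the shard count sends every key to one shard, while hashing the whole key first spreads them out:

```python
import hashlib
from collections import Counter

SHARDS = 8
# A small key space, as in the class-ID example; these IDs happen to step by 8.
keys = [f"class_{1000 + i * 8}" for i in range(32)]

def naive_shard(key: str) -> int:
    # Poorly dispersed: for IDs that step by 8, id % 8 is always the same.
    return int(key.split("_")[1]) % SHARDS

def hashed_shard(key: str) -> int:
    # Better dispersed: hash the whole key, then take the modulus.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % SHARDS

print(Counter(naive_shard(k) for k in keys))   # Counter({0: 32}) -- one hot shard
print(Counter(hashed_shard(k) for k in keys))  # roughly even spread across 8 shards
```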

To prevent similar problems, I recommend validating with real production data during load testing. If a single shard is consistently hot, consider changing the hash algorithm. After that validation, don’t forget to load test with random data as well, until you find the algorithm that best suits your business (note that changing the hash algorithm has far-reaching implications, so choose and migrate carefully).

Data Volume #

In addition to the architecture, data volume is also an important factor affecting the effectiveness of load testing.

If an interface performs computations over multiple sets of data, we need to consider whether the data volume affects its QPS (Queries Per Second) and stability. If it does, different data volumes should be validated separately during load testing.
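As a minimal sketch of the idea, the loop below benchmarks the same logic at several data volumes; `aggregate` is a hypothetical stand-in for the real computation. If latency grows with volume, each volume needs its own load-test pass and its own limit:

```python
import random
import time

def aggregate(rows):
    # Hypothetical computation-heavy logic: aggregate N rows per request.
    return sum(rows) / len(rows)

for n in (1_000, 10_000, 100_000, 1_000_000):
    rows = [random.random() for _ in range(n)]
    start = time.perf_counter()
    for _ in range(100):
        aggregate(rows)
    elapsed = time.perf_counter() - start
    print(f"{n:>9} rows: {elapsed / 100 * 1000:.3f} ms per call")
```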

To keep the test as realistic as possible and avoid leaving hidden dangers in high-traffic services, it is recommended to load test this type of interface with desensitized (anonymized) real production data.

A special reminder here: for statistical services that aggregate large amounts of data in real time, be cautious about exposing them externally. If the data volume involved is too large, it is recommended to switch the implementation to a precomputed approach.

If a core business interface must also provide data statistics, consider changing the approach or adding a cache, to keep the core service from being dragged down.

Notes on Load Testing Environment #

Now that we understand the relationship between performance and architecture, you probably have a clearer picture and feel ready to start load testing, right?

However, reality is not that simple. We still need to account for the differences between the load testing environment and the real environment. If we want accurate results, it is best to minimize the factors that can distort the tests before we start.

During the data preparation phase before load testing, we usually need to consider the following factors:

  • Consistency in the load testing environment: Try to use the same set of servers and configuration environment to verify the optimization effect.
  • Avoiding cache interference: after each load test, wait for a period so that service-side state and caches have expired before the next run; otherwise leftover cached state will skew the results.
  • Consistency in data status: We should try our best to ensure that the data volume, number of load test users, and cache status used by the service are consistent.

Next, let’s take a look at other considerations when setting up the load testing environment.

I have found that many people try to run load tests from their local development machine, but in most cases this does not yield valid results. It is recommended to prepare several dedicated servers to generate the load and separate business servers to receive it; only then does the test come close to how the business actually runs.

In addition, we should not overlook the configuration of the Linux environment. Commonly tuned Linux kernel options include the local port range, file-handle limits, long-connection timeouts, load balancing of network soft interrupts, and I/O cache sizes. All of these affect server performance, so it is recommended to tune them before formal load testing begins. I mention this because I have run into exactly this kind of issue before.

During one load test, no matter how we tested the business, we could not exceed 10,000 QPS. To investigate, we wrote an interface that executed no logic and simply returned a piece of text, then benchmarked it. Even this trivial interface could not reach 10,000 QPS. Only after upgrading and tuning all of the Linux configuration did the problem go away.
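For reference, these are the kinds of settings involved. The values below are placeholders, not recommendations; verify each option against your kernel version and workload:

```bash
# Widen the ephemeral (local) port range available for outgoing connections
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
# Release closed connections' ports sooner by shortening the FIN-WAIT timeout
sysctl -w net.ipv4.tcp_fin_timeout=30
# Raise the system-wide file-handle limit (every socket is a file handle)
sysctl -w fs.file-max=1000000
# Raise the per-process open-file limit for the current shell session
ulimit -n 1000000
```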

Online Load Testing and Shadow Database #

Although online load testing is more realistic, it generates a large amount of garbage data in a short time: massive logs, useless test records, and forged business data. It can also back up queues, tie up server resources, and even directly cause various online failures. When a load test exceeds 100,000 QPS, one test run can create as much “data garbage” as a month of normal business operations, which is very difficult to clean up by hand.

Therefore, to ensure the test does not affect normal online services, I recommend conducting load tests against a shadow database. With this method, load-testing requests carry a special header, and all data reads and writes for such requests are redirected to the test database instead of the production database. A shadow database effectively reduces the risk of contaminating business data.
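A minimal sketch of this routing decision, assuming a hypothetical header name `X-Load-Test` and placeholder connection strings:

```python
PROD_DSN = "mysql://prod-db/app"      # placeholder production DSN
SHADOW_DSN = "mysql://shadow-db/app"  # placeholder shadow-database DSN

def pick_database(headers: dict) -> str:
    # Requests tagged by the load-testing platform go to the shadow
    # database; everything else keeps using production storage.
    if headers.get("X-Load-Test") == "1":
        return SHADOW_DSN
    return PROD_DSN

assert pick_database({"X-Load-Test": "1"}) == SHADOW_DSN
assert pick_database({}) == PROD_DSN
```

In a real system this flag must also be propagated through RPC calls and message queues, so that every downstream service applies the same routing; otherwise test writes leak into production partway through the call chain.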

Our discussion of load testing so far has focused on individual interfaces and services. In practice, however, the most common problem is that each interface tests well in isolation, yet the system collapses before reaching the estimated production traffic.

The reason is that our services are not fully independent: often hundreds of interfaces share one set of databases, caches, and queues, so the capacity of the system has to be evaluated as a whole.

For example, suppose you optimize interface A, but the business process calls interfaces A, B, and C, and B and C are slow or consume significant system resources. Even if A load tests well on its own, the performance of the overall process does not improve. Similarly, if one business consumes too much of a shared resource, it degrades every other service sharing that resource. Therefore, after performance testing individual interfaces, it is recommended to conduct full-link load testing.

Both of the situations mentioned above can be resolved by conducting full-link load testing. This approach helps simulate various complex usage scenarios and provides a more comprehensive evaluation of system operation, thereby identifying performance bottlenecks.

How do we simulate “complex usage scenarios”? I recommend designing several main business scenarios to run in parallel: for example, one group of virtual users browses and searches for products, another group places orders and makes payments, and a third group performs common back-office operations.
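A minimal sketch of such a parallel-scenario run, with the three scenario bodies as hypothetical stand-ins for real browse/order/back-office request sequences:

```python
import asyncio
import random

async def browse_and_search():
    await asyncio.sleep(random.uniform(0.01, 0.05))  # simulate request latency

async def order_and_pay():
    await asyncio.sleep(random.uniform(0.02, 0.10))

async def back_office():
    await asyncio.sleep(random.uniform(0.05, 0.20))

async def virtual_user(scenario, requests: int):
    for _ in range(requests):   # each virtual user issues requests serially
        await scenario()

async def main():
    # Three groups of virtual users run their scenarios concurrently.
    users = (
        [virtual_user(browse_and_search, 20) for _ in range(50)]
        + [virtual_user(order_and_pay, 20) for _ in range(20)]
        + [virtual_user(back_office, 20) for _ in range(5)]
    )
    await asyncio.gather(*users)

asyncio.run(main())
```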

The performance data obtained this way can serve as the upper bound of the load the online service can handle at peak times. If the core interfaces of one process become overloaded and slow, they drag down the efficiency of the entire process, so by observing the QPS of each end-to-end process we can identify bottlenecks and weak points.

If service metrics remain stable for a period during the load test, we can increase the number of threads per process to deliberately push the system past its limit and observe whether fault detection and alerting respond in time. Be prepared, however, to repair the database afterward in such cases.

If the business is complex and writing load-test scripts by hand is difficult, another approach is to replay real user requests recorded from the online environment. The same approach can be used to reproduce the request patterns behind certain kinds of faults.

Specifically, you can use the tcpcopy tool to record online traffic. After generating the request record file, you build a full data mirror that replicates the online data as it was during recording, and then replay the requests against it.
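Very roughly, a tcpcopy deployment has this shape; the IPs below are placeholders, and you should verify the flags against the tcpcopy version you actually run:

```bash
# On the production server: copy traffic arriving at port 80 to the test
# server 10.0.0.2:8080, with the companion `intercept` running on 10.0.0.3.
tcpcopy -x 80-10.0.0.2:8080 -s 10.0.0.3

# On the auxiliary (bypass) server: capture the test server's responses so
# they are swallowed instead of being sent back to real users.
intercept -i eth0 -F 'tcp and src port 8080' -d
```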

However, this tool is not easy to use, so it is best combined with a mature load-testing platform. You will also need a dedicated bypass server for generating load or recording traffic. And take care that replayed requests never reach real payment or other money-related services in production, as that could cause financial losses for users.

Summary #

Performance testing is an essential tool for verifying the effectiveness of our service transformation, capacity evaluation, architectural rationality, and disaster drills.

Through performance testing, we can gain a clearer understanding of how the service operates and how much load it can bear, and comprehensively analyze its performance bottlenecks. Whenever the business changes or an optimization is made, performance testing can evaluate the effect of the change.

I want to emphasize that load-test QPS (Queries Per Second) alone does not tell us whether an optimization is reasonable; it must be evaluated together with the business architecture.

Let’s review several typical examples mentioned in the course:

  • Optimizing serial requests to dependent services into parallel requests can improve the interface’s response speed, but it increases the pressure on the internal network;
  • A temporary cache can reduce the pressure of repeated queries on the internal network, but for low-frequency data access the optimization adds little;
  • When load testing a sharded architecture, watch for hot spots on individual shards; otherwise the load test may look good while real operation runs into trouble;
  • Interfaces whose performance is heavily influenced by the amount of data involved in the calculation need special attention: test them in realistic environments and with extreme data volumes.

Beyond verifying parallel requests, temporary caching, sharded architecture, and data volume as above, it is also recommended to run some extreme tests to evaluate the stability of the service. For interfaces that involve large amounts of data, pay attention during load testing to the pressure on the underlying databases and the hit rates of caches and indexes, to prevent problems such as excessive database pressure and slow responses.

In addition, online load testing should be performed when there are fewer users, and we must prevent the test from generating garbage data. The shadow database approach solves this, but it requires the cooperation of every business team and must be coordinated and confirmed in advance.

Finally, to simulate the real online situation as closely as possible, I introduced two more comprehensive load testing methods beyond single-interface testing: full-link load testing and traffic playback testing.

Thought-provoking Question #

How can we ensure that unit tests before deployment do not contaminate the production environment with test data?

I look forward to interacting with you in the comment section, and I also recommend that you share this lesson with more colleagues and friends.