28 How to Easily Handle High Traffic with SAE Elastic Capabilities through Stress Testing Tools #

Challenges in Traditional Big Promotions #

In a typical big promotion event, technical staff usually begin with the following preparations:

  • Architecture Review: A systematic review of the services involved in the promotion.
  • Capacity Planning: Building on the architecture review, determine the system’s SLA indicators and form a capacity model that helps evaluate how much business traffic the system can carry.
  • Performance Testing: Evaluate the single-node capacity of the core system and perform end-to-end load testing of the core links to validate the capacity model and identify system issues.
  • Application/Database Optimization: Optimize system issues, such as hotspots, deadlocks, or slow SQL, to ensure that the system can support the promotion.
  • Scaling Plan Preparation: Based on capacity planning and performance testing, a scaling plan that meets the requirements of the event can be determined to ensure business continuity and reduce costs.
  • Emergency Preparedness: Prepare measures for unexpected situations, such as business degradation, removal of non-core logic, or flow control, to ensure the stability of core links.
  • Online Emergency Support during Promotions: Dedicated personnel respond to problems and execute emergency plans.

To complete the above preparations, the following pain points are often encountered:

  • No global view of the relationships along the system’s core links. Sorting out dependencies takes a lot of time.
  • Slow problem diagnosis across upstream and downstream links. Aggregating and diagnosing problems along the links during load testing and online emergency support is time-consuming, and tools for fast identification and analysis are lacking.
  • Fast business iteration calls for routine (normalized) load testing. A large amount of repetitive manual effort is invested, which places a heavy burden on the team.
  • Reserving resources is expensive, and capacity must be scaled up and down frequently. Productized automatic scaling is needed to reduce high fixed investments such as self-built data centers.

SAE Big Promotion Solution #

SAE is an application-oriented Serverless PaaS platform. In addition to traditional PaaS features, it provides complete end-to-end monitoring and microservice management, and it leverages Serverless capabilities to scale rapidly and reduce manual operation costs.

The solution provided by SAE covers three aspects:

  • Metrics Visualization: Use the application monitoring provided by ARMS, with features such as JVM monitoring, end-to-end tracing, and slow SQL analysis, to conveniently evaluate load levels and locate problems.
  • Application High Availability (AHAS): Use AHAS flow control and degradation capabilities to protect core services and ensure availability doesn’t drop to zero even with a surge of traffic.
  • Performance Testing: Use performance testing tools like PTS to simulate single-node or end-to-end load testing to validate capacity planning and discover application issues.

Quick Load Testing and Verification #

So how can we use SAE to quickly load test and verify readiness for a big promotion? The following steps walk through a complete demonstration:

Step 1: Observe application monitoring metrics and roughly plan elasticity/load testing/flow control and degradation #

By observing application monitoring, you can get a rough picture of the application’s everyday business metrics. Taking a typical e-commerce application as an example:

The monitoring data shows:

  • The application is an HTTP microservice application.
  • The application relies heavily on HTTP microservice calls, with a small number of Redis/MySQL services being used. It is therefore suitable for load testing with both single-machine and distributed load testing tools, run separately.
  • The QPS metric is more sensitive to business changes than the CPU, MEM, and RT metrics, making it a better basis for the elastic strategy.
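
To make the idea of a QPS-based elastic strategy concrete, the sketch below shows one common way a target-tracking rule can turn observed QPS into an instance count. It is not SAE's actual scaling algorithm; the function name, the per-instance QPS target, and the instance bounds are all illustrative assumptions.

```python
import math


def desired_instances(total_qps, target_qps_per_instance,
                      min_instances, max_instances):
    """Return the instance count needed to keep each instance near its QPS target.

    All parameters are hypothetical knobs for illustration, not SAE settings.
    """
    if target_qps_per_instance <= 0:
        raise ValueError("target_qps_per_instance must be positive")
    if min_instances > max_instances:
        raise ValueError("min_instances must not exceed max_instances")
    # Round up so that no instance is asked to serve more than its target.
    needed = math.ceil(total_qps / target_qps_per_instance)
    # Clamp to the configured floor and ceiling.
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    # 900 QPS observed, 100 QPS per instance targeted, between 2 and 20 instances.
    print(desired_instances(900, 100, 2, 20))
```

Because QPS rises as soon as traffic does, a rule of this shape reacts before CPU or memory saturate, which is the reason the text prefers QPS for this kind of application.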

Step 2: Select a suitable load testing tool #

According to business requirements, you can choose a quick and easy-to-use tool or a feature-complete load testing tool.

  • For example, for single-machine HTTP load testing, tools like ab and wrk provide a simple and fast way to generate load, but they run on a single machine only and do not support context between requests.
  • If you need WebSocket support and routine (normalized) load testing, the cloud product PTS can provide more comprehensive services at a lower cost than building the tooling yourself.
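
For readers who want to see what a single-machine HTTP load test does under the hood, here is a minimal sketch using only the Python standard library. It is far less capable than ab, wrk, or PTS and is meant purely as an illustration; the function names and the URL in the usage note are assumptions, not part of any of those tools.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def http_get_ok(url, timeout=5.0):
    """Send one GET request; return True on a 2xx/3xx response, False otherwise."""
    try:
        with urlopen(url, timeout=timeout) as response:
            response.read()  # drain the body before judging the request
            return 200 <= response.status < 400
    except Exception:
        # Timeouts, refused connections, and HTTP error statuses count as failures.
        return False


def run_load_test(send, total_requests, concurrency):
    """Call `send` total_requests times from `concurrency` worker threads.

    `send` is any zero-argument callable that returns True on success.
    """
    if total_requests <= 0 or concurrency <= 0:
        raise ValueError("total_requests and concurrency must be positive")

    def timed_call(_):
        start = time.perf_counter()
        ok = bool(send())
        return ok, time.perf_counter() - start

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    elapsed = time.perf_counter() - started

    failures = sum(1 for ok, _ in results if not ok)
    return {
        "requests": total_requests,
        "failures": failures,
        "elapsed_s": elapsed,
        "qps": total_requests / elapsed if elapsed > 0 else float("inf"),
        "latencies_s": [latency for _, latency in results],
    }
```

A run against a test endpoint might look like `run_load_test(lambda: http_get_ok("http://localhost:8080/health"), 1000, 50)`, where the URL is a placeholder. Passing the request as a callable keeps the driver independent of the protocol, so the same loop can be exercised with a stub.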

Step 3: Configure SAE elastic scaling strategy + AHAS flow control and degradation strategy #

Precise settings are not needed at this point. Select some suitable metrics to configure the SAE elastic scaling strategy, and additionally configure the AHAS flow control strategy and ARMS alarms.

  • For API-type applications, you can use metrics such as API QPS and SQL QPS for flow control, so that requests exceeding the system’s capacity (water level) fail fast and the SLA of business within capacity is protected. You can also configure elastic rules based on application monitoring metrics such as QPS and RT so that the system scales elastically.
  • For compute-oriented applications, you can choose more sensitive metrics such as CPU and memory to perform scaling up and down.
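
The fail-fast behavior of QPS-based flow control can be illustrated with a small token-bucket limiter. This is a generic sketch of the technique, not the AHAS implementation or its API; the class name, parameters, and injectable clock are assumptions made for the example.

```python
import threading
import time


class QpsLimiter:
    """Token bucket that admits about `qps` requests per second and rejects the rest.

    Hypothetical illustration of flow control; not an AHAS class.
    """

    def __init__(self, qps, burst=None, clock=time.monotonic):
        if qps <= 0:
            raise ValueError("qps must be positive")
        self._rate = float(qps)
        # The bucket size bounds how many requests may pass in one burst.
        self._capacity = float(burst if burst is not None else qps)
        self._tokens = self._capacity
        self._clock = clock
        self._last = clock()
        self._lock = threading.Lock()

    def try_acquire(self):
        """Return True if the request may proceed, False if it should fail fast."""
        with self._lock:
            now = self._clock()
            # Refill in proportion to the time elapsed since the last call.
            refill = (now - self._last) * self._rate
            self._tokens = min(self._capacity, self._tokens + refill)
            self._last = now
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False
```

A request handler would call `try_acquire()` first and return an immediate "busy" response when it yields `False`, so traffic beyond capacity is rejected cheaply while admitted requests keep their normal response time.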

Step 4: Execute load testing – Observe results – Optimize code – Adjust strategy configuration #

  1. Based on the load testing and monitoring results, determine whether it is necessary to optimize the code or adjust the SAE elastic scaling strategy and AHAS flow control strategy.
  2. Perform load testing and review the results to find failed requests.
  3. Check for abnormal monitoring metrics and find GC exceptions. Optimize JVM parameters through the SAE console.
  4. Perform load testing again to verify that the problem is resolved.
  5. Repeat this process one or two times, solving the main problems discovered, so that you can face big promotions with more confidence.
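
When repeating this test-and-adjust loop, it helps to judge each run against fixed thresholds rather than by eye. The sketch below is one possible way to do that, assuming you have exported per-request latencies and a failure count from your load testing tool; the function names and threshold values are illustrative and do not come from PTS or SAE.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of `values` for an integer pct in 1..100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # Nearest-rank definition: the smallest value with at least pct% at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def evaluate_run(latencies_ms, failures, max_p99_ms, max_failure_rate):
    """Return a list of problems found in one run; an empty list means it passed."""
    total = len(latencies_ms)
    if total == 0:
        raise ValueError("latencies_ms must not be empty")
    problems = []
    p99 = percentile(latencies_ms, 99)
    if p99 > max_p99_ms:
        problems.append(f"p99 latency {p99} ms exceeds {max_p99_ms} ms")
    failure_rate = failures / total
    if failure_rate > max_failure_rate:
        problems.append(
            f"failure rate {failure_rate:.2%} exceeds {max_failure_rate:.2%}"
        )
    return problems
```

Running `evaluate_run` after every iteration with the same thresholds makes it easy to tell whether a JVM or strategy change actually resolved the problem found in the previous run.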

Please click 【Video Course Link】 to watch the detailed demonstration process.