33 How to Design a Caching System for a Seckill (Flash Sale) System

In this lesson, we discuss how to design a cache system for a seckill (flash sale) system.

Analysis of Seckill System #

To attract attention, e-commerce platforms often hold low-price seckill (flash) sales for selected products. A few years ago, for example, Xiaomi sold its new products through occasional sales of this kind, and today discounted products go on sale regularly during events such as Double 11 and Double 12. During a seckill sale, a large number of consumers flock to the platform, bringing it a great deal of traffic and attention, but also placing an extremely high concurrent load on the service systems behind it.

While different e-commerce platforms and seckill events may have varying products and sales strategies, there are common characteristics shared by the seckill systems that underpin these events.

First, the seckill business is simple. The products for each seckill event are pre-defined, with clear types and quantities, and sales end once they are sold out.

Second, seckill events are scheduled and there is a designated seckill entrance. Consumers can access this entrance after the event starts and make purchases during the seckill.

Third, due to the low prices and widespread promotion of seckill products, the number of buyers far exceeds the available quantity. As a result, the products are quickly sold out once the sales start.

Finally, due to the large number of participants in seckill events, far exceeding the number of daily visitors, a significant number of consumers flood into the seckill system and keep refreshing the pages, generating a high level of concurrent traffic within a short period of time. This lasts until the end of the event, when the traffic disappears.

From these characteristics, it is easy to see that a seckill event is essentially a planned low-price sale that brings a sudden surge in traffic, which fades quickly once the event ends. Seckill events therefore pose the following technical challenges to the backend services:

First, seckill events are short but generate a very high volume of requests. The seckill system needs to withstand this bursty, almost attack-like access pattern.

Second, the number of requests far exceeds the quantity for sale, so most requests cannot end in a successful purchase. The seckill system needs a processing strategy for these requests that is planned and implemented in advance.

Moreover, due to the large number of front-end access requests, there will be a short-term surge in the backend’s data access volume, necessitating sound design for data storage resources.

Additionally, although seckill events have a short duration, they still create a heavy load on the entire business system. The business system needs to develop various strategies to prevent system overload and downtime.

Lastly, the low prices of seckill products leave room for arbitrage, which attracts various forms of cheating. Preventive strategies therefore need to be planned and implemented in advance.

Seckill System Design #

When designing a seckill system, there are two design principles.

First, we should intercept requests as far upstream as possible, filtering out invalid or excessive requests layer by layer. Because the number of requests is far greater than the quantity of goods, there is no need for every request to reach the last step of the backend service. Letting them all through would significantly slow down the requests that can actually be completed and degrade the user experience.
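One way to picture upstream interception is an admission gate that lets through only a small multiple of the stock and rejects everything else immediately. The following is a minimal sketch, not a prescribed design: the `AdmissionGate` class and the `factor` parameter are hypothetical names introduced for illustration, and a real deployment would keep the counter in a shared cache rather than in process memory.

```python
import threading


class AdmissionGate:
    """Admit at most stock * factor requests; reject the rest upstream.

    `factor` is an assumed tuning knob: it leaves headroom for admitted
    requests that later fail validation or payment.
    """

    def __init__(self, stock: int, factor: int = 3) -> None:
        self._remaining = stock * factor
        self._lock = threading.Lock()

    def try_admit(self) -> bool:
        # The lock makes check-and-decrement a single atomic step.
        with self._lock:
            if self._remaining <= 0:
                return False
            self._remaining -= 1
            return True


# 10 requests compete for 2 items; only 2 * 3 = 6 are passed downstream.
gate = AdmissionGate(stock=2, factor=3)
results = [gate.try_admit() for _ in range(10)]
admitted = sum(results)
```

Requests rejected here never touch the inventory or order services, which is the point of filtering layer by layer.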

Second, we should make full use of caching to improve system performance and availability.


The seckill system is specifically designed for seckill activities, where the items for sale are fixed. Therefore, when designing the seckill product page, we can pre-design the product information as static content. The static product information, as well as regular CSS, JS, promotional images, and other static resources, can be stored independently on CDN nodes for accelerated access and reduced system load.

Various restrictions can also be applied on the front end. For example, the purchase button can be disabled before the activity starts to prevent early access. After a user has submitted a purchase request, the button can be grayed out while the user waits in a queue, which avoids repeated refreshing.

Before a user's request enters the seckill system, load balancing distributes it evenly across the web servers so that no single node is overloaded. The web servers perform various pre-processing steps, such as checking the user's access permissions and identifying abusive bulk-ordering (order-brushing) behavior. Before the request reaches the actual service, a further check is made to prevent overselling: if the quantity sold has already reached the seckill quantity, the system can directly return "sold out".
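The pre-processing described above can be sketched as a short chain of cheap checks that run before any business logic. This is an illustrative sketch under stated assumptions: `SeckillState`, `precheck`, and the returned status strings are hypothetical names, and in practice the blocked-user set and sold counter would live in a shared cache.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class SeckillState:
    """In-memory stand-in for the data the pre-checks consult."""
    total: int                                   # quantity offered in the event
    sold: int = 0                                # quantity sold so far
    blocked_users: Set[str] = field(default_factory=set)


def precheck(state: SeckillState, user_id: str) -> str:
    """Run cheap checks in order and stop at the first failure."""
    if user_id in state.blocked_users:
        return "forbidden"        # permission / order-brushing check
    if state.sold >= state.total:
        return "sold out"         # oversell guard: skip the backend entirely
    return "ok"


state = SeckillState(total=1, blocked_users={"bot-1"})
first = precheck(state, "bot-1")   # blocked account is rejected
second = precheck(state, "alice")  # stock remains, request may proceed
state.sold += 1                    # alice's purchase completes
third = precheck(state, "bob")     # nothing left to sell
```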

When the seckill system handles the seckill business logic, in addition to verifying user permissions, it also needs to access the product service to modify the inventory and access the order service to create orders. Finally, it needs to perform payment, logistics, and other follow-up services. These dependent services can have a queuing strategy specifically designed for the seckill business, or have additional instances deployed to provide dedicated services for the seckill system, avoiding impact on other regular business systems.


In the design of a seckill system, it is crucial to perform effective decomposition during system development. First, the content of the seckill activity page should be decomposed, with static content stored on CDNs and dynamic content accessed through interfaces. Second, the seckill business system should be separated from other business systems, with the seckill system and its dependent services deployed separately to avoid impacting other core business systems.

Since the number of participants in a seckill is much larger than the number of items available, scripts and zombie accounts are often used to call the interfaces at high frequency in a brute-force attempt to improve the chances of a successful purchase. The seckill system needs to build an access record cache that records IP addresses and user access behavior, so that abnormal access can be intercepted and answered directly. A user cache should also be built: historical data can be analyzed to identify, ahead of time, the zombie accounts used for brute-force refreshing, and those accounts can be cached so that restrictions are easy to apply during the seckill period. In addition, user data should be pre-loaded into the cache to avoid excessive database lookups during the activity.
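As a rough illustration of an access record cache, the sketch below keeps recent request timestamps per key (an IP address or user ID) and rejects a request once the key exceeds a limit within a sliding window. The class name `AccessRecordCache` and its parameters are assumptions made for this example, and the in-process dictionary stands in for a real shared cache.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class AccessRecordCache:
    """Sliding-window request counter keyed by IP address or user ID."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        # `now` can be injected for testing; defaults to a monotonic clock.
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False          # abnormal frequency: intercept here
        hits.append(now)
        return True


cache = AccessRecordCache(max_requests=3, window_seconds=1.0)
burst = [cache.allow("10.0.0.1", now=t) for t in (0.0, 0.1, 0.2, 0.3)]
other_ip = cache.allow("10.0.0.2", now=0.3)   # keys are tracked separately
after_wait = cache.allow("10.0.0.1", now=1.05)  # oldest hit has expired
```

Rejected requests are not recorded in this sketch, so a client that backs off regains access once its earlier hits age out of the window.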

During the handling of business requests, operations should be completed through cache interactions as much as possible. Since the number of seckill goods is small, all relevant information can be loaded into memory and the cache can be temporarily used as storage without imposing significant cost burdens.
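To make the idea of using the cache as temporary storage concrete, here is a minimal sketch of an in-memory inventory whose check-and-decrement is atomic, so concurrent buyers cannot oversell. `InventoryCache` and the item ID `"phone-x"` are hypothetical; a production system would typically rely on an atomic operation of its cache server instead of a Python lock.

```python
import threading
from typing import Dict, List


class InventoryCache:
    """In-memory stock table used as temporary storage during the event."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self._stock = dict(stock)
        self._lock = threading.Lock()

    def try_reserve(self, item_id: str) -> bool:
        # Check and decrement under one lock so stock never goes negative.
        with self._lock:
            if self._stock.get(item_id, 0) <= 0:
                return False
            self._stock[item_id] -= 1
            return True

    def remaining(self, item_id: str) -> int:
        with self._lock:
            return self._stock.get(item_id, 0)


inventory = InventoryCache({"phone-x": 5})
outcomes: List[bool] = []


def buy() -> None:
    outcomes.append(inventory.try_reserve("phone-x"))


# 50 concurrent buyers compete for 5 items.
threads = [threading.Thread(target=buy) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the threads interleave, exactly five reservations succeed and the remaining stock ends at zero.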

A cache should be built for seckill items to store product information, and all target items should be pre-loaded into it. A separate cache should be built for seckill inventory to accelerate inventory checks. Product information can then be queried quickly through the seckill product cache, and stock availability during the activity can be determined quickly through the inventory cache, so that a purchase either proceeds efficiently or is rejected immediately when nothing is left. After a user successfully grabs an item, the resulting changes to inventory, orders, payments, and other related records can initially be handled purely through interactions with the cache components. Follow-up operations such as persistence can be recorded as trade events in a message queue and then executed gradually in batches, to avoid putting excessive pressure on the database.
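The deferred persistence step might look roughly like the sketch below: successful trades are recorded as events on a queue, and a background job drains the queue in bounded batches. Everything here is an assumption for illustration: `queue.Queue` stands in for a real message queue, the `database` list stands in for a persistent store, and the function names are hypothetical.

```python
import queue
from typing import Dict, List

trade_events: queue.Queue = queue.Queue()   # stand-in for a message queue
database: List[Dict[str, str]] = []         # stand-in for the real database


def record_trade(user_id: str, item_id: str) -> None:
    """Called on the hot path: enqueue the event and return immediately."""
    trade_events.put({"user_id": user_id, "item_id": item_id})


def persist_batch(max_batch: int = 100) -> int:
    """Drain up to max_batch events and write them in one go."""
    batch: List[Dict[str, str]] = []
    while len(batch) < max_batch:
        try:
            batch.append(trade_events.get_nowait())
        except queue.Empty:
            break
    if batch:
        database.extend(batch)   # a real system would issue one bulk insert
    return len(batch)


for i in range(5):
    record_trade(f"user-{i}", "phone-x")

first = persist_batch(max_batch=3)    # writes 3 events
second = persist_batch(max_batch=3)   # writes the remaining 2
third = persist_batch(max_batch=3)    # queue is empty, nothing to write
```

Capping the batch size bounds how much work each database round trip carries, which is what keeps the write load smooth after the traffic spike.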

In summary, in a seckill system, besides the usual content and service decomposition, it is important to cache all data accesses as much as possible and minimize database lookups. This can greatly improve system performance while enhancing user experience.