01 High-Concurrency Distributed System - What Are Its Common Design Methods #

We know that high concurrency means heavy traffic. The appeal of high-concurrency system design lies in cleverly devising solutions that withstand the impact of massive traffic while giving users a better experience. These solutions shape the traffic so that it can be handled more smoothly by the services and components in the system.

Let’s make a simple analogy.

Throughout history, the Yangtze River and Yellow River have been plagued by floods. In ancient times, Yu the Great widened the river channels and cleared the sediment to make the water flow more smoothly. The Dujiangyan Irrigation System, as one of the most successful water management cases in history, diverted water from the Minjiang River into multiple branches to relieve the pressure of the water flow. The Three Gorges Dam and Gezhouba Dam store water in reservoirs, slowly releasing it to improve downstream flood control capabilities.

When dealing with high-concurrency heavy traffic, we also use similar “flood-resistant” approaches. They can be summarized into three methods.

  • Scale-out: This is a common approach in high-concurrency system design. By deploying the system in a distributed manner, traffic is divided among multiple servers, with each server handling a portion of the concurrency and traffic.
  • Cache: Using caching to improve system performance is like “widening the river channel” to withstand the impact of high-concurrency heavy traffic.
  • Asynchrony: In some scenarios, instead of waiting for processing to complete, we can have requests return first and notify the requesters when the data is ready. This allows more requests to be processed within a given unit of time.

After briefly introducing these three methods, let’s take a closer look so that you can consider these directions when designing high-concurrency systems. Of course, these methods can be further elaborated, and I will discuss them in more detail in future courses.

First, let’s start with the first method: Scale-out.

Scale-up vs Scale-out #

The famous “Moore’s Law” was proposed by Gordon Moore, one of the founders of Intel, in 1965. This law states that the number of transistors that can be accommodated on an integrated circuit doubles approximately every two years.

Later, David House, an Intel executive, came up with the figure of “18 months,” predicting that chip performance would double every 18 months. This version has been widely circulated.

Although Moore’s Law describes the pace of chip development, we can extend it to overall hardware performance. Since the second half of the 20th century, computer hardware performance has evolved exponentially.

Even today, Moore’s Law still holds true. In the process of CPU development over the past half century, chip manufacturers have greatly improved chip performance by employing cutting-edge technology to make smaller transistors within a limited area. From the first generation of integrated circuits with only a dozen transistors to today’s chips with billions of transistors, Moore’s Law has guided chip manufacturers to achieve technological leaps.

However, experts predict that Moore’s Law may cease to hold within the next few years. The reason is that chip manufacturing has reached the 10nm level and is approaching physical limits; even if new technological breakthroughs appear, their high cost may keep them from reaching the market. It was the advent of dual-core and multi-core technologies that rescued Moore’s Law: by packing multiple CPU cores onto a single chip, they greatly enhanced the CPU’s parallel processing capability.

In the design of high-concurrency systems, we adopt a similar approach. We call the strategy of continually improving the performance of a single CPU, chasing Moore’s Law, “scale-up” (vertical scaling), and the strategy of combining multiple ordinary processors to share the work, just as multi-core CPUs do, “scale-out” (horizontal scaling). The two approaches are entirely different in implementation.

  • Scale-up improves the system’s concurrent processing capability by purchasing more powerful hardware. For example, if the current machine with 4 cores and 4GB of RAM can handle 200 requests per second, how do we handle 400? Simple: upgrade the machine to 8 cores and 8GB of RAM (note that the gain from adding hardware resources may not be linear; this is only for illustration).
  • Scale-out takes a different approach: it combines multiple lower-performance machines into a distributed cluster to withstand the high-concurrency traffic. Using the previous example, we can use two 4-core, 4GB machines to handle those 400 requests (see the sketch after this list).
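To make the contrast concrete, here is a minimal Java sketch of the scale-out idea: a round-robin dispatcher spreads incoming requests across several equally sized machines instead of relying on one bigger machine. The node addresses are hypothetical, and this is an illustration of the concept, not a production load balancer.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Minimal round-robin dispatcher illustrating scale-out:
// traffic is divided among several small nodes instead of
// being sent to one larger (scaled-up) machine.
public class RoundRobinDispatcher {
    private final List<String> nodes;          // backend addresses (hypothetical)
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinDispatcher(List<String> nodes) {
        this.nodes = nodes;
    }

    // Pick the next node for an incoming request.
    public String nextNode() {
        long index = counter.getAndIncrement() % nodes.size();
        return nodes.get((int) index);
    }

    public static void main(String[] args) {
        // Two 4-core/4GB machines together take the 400 req/s
        // that a single such machine could not handle alone.
        RoundRobinDispatcher dispatcher = new RoundRobinDispatcher(
                List.of("10.0.0.1:8080", "10.0.0.2:8080"));
        for (int i = 0; i < 6; i++) {
            System.out.println("request " + i + " -> " + dispatcher.nextNode());
        }
    }
}
```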

So when should we choose scale-up, and when should we choose scale-out? Generally speaking, during the initial design phase of our system, we consider using the scale-up approach because it is simple enough. If a problem can be solved by stacking hardware, we use hardware to solve it. However, when the system’s concurrency exceeds the limit of a single machine, we need to use the scale-out approach.

Although scale-out can break through the limits of a single machine, it also introduces some complex problems. For example, how do we ensure overall availability when a node fails? How do we keep state consistent across nodes when multiple nodes hold state that must be synchronized? How do we add and remove nodes seamlessly, without users noticing? Each of these issues involves many aspects, and I will delve into them in later courses rather than going into detail here.

Having discussed scale-out, let’s now take a look at another approach to designing highly concurrent systems: caching.

Improving Performance with Caching #

There is no doubt that Web 2.0 is the era of caching. Caching is present in every corner of system design, from the operating system to the browser, and from the database to the message queue; you can find traces of caching in any moderately complex service or component. The main purpose of caching is to improve the system’s access performance so that it can support more concurrent users in high-concurrency scenarios.

So why can caching significantly improve system performance? We know that data is kept in persistent storage, and persistent storage generally uses disks as the storage medium. An ordinary mechanical disk consists of actuator arms, read/write heads, a spindle, and platters, and the platters are divided into tracks, cylinders, and sectors. The disk structure is shown in the diagram below.

The platters are the storage medium. Each platter is divided into multiple concentric circles called tracks, and data is stored on these tracks. When the disk is working, the platters spin at high speed while the actuator arm moves the head radially to locate the track that holds the required data. The time the head spends locating the data is called the seek time.

[Figure: disk structure diagram]

The seek time of an ordinary disk is about 10 ms. By comparison, the CPU executes instructions and addresses memory on the order of nanoseconds, and reading data from a gigabit network card takes on the order of microseconds. The disk is therefore the slowest component in the entire computer system, slower than the others by several orders of magnitude, which is why we usually add a cache that uses memory as its storage medium to improve performance.

Of course, the semantics of caching have become much richer. We can call any intermediate storage that reduces response time a cache. The idea appears in many areas of design, such as the CPU’s multi-level caches and the operating system’s page cache for files; you are probably already somewhat familiar with these.
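As a concrete illustration of “intermediate storage that reduces response time,” here is a minimal cache-aside sketch in Java. The in-memory map stands in for a real cache such as Redis, and loadFromDatabase is a hypothetical placeholder for the slow, disk-backed read; the pattern, not the specific store, is the point.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Cache-aside read path: check the fast in-memory cache first,
// fall back to the slow disk-backed store only on a miss,
// then populate the cache so later reads avoid the disk.
public class CacheAsideReader {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public String get(String key) {
        String value = cache.get(key);      // memory access: ~ns level
        if (value == null) {
            value = loadFromDatabase(key);  // disk access: ~ms level
            if (value != null) {
                cache.put(key, value);      // warm the cache for the next reader
            }
        }
        return value;
    }

    // Hypothetical slow read from persistent storage.
    private String loadFromDatabase(String key) {
        return "value-of-" + key;
    }
}
```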

Asynchronous Processing #

Asynchronous processing is also a common design method for high concurrency. We often hear the term in articles and talks, along with its antonym, synchronous. For example, the distributed service framework Dubbo offers both synchronous and asynchronous method calls, and I/O models include synchronous I/O and asynchronous I/O.

So what are synchronous and asynchronous? Take method calls as an example: a synchronous call means the caller blocks and waits for the called method to finish executing. In this mode, when the called method has a long response time, the caller is blocked for a long time, which lowers overall system performance and can even bring the system down under high concurrency.

By contrast, an asynchronous call lets the caller return and execute other logic without waiting for the called method to complete; the result is later delivered to the caller through callbacks, event notifications, or other means.
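The difference can be shown with a small Java sketch using CompletableFuture: the synchronous version blocks the calling thread until the slow work finishes, while the asynchronous version returns immediately and delivers the result through a callback. Here remoteCall is a hypothetical slow operation, not Dubbo’s actual API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class SyncVsAsync {
    // Hypothetical slow remote operation (e.g. a 200 ms RPC).
    static String remoteCall() {
        try {
            TimeUnit.MILLISECONDS.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "result";
    }

    public static void main(String[] args) {
        // Synchronous: the caller is blocked for the full 200 ms.
        String syncResult = remoteCall();
        System.out.println("sync got: " + syncResult);

        // Asynchronous: the caller registers a callback and moves on.
        CompletableFuture<String> future =
                CompletableFuture.supplyAsync(SyncVsAsync::remoteCall);
        future.thenAccept(result -> System.out.println("async got: " + result));

        System.out.println("caller is free to do other work here");
        future.join(); // only so this demo JVM waits before exiting
    }
}
```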

Asynchronous calls are widely used in large-scale high-concurrency systems, the well-known 12306 website being one example. When we book tickets, the page shows a message saying that the system is queuing our booking request; this prompt tells us the system is processing our request asynchronously. In 12306, operations such as querying ticket availability, placing orders, and updating ticket status are time-consuming and may involve calls among multiple internal systems. If synchronous calls were used, orders could hardly be placed successfully during peak periods, just as in the early days after 12306 launched.

Instead, by using asynchronous processing, the backend will put the request into a message queue, quickly respond to the user, and inform them that the request is being processed in the queue, while simultaneously freeing up resources to handle more requests. After the ticket reservation request is processed, the user will be notified of the success or failure of the reservation.

Delegating the processing logic to the asynchronous processing system reduces the load on the web service, reduces resource consumption, and allows the system to handle more user ticket reservation requests, thus improving its ability to handle high concurrency.
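Here is a minimal sketch of that flow, using a Java BlockingQueue as an in-process stand-in for a real message queue (such as Kafka or RocketMQ); it is an illustration of the pattern, not 12306’s actual implementation. The web layer enqueues the booking request and answers the user at once, while a background worker drains the queue and processes bookings at its own pace.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncBookingDemo {
    // Stand-in for a real message queue between the web layer and the workers.
    private static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Web layer: enqueue the request and respond to the user immediately.
    static String submitBooking(String userId) throws InterruptedException {
        queue.put(userId);
        return "Your booking request is queued; you will be notified of the result.";
    }

    // Background worker: drains the queue and performs the slow processing.
    static void startWorker() {
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    String userId = queue.take();
                    // Slow steps: check inventory, place the order, update tickets...
                    Thread.sleep(100);
                    System.out.println("Notify " + userId + ": booking succeeded");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    public static void main(String[] args) throws Exception {
        startWorker();
        System.out.println(submitBooking("user-42")); // returns right away
        Thread.sleep(300); // give the worker time to finish in this demo
    }
}
```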

[Figure: asynchronous processing of booking requests through a message queue]

Now that we understand these three methods, does that mean we should use all of them whenever we design a high-concurrency system? Of course not; system design is an evolving process.

Rome wasn’t built in a day, and system design is no different. Systems of different scales have different pain points and different architectural focuses. If we designed every system for millions or tens of millions of concurrent users, making every e-commerce system follow in Taobao’s footsteps and every instant-messaging system imitate WeChat and QQ, those systems would be doomed.

Although systems like Taobao and WeChat can handle the demand of millions or tens of millions of simultaneous online users, the complexity of their internal systems is far beyond our imagination. Blindly following them will only make our architecture unnecessarily complex, making it difficult to maintain. Taking the evolution from a monolithic architecture to a service-oriented architecture as an example, Taobao also started its service-oriented transformation project after years of development when it found issues with the overall scalability of its system.

I have also encountered some pitfalls in a startup project I participated in. In the initial stages, the project adopted a service-oriented architecture. However, due to limited manpower and insufficient technical accumulation, we found it difficult to manage such a complex architecture during the actual project development process. We encountered issues that were difficult to locate, a decrease in overall system performance, and even system crashes that were hard to trace back to the root cause. In the end, we had to integrate the services and return to a simpler monolithic architecture.

Therefore, I suggest that the evolution of a general system should follow the following approach:

  • Initially, design the simplest system that meets the business requirements and current traffic, using the technology stack the team is most familiar with.
  • As traffic increases and business changes, make adjustments to problematic areas in the architecture, such as single points of failure, horizontal scalability issues, and components that cannot meet performance requirements. During this process, choose community-proven components that the team is familiar with to help solve problems. Only when there are no suitable solutions in the community should you reinvent the wheel.
  • When small fixes in the architecture are insufficient to meet requirements, consider large-scale refactorings or rewrites to address existing problems.

Taking Taobao as an example, during the 0-to-1 phase of its business, it got a system up quickly by purchasing an existing one. Later, as traffic grew, Taobao carried out a series of technical transformations to improve its high-concurrency capability, such as migrating the database storage engine from MyISAM to InnoDB, sharding the database, adding caching, and developing its own middleware.

When these measures were insufficient, they considered large-scale architectural changes. For example, the well-known “Wucaishi” project transformed Taobao’s architecture from a monolith to a service-oriented architecture. It is through gradual technological evolutions like this that Taobao has developed its current technical architecture, which can handle over a hundred million queries per second.

In conclusion, the evolution of a high-concurrency system should be gradual and incremental, driven by the purpose of solving existing problems.

Summary of the Course #

In today’s lesson, I introduced three common methods for designing high-concurrency systems: Scale-out, caching, and asynchronous processing. These three methods can be flexibly applied in solution designs, but they are not specific implementation plans. They are three ways of thinking that can vary in practical applications.

Take scale-out as an example: concrete applications include database master-slave replication, sharding of databases and tables, and storage partitioning. Keep in mind that when facing high concurrency and heavy traffic, the general idea is to let the system withstand the impact by adding machines, but which specific scheme to adopt still depends on careful analysis of the concrete problem.