05 System Design Goals III - How to Make the System Easy to Expand #

In terms of architectural design, high scalability is an indicator of a well-designed system. It means that the system’s processing capacity can be improved roughly linearly by adding machines, enabling it to handle higher traffic and concurrency.

You may ask, “Why not determine in advance, during the initial architecture design, how many machines are needed to support the expected concurrency?” I mentioned this question in Lesson 03: System Design Goal (I): How to Improve System Performance?. The answer is that peak traffic is unpredictable.

Generally, considering cost, during the steady state of business operations, we reserve 30% to 50% redundancy to handle potential peak traffic caused by operational activities or promotions. However, in the event of an unexpected incident, traffic may instantly increase 2 to 3 times or even more. Let’s take Weibo as an example.

When Lu Han and Guan Xiaotong publicly announced their relationship and interacted with each other on Weibo, traffic to their Weibo pages surged as users flocked to view and engage with the content. Weibo’s overall traffic spiked within a short period, and at times the Weibo feed temporarily failed to display new messages.

So, how do we deal with sudden traffic surges? It’s not feasible to revamp the entire architecture in a short time, so the fastest solution is to add more machines. However, we need to ensure that after scaling the system by three times, it can support three times the traffic. Some may question, “Isn’t this obvious? It seems simple.” Is it really that simple? Let’s see where the difficulties lie in doing this.

Why is improving scalability complicated #

In the previous lecture, I mentioned that in a single-machine system, the system’s parallel processing capability can be increased by adding processing cores. However, this approach does not always work. When the number of parallel tasks is high, the system reaches a turning point in performance due to resource contention, and the system’s processing capability decreases instead of increasing.

The same is true for cluster systems composed of multiple machines. In a cluster system, there may be some “bottleneck points” at different layers of the system architecture, which restrict the system’s horizontal scalability. This statement may be abstract, but I’ll give you an example to help you understand.

For example, suppose your system’s traffic is 1,000 requests per second, and it issues 1,000 requests per second to the database. If traffic grows tenfold, the application servers can scale out to absorb the load, but the database becomes the bottleneck. Or suppose each machine’s network bandwidth is 50 Mbps: once you scale out to 30 machines, the aggregate traffic (1,500 Mbps) exceeds the front-end load balancer’s 1 Gbps limit, and the load balancer becomes the bottleneck. So, which factors in our system restrict scalability?
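The bandwidth arithmetic above can be sketched in a few lines. The 50 Mbps and 1 Gbps figures come from the example; the function and constant names are my own:

```python
def aggregate_mbps(machines: int, per_machine_mbps: int) -> int:
    """Total egress bandwidth the front-end load balancer must carry at peak."""
    return machines * per_machine_mbps

LB_UPLINK_MBPS = 1000  # a 1 Gbps uplink

# 30 machines at 50 Mbps each saturate the load balancer:
print(aggregate_mbps(30, 50))                   # 1500
print(aggregate_mbps(30, 50) > LB_UPLINK_MBPS)  # True: the LB is now the bottleneck
```

This is exactly the kind of back-of-the-envelope check worth running before scaling out: the weakest shared component, not the number of application servers, sets the ceiling.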

In fact, stateless services and components are easy to scale, while stateful storage services like MySQL are much harder: adding or removing machines from a storage cluster involves migrating large amounts of data, which most traditional relational databases do not support natively. This is the main reason improving system scalability is so complex.

In addition, from the examples, you can see that we need to consider the scalability of the system from the perspective of the overall architecture, not just the business servers. Therefore, factors such as databases, caching, dependencies on third-party services, load balancing, and switch bandwidth are all factors to consider when scaling the system. We need to know which factor becomes the bottleneck after the system reaches a certain level of concurrency, so that we can scale accordingly.

To address these complex scalability issues, I have summarized some system design ideas for your reference.

Design Ideas for High Scalability #

Decoupling is the most important approach to improving system scalability. It breaks a complex system down into independent, single-responsibility modules. Compared with reasoning about one large system, it is much simpler to reason about the scalability of each smaller module; turning a complex problem into several simpler ones is the core idea.

However, the principles of decoupling differ for different types of modules. Let me give you a simple example. If you were to design a community, how many modules would it have? It might have five modules:

  • Users: responsible for maintaining community user information, such as registration and login.
  • Relationships: responsible for maintaining relationships between users, such as following, friends, and blocking.
  • Content: community posts, similar to a social media feed.
  • Comments and Likes: typical user interactions.
  • Search: user and content search.

In terms of deployment, we follow the simplest three-tier architecture: load balancers for request distribution, application servers for business logic, and databases for data storage. In this case, all the modules’ business code is mixed together, and data is stored in one database.

[Image: the initial three-tier architecture — load balancer, application servers with all modules’ code mixed together, and a single database]

1. Scalability of the Storage Layer #

The data volume and concurrent access to storage differ significantly between different business modules. For example, in a mature community, the data volume of relationships is much larger than that of users, but the access volume of users is much larger than that of relationships. So, if the current bottleneck in storage is capacity, we only need to split the data of the relationships module, not the data of the users module. Therefore, the first dimension to consider in storage splitting is the business dimension.

After splitting, this simple community system would have separate databases for users, content, comments, likes, and relationships. This also isolates failures so that if one database fails, it does not affect other databases.

[Image: the storage layer split by business — separate databases for users, content, comments, likes, and relationships]

By splitting based on business, the system’s scalability improves to a certain extent. However, once the system has been running for a long time, even a single business’s database will exceed the capacity and concurrency limits of one machine. At this point, we need to split the database again.

This time, the split is done horizontally based on data features. For example, we can add two more nodes to the user database and split the user data among these three databases using specific algorithms. I will explain database sharding in more detail later.

With horizontal splitting, we can overcome the limitations of a single machine for the database. However, we should note that we cannot add nodes arbitrarily, as adding nodes will require manual data migration, which is costly. So, considering long-term factors, it is best to add enough nodes in one go to avoid frequent capacity expansion.
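As a minimal illustration of both horizontal splitting and why adding nodes is costly, here is a hash-based sharding sketch. The modulo scheme and names are my own; real systems often use more sophisticated algorithms such as consistent hashing:

```python
def shard_for(user_id: int, shard_count: int) -> int:
    """Route a user's rows to a shard by taking the key modulo the shard count."""
    return user_id % shard_count

# With 3 user databases, user 10 lives on shard 1:
assert shard_for(10, 3) == 1

# The catch: growing from 3 to 4 shards remaps most keys, and every
# remapped row has to be migrated by hand.
moved = sum(1 for uid in range(10_000) if shard_for(uid, 3) != shard_for(uid, 4))
print(moved)  # 7498 — roughly three quarters of the rows would move
```

This is why the text advises adding enough nodes in one go: with naive modulo sharding, every change to the node count triggers a large migration.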

After splitting the database along business and data dimensions, we should try to avoid cross-database transactions. When different databases must be updated atomically within one transaction, two-phase commit is required to ensure that all databases are either fully updated or not updated at all. The cost of this coordination grows with the number of participating resources, eventually becoming unaffordable.
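To make that coordination overhead concrete, here is a toy two-phase commit coordinator over in-memory stand-ins for databases. All names are illustrative, and real 2PC must also handle crashes, timeouts, and durable logs, which this sketch omits:

```python
class FakeDB:
    """An in-memory stand-in for one database participating in a transaction."""
    def __init__(self, name: str):
        self.name = name
        self.staged = {}     # writes voted on but not yet durable
        self.committed = {}  # durable writes

    def prepare(self, key, value) -> bool:
        # Phase 1: stage the write and vote yes/no.
        self.staged[key] = value
        return True

    def commit(self, key) -> None:
        # Phase 2: make the staged write durable.
        self.committed[key] = self.staged.pop(key)

    def rollback(self, key) -> None:
        self.staged.pop(key, None)

def atomic_update(dbs, key, value) -> bool:
    """Every database prepares; only if all vote yes does everyone commit."""
    if all(db.prepare(key, value) for db in dbs):
        for db in dbs:
            db.commit(key)
        return True
    for db in dbs:
        db.rollback(key)
    return False

user_db, relation_db = FakeDB("user"), FakeDB("relation")
atomic_update([user_db, relation_db], "uid:1", "followed")
print(user_db.committed, relation_db.committed)
```

Even in this toy, every write costs two rounds of coordination across all participating databases, and the cost only grows as more resources join the transaction — which is why splitting databases goes hand in hand with avoiding cross-database transactions.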

Now that we have covered the scalability of the storage layer, let’s look at how the business layer achieves scalability.

2. Scalability of the Business Layer #

We generally split the business layer along three dimensions: business, importance, and request source.

First, we need to split services with the same business logic into separate pools. For example, in the above community system, we can split it into user pool, content pool, relationship pool, comment pool, like pool, and search pool.

Each business depends on its own database resources and does not rely on other business databases. So, when one business’s interface becomes a bottleneck, we only need to scale the corresponding business pool and confirm the dependencies of upstream and downstream services. This greatly reduces the complexity of scaling.
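One way to picture the split is as a routing table from request path to business pool. This is only a sketch; the path prefixes and pool names are hypothetical:

```python
# Hypothetical mapping from path prefix to the pool that serves it.
POOLS = {
    "/user": "user-pool",
    "/content": "content-pool",
    "/relation": "relation-pool",
    "/comment": "comment-pool",
    "/like": "like-pool",
    "/search": "search-pool",
}

def route(path: str) -> str:
    """Pick the business pool for a request; unknown paths fall through."""
    for prefix, pool in POOLS.items():
        if path.startswith(prefix):
            return pool
    return "default-pool"

print(route("/relation/follow"))  # relation-pool
```

When the relationship interfaces become the bottleneck, only `relation-pool` needs more machines; the other pools are untouched.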

[Image: the business layer split into independent pools — user, content, relationship, comment, like, and search — each backed by its own database]

In addition to this, we can also classify businesses into core pools and non-core pools based on the importance of their interfaces. For example, in the case of the relationship pool, the follow and unfollow interfaces may be more important and can be placed in the core pool, while the block and unblock operations may be less important and placed in the non-core pool. This way, we can prioritize performance for the core pool and, when overall traffic increases, we can scale the core pool first and downgrade certain non-core pool interfaces, ensuring the overall stability of the system.
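A minimal sketch of importance-based degradation, using the interface names from the example; the 0.8 load threshold and the function itself are my own assumptions:

```python
CORE = {"follow", "unfollow"}    # scaled first, never degraded
NON_CORE = {"block", "unblock"}  # shed first under pressure

def handle(interface: str, system_load: float) -> str:
    """Serve core interfaces normally; degrade non-core ones when load is high."""
    if interface in NON_CORE and system_load > 0.8:
        return "degraded"
    return f"served:{interface}"

print(handle("follow", 0.95))  # served:follow
print(handle("block", 0.95))   # degraded
```

The point is the asymmetry: under pressure, non-core interfaces return a cheap degraded response so that the core pool’s capacity is preserved.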

[Image: the relationship pool split by importance — a core pool for follow/unfollow and a non-core pool for block/unblock]

Lastly, you can also split business pools based on different types of client access. For example, businesses serving client interfaces can be defined as the external pool, businesses serving mini programs or HTML5 pages can be defined as the H5 pool, and businesses serving other internal departments can be defined as the intranet pool, and so on.

Course Summary #

In this lesson, I walked you through why improving system scalability is complex and the mindset behind system decomposition. Decomposition may seem simple, but there are many details in deciding when and how to do it.

Systems that have not been decomposed may lack scalability, but they are simple enough that neither development nor operational maintenance requires a significant investment of resources. After decomposition, feature development spans multiple systems and small teams, problem diagnosis also involves multiple systems, and operations may require dedicated personnel for each subsystem. These challenges are hurdles we must clear, and we need to be well prepared for them.