26 Distributed Design Belt Design of the Consistent Hash Ring

In the previous lectures, we covered the basic architecture of a distributed database and worked through the design of data sharding, table structures, and indexes. With that background, you should have a working understanding of distributed databases and be able to design the basic infrastructure for one.

However, this is far from enough. In a distributed architecture, it is not only the database that has to be transformed: the business layer must be transformed as well, so that the whole system ends up with a striped design. So what is striped design, and how do you apply it to the entire link? That is the topic of this lecture.

What is Striped Design

Striped design is a storage technique that splits continuous data into blocks of the same size and writes consecutive blocks to different disks in an array. Simply put, each segment of data goes to a different disk.

As you can see, the essence of striped design is to scatter data across multiple disks to improve overall storage performance. Isn’t this concept very similar to the sharding concept of distributed databases? The following diagram shows the striped storage of RAID0:

From the diagram, you can see that with RAID striping the data is spread across three disks: disk 1, disk 2, and disk 3. On each disk, the data is laid out in stripe 1, stripe 2, and stripe 3.

In this way, when accessing a piece of data, it can be retrieved in parallel from the 3 disks, and writing can also be simultaneously performed on the 3 disks, thus improving storage performance.
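The block placement described above can be sketched in a few lines. This is a minimal illustration that assumes plain round-robin placement over three disks; the function name `stripe_location` is made up for this example and is not taken from any RAID implementation.

```python
from typing import Tuple


def stripe_location(block_index: int, num_disks: int = 3) -> Tuple[int, int]:
    """Return the 1-based (disk, stripe) holding a 0-based logical block."""
    disk = block_index % num_disks + 1      # round-robin across the disks
    stripe = block_index // num_disks + 1   # one stripe covers every disk once
    return disk, stripe


# Nine consecutive blocks fill stripes 1..3 on disks 1..3.
for block in range(9):
    disk, stripe = stripe_location(block)
    print(f"block {block} -> disk {disk}, stripe {stripe}")
```

Because neighbouring blocks always land on different disks, a sequential read or write of several blocks can be served by all three disks at once.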

With the basics of striped design in place, what does a “striped” design look like for a distributed database architecture?

Striped Design for the Entire Link

In Lecture 22, we discussed that the essence of a distributed database is to shard data based on one or several columns (the “sharding key”) and then scatter the rows with a predefined algorithm (the sharding algorithm) to form individual shards.

More importantly, for tables in distributed databases, a unified sharding key should be selected, so that most tables can scatter data based on this sharding key. This way, when subsequent business operations access data, they can complete unitized closed-loop operations within a shard without involving cross-shard access.
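To make the idea of a unified sharding key concrete, here is a small sketch. It assumes three shards and a simple modulo sharding algorithm, which is only one possible choice and not necessarily the one used in this course; `shard_for` and `NUM_SHARDS` are hypothetical names. The columns `c_custkey` and `o_custkey` come from the TPC-H schema.

```python
NUM_SHARDS = 3  # assumed shard count for this illustration


def shard_for(custkey: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a sharding-key value to a 1-based shard number (modulo sharding)."""
    return custkey % num_shards + 1


# A customer row and that customer's orders share the same key value,
# so they are routed to the same shard.
customer = {"c_custkey": 42, "c_name": "Customer#42"}
orders = [
    {"o_orderkey": 1001, "o_custkey": 42},
    {"o_orderkey": 1002, "o_custkey": 42},
]

customer_shard = shard_for(customer["c_custkey"])
order_shards = {shard_for(o["o_custkey"]) for o in orders}
print(customer_shard, order_shards)
```

Since every table is scattered by the same key, a request for one customer touches a single shard and never needs a cross-shard join.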

The following diagram shows the sharding effect of the tpch distributed architecture after transformation:

From the diagram, we can see that this is similar to the idea of striped design described earlier, where data is scattered to improve performance. For distributed databases, the more shards there are, the higher the performance ceiling.

However, this is only a striped design for the database layer; it does not yet apply striping to the entire link. Let’s look at an example. Suppose the order service is an important service in an e-commerce system, and we have implemented a distributed striped design for the orders table:

As you can see, the order service can access data from different shards based on the o_custkey field, which is a design that most businesses would implement (we don’t consider high availability here because service layers are usually stateless). However, this design does not conform to the idea of a striped design for the entire link.

The idea of a striped design for the entire link means that the upper-level services should also be treated as part of the stripes. In other words, the order service should also undergo distributed architectural transformation based on sharding. So, if we are to implement a fully distributed striped design, the above order service should be designed as follows:

As you can see, when implementing distributed striped design, the upper-level business services also need to be transformed accordingly. The “big” order service layer should be split into multiple “small” order services, with each order service accessing its own sharded data.
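The split into “small” order services can be sketched as follows. This is an illustrative model only: the class `OrderService`, the `route` helper, and the modulo rule on `o_custkey` are assumptions made for the example, and an in-memory dictionary stands in for each database shard.

```python
class OrderService:
    """One "small" order service that serves exactly one data shard."""

    def __init__(self, shard_id: int, num_shards: int) -> None:
        self.shard_id = shard_id
        self.num_shards = num_shards
        self.orders = {}  # stands in for this service's own database shard

    def owns(self, custkey: int) -> bool:
        """True if this customer's data belongs to this service's shard."""
        return custkey % self.num_shards + 1 == self.shard_id

    def create_order(self, orderkey: int, custkey: int) -> None:
        # Each service refuses requests for data outside its own shard.
        if not self.owns(custkey):
            raise PermissionError(
                f"customer {custkey} is not on shard {self.shard_id}"
            )
        self.orders[orderkey] = custkey


def route(services: list, custkey: int) -> OrderService:
    """Pick the service whose shard holds the given customer."""
    return services[custkey % len(services)]


services = [OrderService(shard_id=i + 1, num_shards=3) for i in range(3)]
route(services, 43).create_order(orderkey=1001, custkey=43)
```

The `owns` check is where a service can reject a caller that tries to reach data belonging to another shard.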

The advantages of this design are:

Better security, as each service can verify whether the accessing user has access to its own sharded data.
Improved business performance, as the upper-level services are deployed in a striped manner following the data shards.
Enhanced availability, as the upper-level services are deployed in a striped manner following the data shards.

The first point is usually easy to understand, but the second and third may not be. Why would performance and availability improve? To answer that, let’s consider how the business is deployed, in what is commonly known as a multi-active architecture.

Multi-active Architecture

In the previous chapter on high availability, we mentioned that a highly available architecture must be deployed across multiple data centers, using techniques such as lossless semi-synchronous replication or the newer MySQL Group Replication. The database instances are deployed across three regions, so that when one data center goes down, traffic can quickly be switched to another. Let us review the three-region architecture design:

The diagram shows a three-region high availability architecture design achieved through lossless semi-synchronous replication, enabling city-to-city cross-data center switching. However, this is only a single-instance MySQL database architecture. What if we consider a distributed architecture? Are all shards located in a single data center?

If the primary of every shard is in a single data center (say data center 1), the databases in data center 2 and data center 3 are only replicas: they can serve reads but not writes. This is what we call a single-active architecture.

Unlike the single-active architecture, a multi-active architecture refers to systems in different geographical locations that can provide both read and write services. The term “active” refers to the ability to provide real-time read and write services, not just read services. The main purpose of a multi-active architecture is to improve system resilience, enhance system availability, and ensure continuous availability of the business.

To implement a multi-active architecture, we first need to transform the database into a distributed one, then place the primary servers of different data shards in different data centers, and finally deploy the business services in stripes, as shown in the following diagram:

As you can see, the order services and order data shards discussed in the previous section are deployed in different data centers. Order service 1 is deployed in data center 1 and has read and write access to shard 1. Order service 2 is deployed in data center 2 and has read and write access to shard 2. Order service 3 is deployed in data center 3 and has read and write access to shard 3.

This way, each data center can handle write traffic, and each data center is “active”. This is what we call a multi-active architecture.
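The difference between the two layouts can be expressed as a simple placement check. This is a sketch under stated assumptions: the dictionaries, the service names, and the helpers `writable_dcs` and `is_multi_active` are invented for illustration, and a data center counts as “active” here when it hosts at least one shard primary.

```python
ALL_DCS = {1, 2, 3}

# Multi-active: each shard primary, and the service that owns it,
# lives in a different data center.
MULTI_ACTIVE = {
    1: {"service": "order-service-1", "primary_dc": 1},
    2: {"service": "order-service-2", "primary_dc": 2},
    3: {"service": "order-service-3", "primary_dc": 3},
}

# Single-active: every shard primary sits in data center 1.
SINGLE_ACTIVE = {
    shard: {"service": f"order-service-{shard}", "primary_dc": 1}
    for shard in (1, 2, 3)
}


def writable_dcs(placement: dict) -> set:
    """Data centers that host at least one shard primary (accept writes)."""
    return {loc["primary_dc"] for loc in placement.values()}


def is_multi_active(placement: dict, all_dcs: set = ALL_DCS) -> bool:
    """True when every data center can serve write traffic."""
    return writable_dcs(placement) == set(all_dcs)


print(is_multi_active(MULTI_ACTIVE), is_multi_active(SINGLE_ACTIVE))
```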

If one data center goes offline, such as data center 1, the system will switch to another data center. The upper-level services and databases switch together. After the switch, the upper-level services and databases are still located in the same data center, eliminating the need for cross-data center access, and providing the best performance and availability guarantee.
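The “switch together” rule can be modelled in a few lines. This is a deliberately simplified sketch: `fail_over` and `co_located` are hypothetical helpers, the placement is a plain dictionary, and a real switch would also involve promoting a replica and redirecting traffic, which is not shown.

```python
def fail_over(placement: dict, failed_dc: int, target_dc: int) -> dict:
    """Return a new placement with everything in failed_dc moved to target_dc.

    The service and the shard primary are moved in the same step, so they
    never end up in different data centers.
    """
    result = {}
    for shard, loc in placement.items():
        moved = dict(loc)  # copy so the original placement is not mutated
        if moved["service_dc"] == failed_dc:
            moved["service_dc"] = target_dc
        if moved["primary_dc"] == failed_dc:
            moved["primary_dc"] = target_dc
        result[shard] = moved
    return result


def co_located(placement: dict) -> bool:
    """True if every service runs in the same data center as its primary."""
    return all(
        loc["service_dc"] == loc["primary_dc"] for loc in placement.values()
    )


before = {
    1: {"service_dc": 1, "primary_dc": 1},
    2: {"service_dc": 2, "primary_dc": 2},
    3: {"service_dc": 3, "primary_dc": 3},
}
after = fail_over(before, failed_dc=1, target_dc=2)
print(after[1], co_located(after))
```

Because both halves of a stripe move in one step, no request has to cross data centers after the switch.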