06 How to Implement Database and Table Sharding as Well as Enforce Routing Above

06 How to Implement Database and Table Sharding as well as Enforce Routing Above #

After the introduction in the previous sessions, I believe you have a preliminary understanding of ShardingSphere. Starting today, I will guide you through case studies to gradually master ShardingSphere’s core functionalities. The first one we will cover is the Database Sharding and Table Sharding mechanism.

Single Database and Table System #

Let’s start with the single database and table system. Throughout this course, unless specifically mentioned, we will assume the usage of Spring Boot integration with the ShardingSphere framework, and database access based on Mybatis.

Import Development Frameworks #

The first step in system development is to import the necessary development frameworks. In the code snippet below, we create a Spring Boot project and add references to two starters: sharding-jdbc-spring-boot-starter and mybatis-spring-boot-starter, in the pom.xml file:

<dependency>
   <groupId>org.apache.shardingsphere</groupId>
   <artifactId>sharding-jdbc-spring-boot-starter</artifactId>
</dependency> 
<dependency>
    <groupId>org.mybatis.spring.boot</groupId>
    <artifactId>mybatis-spring-boot-starter</artifactId>
</dependency>

This is all the initialization work required for the development environment. Now let’s introduce the business scenario for the case study.

Clarify the Business Scenario #

Let’s consider a common business scenario in the field of healthcare. In such scenarios, each user has a health record (HealthRecord) that stores the health level (HealthLevel) representing the user’s current health condition, as well as a series of health tasks (HealthTask). Usually, doctors create different health tasks based on the user’s current health record, and users can earn certain health points by completing the tasks specified by the doctors. These points determine the user’s health level, which ultimately affects the entire health record. The more health tasks the user completes, the higher the health level, and the more complete the health record becomes. Consequently, the user needs to do fewer health tasks, forming a positive business cycle. Here, we are not focusing on elaborating the entire business cycle, but rather on the storage and access methods of several core business objects in this scenario.

In this scenario, we focus on four business objects: User, HealthRecord, HealthLevel, and HealthTask. The diagram below presents the basic field definitions for each business object, as well as the associations between these four objects:

Drawing 0.png

Implement Basic Features #

Since we are using Mybatis as the ORM framework, we need to follow the Mybatis development process. Firstly, we need to define the various business entities:

Drawing 2.png Definition of business entities

Based on these business entities, we need to write the corresponding Mapper files. I put these Mapper files in the resources directory of the code project:

Drawing 3.png Definition of Mybatis Mapper files

The next step is to configure the data source information, which is stored in a separate application-traditional.properties configuration file:

spring.datasource.driverClassName = com.mysql.jdbc.Driver
spring.datasource.url = jdbc:mysql://localhost:3306/ds
spring.datasource.username = root
spring.datasource.password = root

According to the Spring Boot’s configuration convention, we set the above configuration file as the active profile in the application.properties configuration file. By using different profiles, we can switch between different configuration systems:

spring.profiles.active=traditional

The next task is to create Repository layer components:

Drawing 4.png Definition of Repository layer interfaces

Finally, we design and implement three related service classes: UserService, HealthLevelService, and HealthRecordService.

Drawing 5.png Definition of Service layer interfaces and implementation classes

Using the UserService, we can insert a batch of user data to initialize user information. Then we have the HealthLevelService, which is dedicated to initializing health level information. Please note that, unlike other business objects, the health level information is a typical dictionary information in the system, and we assume there are 5 health levels in the system.

The third, and most important service, is the HealthRecordService. We use it to store and access HealthRecord and HealthTask data. Here is an example of the implementation of HealthRecordService, which shows how to insert health records and related health tasks:

@Service
public class HealthRecordServiceImpl implements HealthRecordService {
    @Autowired
    private HealthRecordRepository healthRecordRepository;
    @Autowired
    private HealthTaskRepository healthTaskRepository;
    
    @Override
    public void processHealthRecords() throws SQLException{
       insertHealthRecords();
    }

    private List<Integer> insertHealthRecords() throws SQLException {
        List<Integer> result = new ArrayList<>(10);
        for (int i = 1; i <= 10; i++) {
         HealthRecord healthRecord = insertHealthRecord(i);
            insertHealthTask(i, healthRecord);
            result.add(healthRecord.getRecordId());
        }
        return result;
    }

    private HealthRecord insertHealthRecord(final int i) throws SQLException {
     HealthRecord healthRecord = new HealthRecord();
     healthRecord.setUserId(i);
     healthRecord.setLevelId(i % 5);
     healthRecord.setRemark("Remark" + i);
        healthRecordRepository.addEntity(healthRecord);
        return healthRecord;
    }

    private void insertHealthTask(final int i, final HealthRecord healthRecord) throws SQLException {
     HealthTask healthTask = new HealthTask();
     healthTask.setRecordId(healthRecord.getRecordId());
     healthTask.setUserId(i);
     healthTask.setTaskName("TaskName" + i);
     healthTaskRepository.addEntity(healthTask);
    }
}

Now, from scratch, we have implemented the DAO and Service components required for a complete business scenario. These components are very simple in terms of business logic and are developed using the classic Mybatis development process. Finally, we can verify if these components work properly by running a set of simple unit tests. The code snippet below, using the UserServiceTest class as an example, shows the implementation of the test, involving the usage of common unit test annotations like @RunWith and @SpringBootTest:

@RunWith(SpringRunner.class)
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.MOCK)
public class UserServiceTest {
    @Autowired
    private UserService userService;

    @Test
    public void testProcessUsers() throws Exception {
       userService.processUsers();
    }
}

To run this unit test, we can see that the test passed and we can also see the inserted data in the User table in the database. At this point, a single-database single-table system has been built. Next, we will modify this system to support multi-database and multi-table.

To transform the traditional single-database single-table data architecture into a multi-database and multi-table architecture, developers only need to do one thing, which is to configure the specific scenarios based on the ShardingSphere configuration system introduced in the previous lesson. No changes need to be made to the existing business code, which is the power of ShardingSphere. Let’s get started.

System Modification: How to Implement Sharding? #

As the first step of system modification, let’s first see how to achieve data sharding based on the configuration system.

Initialize Data Sources #

For sharding, we design two databases named ds0 and ds1. Obviously, for two data sources, we need to initialize two DataSource objects, and these two DataSource objects will be combined into a Map and passed to the ShardingDataSourceFactory factory class:

spring.shardingsphere.datasource.names=ds0,ds1
spring.shardingsphere.datasource.ds0.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.ds0.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds0.url=jdbc:mysql://localhost:3306/ds0
spring.shardingsphere.datasource.ds0.username=root
spring.shardingsphere.datasource.ds0.password=root
spring.shardingsphere.datasource.ds1.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.ds1.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds1.url=jdbc:mysql://localhost:3306/ds1
spring.shardingsphere.datasource.ds1.username=root
spring.shardingsphere.datasource.ds1.password=root

Set Sharding Strategy #

After clarifying the data sources, we need to set the sharding strategy for sharding:

spring.shardingsphere.sharding.default-database-strategy.inline.sharding-column=user_id
spring.shardingsphere.sharding.default-database-strategy.inline.algorithm-expression=ds$->{user_id % 2}

We know that in ShardingSphere, there is a set of ShardingStrategyConfiguration. In this case, we use the InlineShardingStrategyConfiguration based on the row expression. InlineShardingStrategyConfiguration has two parameters that need to be set: shardingColumn, which specifies the sharding column name, and algorithmExpression, which specifies the sharding algorithm row expression. In our configuration, the data will be stored in the database based on the modulus value of the user_id column by 2. At the same time, note that the “default-database-strategy” setting is used here. Combining the content from the previous lesson, setting this configuration item is equivalent to specifying the default sharding strategy for the sharding database in the ShardingRuleConfiguration.

Set Binding Tables and Broadcast Tables #

Next, we need to set the binding tables. A binding table is a new concept introduced in ShardingSphere, let me explain.

A binding table refers to a group of main tables and sub-tables that have the same sharding rules. For example, in our business scenario, both the health_record table and the health_task table have a record_id field. If we shard based on this record_id field during application, these two tables can form a binding table relationship.

The reason for introducing the binding table concept is that multi-table join queries with binding table relationships won’t result in Cartesian product, thus improving the efficiency of join queries. For example, if we execute the following SQL:

SELECT record.remark_name FROM health_record record JOIN health_task task ON record.record_id=task.record_id WHERE record.record_id in (1, 2);

If we don’t explicitly configure the binding table relationship, assuming the sharding key record_id routes value 1 to the first shard and value 2 to the second shard, the routed SQL statements would be 4, resulting in a Cartesian product:

SELECT record.remark_name FROM health_record0 record JOIN health_task0 task ON record.record_id=task.record_id WHERE record.record_id in (1, 2);
SELECT record.remark_name FROM health_record0 record JOIN health_task1 task ON record.record_id=task.record_id WHERE record.record_id in (1, 2);
SELECT record.remark_name FROM health_record1 record JOIN health_task0 task ON record.record_id=task.record_id WHERE record.record_id in (1, 2);
SELECT record.remark_name FROM health_record1 record JOIN health_task1 task ON record.record_id=task.record_id WHERE record.record_id in (1, 2);

Then, after configuring the binding table relationship, the routed SQL statements will be reduced to 2:

SELECT record.remark_name FROM health_record0 record JOIN health_task0 task ON record.record_id=task.record_id WHERE record.record_id in (1, 2);
SELECT record.remark_name FROM health_record1 record JOIN health_task1 task ON record.record_id=task.record_id WHERE record.record_id in (1, 2);

Please note that in order to achieve this effect, the sharding keys of the binding tables must be exactly the same. In the above SQL statements, it is easy to see that the sharding key that needs to be exactly the same is record_id.

Let’s go back to the scenario in the case. Obviously, health_record and health_task should have a binding table relationship. So, we can add configuration for this relationship in the configuration file:

spring.shardingsphere.sharding.binding-tables=health_record, health_task

After introducing binding tables, let’s take a look at the concept of broadcast tables. A broadcast table refers to a table that exists in all sharding data sources, which means that the table structure and data in each database are exactly the same. The use case for broadcast tables is quite clear. They are usually used in applications where the data volume is small and need to be associated with massive data tables. A typical example is a dictionary table that should exist in each shard database.

Similarly, in our scenario, the health_level table can be considered as such a dictionary table since it stores limited health level information. So, we also added the definition of the broadcast table in the configuration file. In the following code, you can see it:

spring.shardingsphere.sharding.broadcast-tables=health_level

Setting Table Sharding Rules #

With the previous configuration options, we have completed the configuration for database sharding in the ShardingRuleConfiguration based on the requirements. TableRuleConfiguration is a configuration for table sharding rules, including actualDataNodes for setting the real data nodes, databaseShardingStrategyConfig for setting the database sharding strategy, and keyGeneratorConfig for setting the self-incrementing column generator in a distributed environment. We have already set the default databaseShardingStrategyConfig in the ShardingRuleConfiguration, now we need to complete the settings for actualDataNodes and keyGeneratorConfig.

For the health_record table, since there are two data sources, the actual-data-nodes it belongs to can be represented by row expression ds$->{0..1}.health_record, which represents that the table health_record exists in both ds0 and ds1. As for the keyGeneratorConfig, it is recommended to use the snowflake algorithm. With these information clarified, the configuration for the TableRuleConfiguration of the health_record table is straightforward:

spring.shardingsphere.sharding.tables.health_record.actual-data-nodes=ds$->{0..1}.health_record
spring.shardingsphere.sharding.tables.health_record.key-generator.column=record_id
spring.shardingsphere.sharding.tables.health_record.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.health_record.key-generator.props.worker.id=33

Similarly, the configuration for the health_task table is exactly the same, except that you need to adjust the specific data column for key-generator.column based on the actual situation:

spring.shardingsphere.sharding.tables.health_task.actual-data-nodes=ds$->{0..1}.health_task
spring.shardingsphere.sharding.tables.health_task.key-generator.column=task_id
spring.shardingsphere.sharding.tables.health_task.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.health_task.key-generator.props.worker.id=33

Let’s rerun the HealthRecordTest unit test and check the data in the database. The following image shows the health_record and health_task tables in ds0:

Drawing 6.png Data in the health_record table in ds0

Drawing 7.png Data in the health_task table in ds0

And this image shows the health_record and health_task tables in ds1:

Drawing 8.png Data in the health_record table in ds1

Drawing 9.png Data in the health_task table in ds1

Obviously, the data in these two tables has been correctly sharded.

Summary #

Starting from this lesson, we officially enter the explanation of the core features of ShardingSphere. In order to introduce these feature, we will start with a single database and single table architecture, based on a typical business scenario to organize the data operation requirements, and give the framework of the entire code project, as well as the implementation process of verifying the data operation results based on test cases. Today’s content focuses on how to implement database sharding, and we have achieved database sharding by introducing the powerful configuration system in ShardingSphere.

Here’s a question for you to ponder: How do you understand the meaning and function of binding tables and broadcast tables?

Database sharding is one of the core features of the sharding engine in ShardingSphere, and can also be said to be one of the simplest features. In the next lesson, we will continue to introduce shard mechanisms such as table sharding, database sharding + table sharding, and forced routing.