18 Routing Engine How to Implement Fragment Routing and Broadcast Routing for Data Access

18 Routing Engine How to Implement Fragment Routing and Broadcast Routing for Data Access #

In the previous lesson, we saw that the ShardingRouter, which plays a key role, calls the RoutingEngine to get the routing result. In ShardingSphere, there are multiple types of RoutingEngine for different application scenarios.

We can classify these routing methods into two categories: sharding routing and broadcast routing based on whether they carry sharding key information. And within these two categories, there are several common types of RoutingEngine implementations, as shown in the diagram below:

image

We don’t intend to provide a detailed explanation of all of these RoutingEngines, but in the following content, we will discuss representative RoutingEngines in sharding routing and broadcast routing separately.

Sharding Routing #

For sharding routing, we will focus on Standard Routing, which is the recommended sharding strategy in ShardingSphere.

In the usage scenario, we need to first consider the scope of application for standard routing. Standard routing is suitable for two major scenarios: one is for SQLs that do not contain join queries, and the other is for SQLs that only contain binding table join queries. The first scenario is easier to understand, while for the latter, we need to introduce the concept of binding tables in ShardingSphere.

We have discussed binding tables in “06 | Data Sharding: How to implement database sharding, table sharding, database + table sharding, and forced routing (Part 1)?”. With a clear understanding of these concepts, let’s take a look at the specific implementation process of standard routing.

1. Creation process of StandardRoutingEngine #

After clarifying the basic meaning of standard routing, let’s review the RoutingEngineFactory class introduced in the previous lesson. The RoutingEngineFactory class creates the corresponding RoutingEngine based on the routing information in the context. However, in its newInstance method, we did not find the code directly creating StandardRoutingEngine. In fact, the creation of StandardRoutingEngine is in the last code branch in the newInstance method, which means it enters the last code branch of getShardingRoutingEngine when none of the previous checks are satisfied, as shown below:

private static RoutingEngine getShardingRoutingEngine(final ShardingRule shardingRule, final SQLStatementContext sqlStatementContext,
                                                      final ShardingConditions shardingConditions, final Collection<String> tableNames) { 
    // Get sharding tables based on sharding rules 
    Collection<String> shardingTableNames = shardingRule.getShardingLogicTableNames(tableNames); 
    // If there is only one target table or all the tables are binding tables, create a StandardRoutingEngine 
    if (1 == shardingTableNames.size() || shardingRule.isAllBindingTables(shardingTableNames)) { 
        return new StandardRoutingEngine(shardingRule, shardingTableNames.iterator().next(), sqlStatementContext, shardingConditions); 
    } 
    // Otherwise, create a ComplexRoutingEngine 
    return new ComplexRoutingEngine(shardingRule, tableNames, sqlStatementContext, shardingConditions); 
}

This code first obtains the sharding tables based on the parsed logical tables, using the following SQL statement as an example:

SELECT record.remark_name FROM health_record record JOIN health_task task ON record.record_id=task.record_id WHERE record.record_id = 1

Then, the shardingTableNames should be health_record and health_task. If the sharding operation only involves one table or involves multiple tables that are binding tables to each other, the StandardRoutingEngine will be used for routing.

Based on the concept of binding tables, when multiple tables are binding tables to each other, the routing result for each table is the same, so we only need to calculate the sharding for the first table. On the contrary, if this condition is not satisfied, a ComplexRoutingEngine will be created for routing.

Let’s take a look at how the isAllBindingTables method in the code determines whether multiple tables are binding tables, which is located in the ShardingRule class, as shown below:

public boolean isAllBindingTables(final Collection<String> logicTableNames) { 
    if (logicTableNames.isEmpty()) { 
        return false; 
    } 
    // Create a special BindingTableRule based on the logicTableNames passed in 
    Optional<BindingTableRule> bindingTableRule = findBindingTableRule(logicTableNames); 
    if (!bindingTableRule.isPresent()) { 
        return false; 
    } 
    Collection<String> result = new TreeSet<>(String.CASE_INSENSITIVE_ORDER); 
    // Get logic tables from BindingTableRule 
    result.addAll(bindingTableRule.get().getAllLogicTables()); 
    // Check if the retrieved logic tables are consistent with the passed in logicTableNames 
    return !result.isEmpty() && result.containsAll(logicTableNames); 
}

This code creates a special BindingTableRule based on the logicTableNames passed in, and checks if the LogicTable obtained from the BindingTableRule is the same as the passed-in logicTableNames. The process of creating a BindingTableRule is actually to obtain the corresponding BindingTableRule from the Collection <BindingTableRule> saved in ShardingRule based on the logicTableName passed in, as shown below:

public Optional<BindingTableRule> findBindingTableRule(final String logicTableName) { 
        for (BindingTableRule each : bindingTableRules) { 
            if (each.hasLogicTable(logicTableName)) { 
                return Optional.of(each); 
            } 
        } 
        return Optional.absent(); 
}

The bindingTableRules in the above code are the Collection BindingTableRule saved in ShardingRule itself. We found the code to initialize bindingTableRules in the constructor of ShardingRule, as shown below:

bindingTableRules = createBindingTableRules(shardingRuleConfig.getBindingTableGroups());

Obviously, this construction process is related to the rule configuration mechanism. If we use a YAML configuration file, the binding table configuration will generally be in the following format: shardingRule: bindingTables: health_record,health_task

For this configuration, the ShardingRule will parse it and generate a BindingTableRule object as shown below:

private BindingTableRule createBindingTableRule(final String bindingTableGroup) { 
    List<TableRule> tableRules = new LinkedList<>(); 
    for (String each : Splitter.on(",").trimResults().splitToList(bindingTableGroup)) { 
        tableRules.add(getTableRule(each)); 
    } 
    return new BindingTableRule(tableRules); 
}

So far, we have finally introduced the concept and implementation of binding tables, which means we have completed the introduction of the code branch that enters the StandardRoutingEngine in RoutingEngineFactory.

2. Operation Mechanism of StandardRoutingEngine #

Now that we have created the StandardRoutingEngine, let’s take a look at its operation mechanism. As a specific routing engine implementation, StandardRoutingEngine implements the RoutingEngine interface, and its route method is as follows:

@Override 
public RoutingResult route() { 
    ... 
    return generateRoutingResult(getDataNodes(shardingRule.getTableRule(logicTableName))); 
}

The core method here is generateRoutingResult. Before that, we need to use the getDataNodes method to obtain the data node information. The method is as follows:

private Collection<DataNode> getDataNodes(final TableRule tableRule) { 
    // Routing based on Hint 
    if (isRoutingByHint(tableRule)) { 
        return routeByHint(tableRule); 
    } 
    // Routing based on sharding conditions 
    if (isRoutingByShardingConditions(tableRule)) { 
        return routeByShardingConditions(tableRule); 
    } 
    // Execute mixed routing 
    return routeByMixedConditions(tableRule); 
}

We can see that the method takes a TableRule object as its parameter, and TableRule is part of the sharding rule ShardingRule. As we learned in the previous lesson, this object mainly contains various rule information related to sharding, including ShardingStrategy. From the names, ShardingStrategy is a type of sharding strategy used to specify the specific column for sharding, and execute sharding to return the target DataSource and Table.

We will expand on this topic in the next lesson. Here, let’s first organize the class structure related to ShardingStrategy, as shown below:

image

In the StandardRoutingEngine, the overall structure is similar to the one shown above. In the StandardRoutingEngine, the getDataNodes method introduced earlier has three different code branches, but they all essentially obtain a collection of RouteValue. In the previous lesson, we learned that RouteValue stores the table name and column name used for routing. After obtaining the required RouteValue, in the StandardRoutingEngine, all three scenarios mentioned above will ultimately call the route0 basic method for routing. The purpose of this method is to derive the collection of target DataNodes based on these RouteValues. Similarly, we also know that the DataNode stores the specific target node, including the dataSourceName and tableName. The route0 method is as follows:

private Collection<DataNode> route0(final TableRule tableRule, final List<RouteValue> databaseShardingValues, final List<RouteValue> tableShardingValues) { 
    //Route DataSource 
    Collection<String> routedDataSources = routeDataSources(tableRule, databaseShardingValues); 
    Collection<DataNode> result = new LinkedList<>(); 
    //Route Table and assemble the DataNode collection 
    for (String each : routedDataSources) { 
        result.addAll(routeTables(tableRule, each, tableShardingValues)); 
    } 
    return result; 
}

As we can see, this method first routes the DataSource, and then routes the Table based on each DataSource, finally completing the assembly of the DataNode collection. In the routeDataSources and routeTables methods mentioned above, each will ultimately rely on the DatabaseShardingStrategy and TableShardingStrategy respectively to complete the routing calculation behind the scenes to obtain the target DataSource and Table.

After obtaining the DataNode collection, we return to the generateRoutingResult method of the StandardRoutingEngine. This method is used to assemble the routing result and return a RoutingResult:

private RoutingResult generateRoutingResult(final Collection<DataNode> routedDataNodes) { 
    RoutingResult result = new RoutingResult(); 
    for (DataNode each : routedDataNodes) { 
        // Build a RoutingUnit object for each DataNode 
        RoutingUnit routingUnit = new RoutingUnit(each.getDataSourceName()); 
        ...
    } 
    ...
}

As we can see, this method creates a RoutingResult and iterates through the collection of DataNodes to build a RoutingUnit object for each DataNode. // Fill the TableUnit in RoutingUnit routingUnit.getTableUnits().add(new TableUnit(logicTableName, each.getTableName())); result.getRoutingUnits().add(routingUnit); } return result; }

The purpose of this code is to construct a RoutingUnit object for each DataNode and then fill the TableUnit in RoutingUnit. We have already introduced the data structure of RoutingUnit and TableUnit in the previous lesson, so we won’t go into detail here.

With this, the introduction to the standard routing engine, StandardRoutingEngine, comes to an end. Standard routing is the recommended way of sharding in ShardingSphere and is also the most widely used in daily development.

Broadcast Routing #

For SQL statements that do not carry sharding keys, the routing engine will use broadcast routing. In ShardingSphere, there are many routing engines for broadcasting based on the type of the input SQL. Let’s take a look at the method for creating a RoutingEngine in the RoutingEngineFactory.

First, if the input is a TCLStatement, which includes database control language statements such as authorization and role control, the DatabaseBroadcastRoutingEngine will be executed directly. Similarly, if it is a DDLStatement used for data definition, the routing method in TableBroadcastRoutingEngine will be executed. The conditions for judging are as follows:

// Database broadcast routing
if (sqlStatement instanceof TCLStatement) {
    return new DatabaseBroadcastRoutingEngine(shardingRule);
}
// Table broadcast routing
if (sqlStatement instanceof DDLStatement) {
    return new TableBroadcastRoutingEngine(shardingRule, metaData.getTables(), sqlStatementContext);
}

The routing method in DatabaseBroadcastRoutingEngine is straightforward. It simply constructs a RoutingUnit based on each DataSourceName, and then assembles them into a RoutingResult, as shown below:

public final class DatabaseBroadcastRoutingEngine implements RoutingEngine {

    private final ShardingRule shardingRule;

    @Override
    public RoutingResult route() {
        RoutingResult result = new RoutingResult();
        for (String each : shardingRule.getShardingDataSourceNames().getDataSourceNames()) {
            // Construct a RoutingUnit based on each DataSourceName
            result.getRoutingUnits().add(new RoutingUnit(each));
        }
        return result;
    }
}

Similarly, you can imagine the implementation process of TableBroadcastRoutingEngine. We obtain the corresponding TableRule based on the logicTableName, and then construct a RoutingUnit object based on the actual DataNode in TableRule. This process is shown below:

private Collection<RoutingUnit> getAllRoutingUnits(final String logicTableName) {
    Collection<RoutingUnit> result = new LinkedList<>();
    // Get the corresponding TableRule based on logicTableName
    TableRule tableRule = shardingRule.getTableRule(logicTableName);
    for (DataNode each : tableRule.getActualDataNodes()) {
        // Construct a RoutingUnit object based on the actual DataNode in TableRule
        RoutingUnit routingUnit = new RoutingUnit(each.getDataSourceName());
        // Construct a TableUnit based on the TableName of DataNode
        routingUnit.getTableUnits().add(new TableUnit(logicTableName, each.getTableName()));
        result.add(routingUnit);
    }
    return result;
}

Next, let’s look at the scenario for DALStatement, which is relatively more complex. Depending on the type of the input DALStatement, there are several different branches for processing, as shown below:

private static RoutingEngine getDALRoutingEngine(final ShardingRule shardingRule, final SQLStatement sqlStatement, final Collection<String> tableNames) {
    // If it is a Use statement, do nothing
    if (sqlStatement instanceof UseStatement) {
        return new IgnoreRoutingEngine();
    }
    // If it is a Set or ResetParameter statement, do database broadcast
    if (sqlStatement instanceof SetStatement || sqlStatement instanceof ResetParameterStatement || sqlStatement instanceof ShowDatabasesStatement) {
        return new DatabaseBroadcastRoutingEngine(shardingRule);
    }
    // If there is a default database, do default database routing
    if (!tableNames.isEmpty() && !shardingRule.tableRuleExists(tableNames) && shardingRule.hasDefaultDataSourceName()) {
        return new DefaultDatabaseRoutingEngine(shardingRule, tableNames);
    }
    // If the table list is not empty, do unicast routing
    if (!tableNames.isEmpty()) {
        return new UnicastRoutingEngine(shardingRule, tableNames);
    }
    //
    return new DataSourceGroupBroadcastRoutingEngine(shardingRule);
}

Let’s take a look at several routing engines here. First, let’s start with the simplest IgnoreRoutingEngine, which only returns an empty RoutingResult object and does nothing else, as shown below:

public final class IgnoreRoutingEngine implements RoutingEngine { 

    @Override 
    public RoutingResult route() { 
        return new RoutingResult(); 
    } 
}

Essentially, UnicastRoutingEngine represents unicast routing, which is used in scenarios where information from a specific real table needs to be retrieved. It only needs to fetch data from any table in any data source. For example, the DESCRIBE statement is suitable for using UnicastRoutingEngine because the data structure of the description in each real table is the same.

The implementation process of UnicastRoutingEngine is shown below. Since the method is quite long, we have trimmed the code and used comments to indicate the execution logic of each branch directly:

@Override 
public RoutingResult route() { 
    RoutingResult result = new RoutingResult(); 
    if (shardingRule.isAllBroadcastTables(logicTables)) { 
        // If all tables are broadcast tables, assemble TableUnit for each logicTable, and then build RoutingUnit 
    } else if (logicTables.isEmpty()) { 
        // If the tables are null, directly assemble RoutingUnit without building TableUnit 
    } else if (1 == logicTables.size()) { 
        // If there is only one table, assemble RoutingUnit and TableUnit for the single table 
    } else { 
        // If there are multiple entity tables, first get DataSource, and then assemble RoutingUnit and TableUnit 
    } 
    return result; 
}

DefaultDatabaseRoutingEngine, as the name suggests, routes the default database. So how is the default database determined? We can find the answer from the getDefaultDataSourceName method in the ShardingDataSourceNames class of ShardingRule.

Usually, this default database can be set through configuration. Once we understand this, it is not difficult to understand the routing process of DefaultDatabaseRoutingEngine. Its route method is shown below:

@Override 
public RoutingResult route() { 
    RoutingResult result = new RoutingResult(); 
    List<TableUnit> routingTables = new ArrayList<>(logicTables.size()); 
    for (String each : logicTables) { 
        routingTables.add(new TableUnit(each, each)); 
    } 
    // Get the database name configured as the default from ShardingRule 
    RoutingUnit routingUnit = new RoutingUnit(shardingRule.getShardingDataSourceNames().getDefaultDataSourceName()); 
    routingUnit.getTableUnits().addAll(routingTables); 
    result.getRoutingUnits().add(routingUnit); 
    return result; 
}

Finally, let’s take a look at the handling process of the data control language DCLStatement. In a master-slave environment, for DCLStatement, sometimes we only want the SQL statement to be executed on the master database. This is why we have the MasterInstanceBroadcastRoutingEngine as shown below:

@Override 
public RoutingResult route() { 
    RoutingResult result = new RoutingResult(); 
    for (String each : shardingRule.getShardingDataSourceNames().getDataSourceNames()) { 
        if (dataSourceMetas.getAllInstanceDataSourceNames().contains(each)) { 
            // Get master-slave database information through MasterSlaveRule 
            Optional<MasterSlaveRule> masterSlaveRule = shardingRule.findMasterSlaveRule(each); 
            if (!masterSlaveRule.isPresent() || masterSlaveRule.get().getMasterDataSourceName().equals(each)) { 
                result.getRoutingUnits().add(new RoutingUnit(each)); 
            } 
        } 
    } 
    return result; 
}

As we can see, a MasterSlaveRule rule is introduced here, which provides the getMasterDataSourceName method to obtain the master DataSourceName. This allows us to execute the statement specifically on this master database, such as Grant and other data control languages.

From Source Code Analysis to Daily Development #

In ShardingSphere, it is necessary to emphasize its design and practice in managing configuration information. Based on the ShardingRule and TableRule configuration classes, ShardingSphere isolates a large amount of complex configuration information from the business logic flow, and these configuration information often needs to be flexibly set with multiple default values. Based on the two-level configuration system of ShardingRule and TableRule, the system can better integrate business logic changes and configuration information changes, which is worth trying and applying in our daily development process.

Conclusion and Preview #

Today we focused on the implementation process of various routing engines in ShardingSphere. ShardingSphere implements multiple routing engines, which can be divided into sharding routing and broadcast routing. We have discussed the representative implementation schemes of these two types of routing engines.

Here’s a thinking question for you: How does ShardingSphere determine if two tables are bound tables? Feel free to discuss with others in the comments, and I will comment on the answers one by one.

From today’s content, we have also seen that the implementation of the routing mechanism in the routing engine depends on the integration of sharding strategies and their underlying sharding algorithms. The next lesson will focus on various sharding strategies in ShardingSphere in detail.