17 Routing Engine How to Understand the Operation Mechanism of the Sharding Routing Core Class ShardingRouter #

In the previous lessons, we introduced the SQL parsing engine in ShardingSphere, and we understand that the role of SQL parsing is to generate an SQLStatement object based on the input SQL statement.

Starting today, we turn to the source code of the routing engine in ShardingSphere. In terms of workflow, the routing engine is the second step of the sharding engine's execution process: it takes the SQLStatement produced by the SQL parsing engine, uses the context information carried through parsing and execution to find the sharding strategy that matches the databases and tables, and generates the routing result.

Layering: Overall architecture of the routing engine #

Just like when we introduced the SQL parsing engine, we first examine the package structure by referring to the ShardingSphere source code. The diagram below summarizes the core classes related to the routing mechanism:

Drawing 0.png

The diagram shows a symmetrical structure: the classes split into two branches depending on whether a PreparedStatement or an ordinary Statement is being handled.

At the same time, we can divide the classes in this diagram into two levels by package: sharding-core-route at the lower level and sharding-core-entry at the upper level. Organizing packages by the level at which a class sits is a common packaging principle in ShardingSphere, which we introduced in [“12 | From Application to Principle: How to Read ShardingSphere Source Code Efficiently?”]. Next, we look at how this principle applies to the routing engine.

1. sharding-core-route project #

Let’s start with the ShardingRouter class shown in the diagram, which is the entry point of the entire routing process. The ShardingRouter class directly depends on the SQLParseEngine class in the parsing engine to complete SQL parsing and obtain the SQLStatement object, which is then used by the PreparedStatementRoutingEngine and StatementRoutingEngine. Note that these classes are all in the sharding-core-route project, which is a lower-level component.

2. sharding-core-entry project #

On the other hand, the PreparedQueryShardingEngine and SimpleQueryShardingEngine in the diagram are located in the sharding-core-entry project. From the naming of the package, “entry” is equivalent to the entry point of access. Therefore, we can deduce that the classes provided in this project belong to the components oriented towards the application layer, and are at a higher level. The users of PreparedQueryShardingEngine and SimpleQueryShardingEngine are ShardingPreparedStatement and ShardingStatement, respectively. Going further up, we have classes like ShardingConnection and ShardingDataSource, which directly face the application layer.

Core class: ShardingRouter #

Through the above analysis, we have a preliminary understanding of the overall structure of the routing engine. For an execution process with a layered structure, there are two ways to read the code: top-down or bottom-up. Today we start from the bottom and work through the process layer by layer. Let’s begin with the ShardingRouter class, the lowest-level object in the routing engine. Its fields are defined as follows:

private final ShardingRule shardingRule;
private final ShardingSphereMetaData metaData;
private final SQLParseEngine parseEngine;

In the ShardingRouter class, we first see the familiar SQL parsing engine SQLParseEngine and its usage method:

public SQLStatement parse(final String logicSQL, final boolean useCache) {
    return parseEngine.parse(logicSQL, useCache);
}

The code simply delegates to SQLParseEngine, which parses the incoming SQL and returns an SQLStatement object. The SQL is named logicSQL to distinguish it from the actual SQL that is executed in sharding and read-write separation scenarios.
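To make the role of the useCache flag more concrete, here is a minimal sketch of a cache-backed parse method. The class CachingParserSketch, its ParsedStatement type, and the map-based cache are all hypothetical and only illustrate the idea; they are not how SQLParseEngine is actually implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CachingParserSketch {

    /** Hypothetical stand-in for SQLStatement: it only remembers the SQL it came from. */
    static final class ParsedStatement {
        final String sql;

        ParsedStatement(String sql) {
            this.sql = sql;
        }
    }

    private final Map<String, ParsedStatement> cache = new ConcurrentHashMap<>();
    private int parseCount;

    /** Parses the logic SQL, reusing an earlier result when useCache is true. */
    public ParsedStatement parse(String logicSQL, boolean useCache) {
        if (useCache) {
            ParsedStatement cached = cache.get(logicSQL);
            if (cached != null) {
                return cached;
            }
        }
        ParsedStatement result = doParse(logicSQL);
        if (useCache) {
            cache.put(logicSQL, result);
        }
        return result;
    }

    /** Placeholder for the real parsing work; counts how often it runs. */
    private ParsedStatement doParse(String logicSQL) {
        parseCount++;
        return new ParsedStatement(logicSQL);
    }

    public int getParseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        CachingParserSketch parser = new CachingParserSketch();
        parser.parse("SELECT * FROM t_order WHERE order_id = ?", true);
        parser.parse("SELECT * FROM t_order WHERE order_id = ?", true);
        System.out.println("parse count: " + parser.getParseCount());
    }
}
```

Caching pays off mainly for parameterized SQL, whose text stays the same across executions while only the parameter values change.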

Next, let’s look at ShardingRule, the basic class that holds the sharding rule information. ShardingRule lives in the sharding-core-common project; it stores the various rules related to sharding and also covers the creation of distributed primary key generators such as ShardingKeyGenerator. Its fields and their corresponding comments are as follows:

// Sharding rule configuration class, encapsulating various configuration items
private final ShardingRuleConfiguration ruleConfiguration;
// Data source name list
private final ShardingDataSourceNames shardingDataSourceNames;
// Rule list for tables
private final Collection<TableRule> tableRules;
// Rule list for binding tables
private final Collection<BindingTableRule> bindingTableRules;
// List of broadcast table names
private final Collection<String> broadcastTables;
// Default database sharding strategy
private final ShardingStrategy defaultDatabaseShardingStrategy;
// Default table sharding strategy
private final ShardingStrategy defaultTableShardingStrategy;
// Default sharding key generator
private final ShardingKeyGenerator defaultShardingKeyGenerator; 
// Rule list for read-write separation
private final Collection<MasterSlaveRule> masterSlaveRules; 
// Encryption rule
private final EncryptRule encryptRule;

The content of ShardingRule is very extensive, but its main purpose is to provide rule information, rather than being part of the core process. Therefore, we will not go into detail about it for now. As the basic rule class, ShardingRule will be used throughout the sharding process. We will introduce it in more detail in the next part, but for now, it is sufficient to have a basic understanding of the variable names and their meanings.

Let’s go back to the ShardingRouter class and notice that it only has one core method, which is the route method. The logic of this method is quite complex. Let’s organize its execution steps as shown in the following diagram:

image

ShardingRouter is the core class of the routing engine. In the upcoming content, we will explain each of the 6 steps in the above diagram in detail, helping you understand the design philosophy and implementation mechanism of a routing engine.

1. Sharding Reasonableness Verification #

Firstly, let’s look at the first step of ShardingRouter, which is to verify the reasonableness of sharding information. The verification is performed as follows:

    // Validate the Statement using ShardingStatementValidator
    Optional<ShardingStatementValidator> shardingStatementValidator = ShardingStatementValidatorFactory.newInstance(sqlStatement); 
    if (shardingStatementValidator.isPresent()) { 
         shardingStatementValidator.get().validate(shardingRule, sqlStatement, parameters); 
    }

This code uses ShardingStatementValidator to validate the input SQLStatement. Here, we can see the use of the typical Factory pattern. The Factory class ShardingStatementValidatorFactory is as follows:

    public final class ShardingStatementValidatorFactory { 

        public static Optional<ShardingStatementValidator> newInstance(final SQLStatement sqlStatement) { 
            if (sqlStatement instanceof InsertStatement) { 
                return Optional.<ShardingStatementValidator>of(new ShardingInsertStatementValidator()); 
            } 
            if (sqlStatement instanceof UpdateStatement) { 
                return Optional.<ShardingStatementValidator>of(new ShardingUpdateStatementValidator()); 
            } 
            return Optional.absent(); 
        } 
    }
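The factory above relies on Guava's Optional (hence Optional.absent()). As a rough sketch of the same idea with java.util.Optional, consider the following. The Statement, Insert, Update, Select and Validator types are hypothetical stand-ins introduced only for illustration.

```java
import java.util.Optional;

public class ValidatorFactorySketch {

    // Hypothetical stand-ins for SQLStatement and its subtypes.
    interface Statement { }

    static final class Insert implements Statement { }

    static final class Update implements Statement { }

    static final class Select implements Statement { }

    /** Hypothetical validator abstraction; name() just identifies the validator. */
    interface Validator {
        String name();
    }

    /** Returns a validator only for the statement types that need one. */
    static Optional<Validator> newInstance(Statement statement) {
        if (statement instanceof Insert) {
            Validator validator = () -> "insert-validator";
            return Optional.of(validator);
        }
        if (statement instanceof Update) {
            Validator validator = () -> "update-validator";
            return Optional.of(validator);
        }
        // Any other statement type is simply not validated.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(newInstance(new Insert()).map(Validator::name).orElse("none"));
        System.out.println(newInstance(new Select()).map(Validator::name).orElse("none"));
    }
}
```

Returning an Optional lets the caller skip validation without null checks, which is the same shape as the isPresent() test in the route method.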

Note that ShardingStatementValidatorFactory only creates validators for InsertStatement and UpdateStatement; other statements are not validated. So how is the validation performed? Let’s take a look at the definition of ShardingStatementValidator as shown below:

    public interface ShardingStatementValidator<T extends SQLStatement> { 

        // Validate if the sharding operation is supported
        void validate(ShardingRule shardingRule, T sqlStatement, List<Object> parameters); 
    }

The core idea of validation is to check whether a Segment in the SQLStatement conflicts with the rules in ShardingRule and therefore needs special handling. Let’s take ShardingInsertStatementValidator as an example to see the validation process. Its validate method is as follows:

    public final class ShardingInsertStatementValidator implements ShardingStatementValidator<InsertStatement> { 

        @Override 
        public void validate(final ShardingRule shardingRule, final InsertStatement sqlStatement, final List<Object> parameters) { 
            Optional<OnDuplicateKeyColumnsSegment> onDuplicateKeyColumnsSegment = sqlStatement.findSQLSegment(OnDuplicateKeyColumnsSegment.class);
            // If it is an "ON DUPLICATE KEY UPDATE" statement and it updates a sharding column, the validation fails
            if (onDuplicateKeyColumnsSegment.isPresent() && isUpdateShardingKey(shardingRule, onDuplicateKeyColumnsSegment.get(), sqlStatement.getTable().getTableName())) { 
                throw new ShardingException("INSERT INTO .... ON DUPLICATE KEY UPDATE can not support update for sharding column."); 
            }
        }
    }

The remaining steps of the route method in ShardingRouter (steps 2 to 5) follow:

// Get the SQLStatementContext
SQLStatementContext sqlStatementContext = SQLStatementContextFactory.newInstance(metaData.getRelationMetas(), logicSQL, parameters, sqlStatement);

// Create a GeneratedKey if the SQLStatement is an InsertStatement
Optional<GeneratedKey> generatedKey = sqlStatement instanceof InsertStatement
        ? GeneratedKey.getGenerateKey(shardingRule, metaData.getTables(), parameters, (InsertStatement) sqlStatement) : Optional.<GeneratedKey>absent();

// Create ShardingConditions
ShardingConditions shardingConditions = getShardingConditions(parameters, sqlStatementContext, generatedKey.orNull(), metaData.getRelationMetas());
boolean needMergeShardingValues = isNeedMergeShardingValues(sqlStatementContext);
if (sqlStatementContext.getSqlStatement() instanceof DMLStatement && needMergeShardingValues) {
    checkSubqueryShardingValues(sqlStatementContext, shardingConditions);
    mergeShardingConditions(shardingConditions);
}

// Execute routing
RoutingEngine routingEngine = RoutingEngineFactory.newInstance(shardingRule, metaData, sqlStatementContext, shardingConditions);
RoutingResult routingResult = routingEngine.route();

In the validator code, the logic concerns MySQL's “ON DUPLICATE KEY UPDATE” syntax. With this syntax, an INSERT that would produce a duplicate primary key or unique key updates the existing row instead (it is a non-standard syntax and should not be used frequently).

First, ShardingInsertStatementValidator checks whether an OnDuplicateKeyColumnsSegment is present and whether it updates a sharding key. If both conditions are met, an exception is thrown, prohibiting the use of the “INSERT INTO …. ON DUPLICATE KEY UPDATE” syntax on the sharding column.
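The check just described can be sketched in isolation as follows. This is a simplified illustration, not the real validator: the class name, the use of plain column-name strings, and the exception type are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateKeyCheckSketch {

    /**
     * Rejects an INSERT ... ON DUPLICATE KEY UPDATE whose update list
     * touches a sharding column (column names are plain strings here).
     */
    static void validate(Set<String> shardingColumns, List<String> updatedColumns) {
        for (String column : updatedColumns) {
            if (shardingColumns.contains(column)) {
                throw new IllegalStateException(
                        "ON DUPLICATE KEY UPDATE can not update sharding column: " + column);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> shardingColumns = new HashSet<>(Arrays.asList("user_id", "order_id"));
        // Updating a non-sharding column is accepted.
        validate(shardingColumns, Arrays.asList("status"));
        try {
            // Updating a sharding column is rejected.
            validate(shardingColumns, Arrays.asList("status", "user_id"));
        } catch (IllegalStateException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

A plausible reason for the restriction is that changing the value of a sharding column could mean the row no longer belongs to the shard it is stored in.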

Next, the SQLStatementContext is obtained to provide the runtime context for subsequent processing. The SQLStatementContextFactory is used to create the SQLStatementContext object based on the specific SQLStatement input. There are three types of SQLStatementContext: SelectSQLStatementContext, InsertSQLStatementContext, and CommonSQLStatementContext. They all implement the SQLStatementContext interface and store context information related to the specific SQLStatement, providing a means for data storage and transfer for subsequent processing.

After that, a generated key is created if the SQLStatement is an InsertStatement. Generating a distributed key is not straightforward in a data sharding scenario, so there are many design ideas and implementation techniques behind this code. For more information on this topic, you can refer to the article [“14 | What are the distributed key implementation methods in ShardingSphere?”], where we shared in-depth insights into the distributed key generation mechanism in ShardingSphere.

Next, the code creates sharding conditions. The getShardingConditions method is called to create the ShardingConditions based on the SQL type: depending on whether the statement is an INSERT or not, either the InsertClauseShardingConditionEngine or the WhereClauseShardingConditionEngine is used. The purpose of the sharding conditions is to extract the relationship between the target databases, tables, and columns used for routing. After obtaining these ShardingConditions, there is an optimization step called mergeShardingConditions, where the ShardingConditions that can be merged are combined.
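As a very rough illustration of what "extracting" sharding conditions means, the sketch below keeps only those equality predicates of a WHERE clause that refer to a sharding column. The class, the map-based representation of predicates, and the column names are hypothetical; the real condition engines work on parsed segments and handle far more cases.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ShardingConditionSketch {

    /** Keeps only the equality predicates whose column is a sharding column. */
    static Map<String, Object> extract(Map<String, Object> equalityPredicates, Set<String> shardingColumns) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : equalityPredicates.entrySet()) {
            if (shardingColumns.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Represents: WHERE order_id = 1001 AND status = 'PAID'
        Map<String, Object> predicates = new LinkedHashMap<>();
        predicates.put("order_id", 1001L);
        predicates.put("status", "PAID");
        System.out.println(extract(predicates, Collections.singleton("order_id")));
    }
}
```

Only the values that survive this filtering are relevant to routing; the other predicates are left for the target database to evaluate.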

Finally, the routing is executed. The RoutingEngine is obtained and the route method is called to perform the routing. These two lines of code are the core of the ShardingRouter class. We obtain an instance of RoutingEngine and execute the routing based on that instance, returning a RoutingResult object. The RoutingEngine interface is defined as follows, with only one simple route method:

public interface RoutingEngine {
    // execute the routing
    RoutingResult route();
}

In ShardingSphere, there is a set of implementation classes for RoutingEngine. The RoutingEngineFactory class is responsible for generating these specific RoutingEngines, with the following generation logic:

public static RoutingEngine newInstance(final ShardingRule shardingRule,
                                         final ShardingSphereMetaData metaData, 
                                         final SQLStatementContext sqlStatementContext, 
                                         final ShardingConditions shardingConditions) { 
    SQLStatement sqlStatement = sqlStatementContext.getSqlStatement(); 
    Collection<String> tableNames = sqlStatementContext.getTablesContext().getTableNames(); 

    // database broadcast routing
    if (sqlStatement instanceof TCLStatement) { 
        return new DatabaseBroadcastRoutingEngine(shardingRule); 
    } 
    // table broadcast routing
    if (sqlStatement instanceof DDLStatement) { 
        return new TableBroadcastRoutingEngine(shardingRule, metaData.getTables(), sqlStatementContext); 
    } 
    // block routing
    if (sqlStatement instanceof DALStatement) { 
        return getDALRoutingEngine(shardingRule, sqlStatement, tableNames); 
    } 
    // instance broadcast routing
    if (sqlStatement instanceof DCLStatement) { 
        return getDCLRoutingEngine(shardingRule, sqlStatementContext, metaData); 
    } 
    // default database routing
    if (shardingRule.isAllInDefaultDataSource(tableNames)) { 
        return new DefaultDatabaseRoutingEngine(shardingRule, tableNames); 
    } 
    // database broadcast routing
    if (shardingRule.isAllBroadcastTables(tableNames)) { 
        return sqlStatement instanceof SelectStatement ? new UnicastRoutingEngine(shardingRule, tableNames) : new DatabaseBroadcastRoutingEngine(shardingRule); 
    } 
    // default database routing
    if (sqlStatementContext.getSqlStatement() instanceof DMLStatement && tableNames.isEmpty() && shardingRule.hasDefaultDataSourceName()) { 
        return new DefaultDatabaseRoutingEngine(shardingRule, tableNames); 
    } 
    // unicast routing
    if (sqlStatementContext.getSqlStatement() instanceof DMLStatement && shardingConditions.isAlwaysFalse() || tableNames.isEmpty() || !shardingRule.tableRuleExists(tableNames)) { 
        return new UnicastRoutingEngine(shardingRule, tableNames); 
    } 
    // sharding routing
    return getShardingRoutingEngine(shardingRule, sqlStatementContext, shardingConditions, tableNames); 
}
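The factory is essentially an ordered chain of checks in which the first matching branch wins. The sketch below reduces that chain to enums and boolean flags so the ordering is easier to see. Everything in it (the enum names, the flags, and the reduced set of branches) is a simplification assumed for illustration and does not reproduce every condition of the real method.

```java
public class RoutingDispatchSketch {

    enum StatementType { TCL, DDL, DAL, DCL, DML }

    enum RoutingKind { DATABASE_BROADCAST, TABLE_BROADCAST, DAL, DCL, DEFAULT_DATABASE, UNICAST, SHARDING }

    /** Picks a routing kind; checks run top to bottom and the first match is returned. */
    static RoutingKind select(StatementType type, boolean allInDefaultDataSource,
                              boolean allBroadcastTables, boolean isSelect, boolean hasShardedTable) {
        // Statement-type checks come first.
        switch (type) {
            case TCL:
                return RoutingKind.DATABASE_BROADCAST;
            case DDL:
                return RoutingKind.TABLE_BROADCAST;
            case DAL:
                return RoutingKind.DAL;
            case DCL:
                return RoutingKind.DCL;
            default:
                break;
        }
        // Table-based checks follow for the remaining statements.
        if (allInDefaultDataSource) {
            return RoutingKind.DEFAULT_DATABASE;
        }
        if (allBroadcastTables) {
            // Reading a broadcast table needs one node; writing must reach all of them.
            return isSelect ? RoutingKind.UNICAST : RoutingKind.DATABASE_BROADCAST;
        }
        if (!hasShardedTable) {
            return RoutingKind.UNICAST;
        }
        return RoutingKind.SHARDING;
    }

    public static void main(String[] args) {
        System.out.println(select(StatementType.TCL, false, false, false, false));
        System.out.println(select(StatementType.DML, false, true, true, false));
        System.out.println(select(StatementType.DML, false, false, true, true));
    }
}
```

Because the checks are ordered, a statement reaches the sharding branch only after every broadcast, default-database, and unicast case has been ruled out.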

We will provide a detailed introduction to these specific RoutingEngine implementations in the next lesson, “Lesson 18 | Routing Engine: How does ShardingSphere implement data access sharding routing and broadcast routing?”. Here, it is sufficient to understand that in terms of package structure design, ShardingSphere categorizes the specific RoutingEngines into six categories: broadcast routing, complex routing, default database routing, ignore routing, standard routing, and unicast routing, as shown below:

Drawing 3.png

Different types of RoutingEngine implementations

The execution result of RoutingEngine is RoutingResult, which contains a collection of RoutingUnits. The variables defined in RoutingUnit are as follows, which show that there are two variables related to DataSource names and a list of TableUnits:

// actual DataSource name
private final String dataSourceName;
// logical DataSource name
private final String masterSlaveLogicDataSourceName;
// list of table units
private final List<TableUnit> tableUnits = new LinkedList<>();

TableUnit holds the logical table name and actual table name, as shown below:

public final class TableUnit { 
    // logical table name
    private final String logicTableName; 
    // actual table name
    private final String actualTableName; 
}

Therefore, the RoutingResult actually stores a set of mappings between databases and tables, with logical and actual values for both.
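To see what such a mapping looks like for a concrete value, here is a small sketch that builds one routing unit for a logical table. The modulo-based rule (data source by value % 2, table by (value / 2) % 2) and the ds_ and _N naming are assumptions chosen for the example; the classes are simplified stand-ins for RoutingUnit and TableUnit.

```java
import java.util.ArrayList;
import java.util.List;

public class RoutingResultSketch {

    /** Simplified stand-in for TableUnit: logical table name plus actual table name. */
    static final class TableUnit {
        final String logicTableName;
        final String actualTableName;

        TableUnit(String logicTableName, String actualTableName) {
            this.logicTableName = logicTableName;
            this.actualTableName = actualTableName;
        }
    }

    /** Simplified stand-in for RoutingUnit: one data source and the tables routed to it. */
    static final class RoutingUnit {
        final String dataSourceName;
        final List<TableUnit> tableUnits = new ArrayList<>();

        RoutingUnit(String dataSourceName) {
            this.dataSourceName = dataSourceName;
        }
    }

    /** Hypothetical rule: data source by value % 2, table by (value / 2) % 2. */
    static RoutingUnit route(String logicTableName, long shardingValue) {
        RoutingUnit unit = new RoutingUnit("ds_" + (shardingValue % 2));
        String actualTableName = logicTableName + "_" + ((shardingValue / 2) % 2);
        unit.tableUnits.add(new TableUnit(logicTableName, actualTableName));
        return unit;
    }

    public static void main(String[] args) {
        RoutingUnit unit = route("t_order", 5L);
        TableUnit table = unit.tableUnits.get(0);
        System.out.println(unit.dataSourceName + "." + table.actualTableName
                + " (logic table: " + table.logicTableName + ")");
    }
}
```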

6. Constructing the routing result #

After routing is processed through a series of routing engines, we obtain a RoutingResult object. However, instead of directly returning it, we construct an SQLRouteResult object. This is the last step in the route method of ShardingRouter, as shown below:

// construct SQLRouteResult
SQLRouteResult result = new SQLRouteResult(sqlStatementContext, shardingConditions, generatedKey.orNull());
result.setRoutingResult(routingResult);
// if it is an Insert statement, set the generated key values
if (sqlStatementContext instanceof InsertSQLStatementContext) {
    setGeneratedValues(result);
}
return result;

Let’s take a look at the definition of SQLRouteResult to see how it differs from RoutingResult. The variables in SQLRouteResult are as follows:

// SQLStatement context
private final SQLStatementContext sqlStatementContext;
// sharding conditions
private final ShardingConditions shardingConditions;
// generated key
private final GeneratedKey generatedKey;
// a set of routing units
private final Collection<RouteUnit> routeUnits = new LinkedHashSet<>();
// RoutingResult generated by RoutingEngine
private RoutingResult routingResult;

As we can see, SQLRouteResult contains a RoutingResult. We can consider SQLRouteResult as the routing result returned by the entire SQL routing process, which will be used by higher-level objects such as PreparedStatementRoutingEngine in subsequent steps. RoutingResult, on the other hand, is only the routing result returned by the RoutingEngine, and its consumer is ShardingRouter, which is located at the lower level.

At the same time, we notice a new Unit object called RouteUnit, which contains the data source name and SQL unit object SQLUnit, as shown below:

public final class RouteUnit { 
    // data source name
    private final String dataSourceName; 
    // SQL unit
    private final SQLUnit sqlUnit;
}

At this point, we have finished explaining the ShardingRouter class. Within the ShardingSphere routing engine, ShardingRouter is the core class that links the lower and upper levels: looking down, we can dig into the concrete implementations of the various RoutingEngines; looking up, we can extend to application-oriented scenarios such as read-write separation.

Moving up a level, the sharding engines trigger routing through the executeRoute method of their base class, BaseShardingEngine. The routing it starts eventually reaches ShardingRouter, which parses the SQL statement with shardingRouter.parse and then routes it with shardingRouter.route(sql, clonedParameters, sqlStatement). The method itself is as follows:

    private SQLRouteResult executeRoute(final String sql, final List<Object> clonedParameters) {
        routingHook.start(sql);
        try {
            // Call the template method to execute routing and obtain the result
            SQLRouteResult result = route(sql, clonedParameters);
            routingHook.finishSuccess(result, metaData.getTables());
            return result;
        } catch (final Exception ex) {
            routingHook.finishFailure(ex);
            throw ex;
        }
    }

This method has a similar code structure to the parse method in SQLParseEngine, and also uses the Hook mechanism.

From a design-pattern perspective, BaseShardingEngine is a typical application of the template method pattern. When a process consists of a series of steps whose overall order stays fixed, but individual steps need different implementations, the template method pattern is a good fit. It is also simple to implement, because it relies only on class inheritance. As the template class, BaseShardingEngine declares two abstract methods for subclasses to implement:

// Clone parameters
protected abstract List<Object> cloneParameters(List<Object> parameters);
// Execute routing
protected abstract SQLRouteResult route(String sql, List<Object> parameters);
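The pattern can be reduced to the following self-contained sketch. The class names, the String return type, and the events list that stands in for the routing hook are hypothetical, and the parameter-copying behavior of PreparedEngine is an assumption made for contrast rather than something shown in this lesson.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TemplateMethodSketch {

    /** Hypothetical base engine: fixes the overall flow and leaves two steps to subclasses. */
    abstract static class BaseEngine {

        // Stands in for the routing hook: records which callbacks fired.
        final List<String> events = new ArrayList<>();

        /** The template method: the order of steps is the same for every subclass. */
        final String shard(String sql, List<Object> parameters) {
            List<Object> cloned = cloneParameters(parameters);
            events.add("start");
            try {
                String result = route(sql, cloned);
                events.add("finishSuccess");
                return result;
            } catch (RuntimeException ex) {
                events.add("finishFailure");
                throw ex;
            }
        }

        protected abstract List<Object> cloneParameters(List<Object> parameters);

        protected abstract String route(String sql, List<Object> parameters);
    }

    /** Plain statements carry no parameters, so an empty list is used. */
    static final class SimpleEngine extends BaseEngine {
        @Override
        protected List<Object> cloneParameters(List<Object> parameters) {
            return Collections.emptyList();
        }

        @Override
        protected String route(String sql, List<Object> parameters) {
            return "routed(" + sql + ") with " + parameters.size() + " parameters";
        }
    }

    /** Assumed behavior for prepared statements: work on a copy of the parameters. */
    static final class PreparedEngine extends BaseEngine {
        @Override
        protected List<Object> cloneParameters(List<Object> parameters) {
            return new ArrayList<>(parameters);
        }

        @Override
        protected String route(String sql, List<Object> parameters) {
            return "routed(" + sql + ") with " + parameters.size() + " parameters";
        }
    }

    public static void main(String[] args) {
        List<Object> parameters = Arrays.<Object>asList(1, 2);
        System.out.println(new SimpleEngine().shard("SELECT 1", parameters));
        System.out.println(new PreparedEngine().shard("SELECT ?, ?", parameters));
    }
}
```

Because shard is final, subclasses can vary only the two abstract steps, while the hook calls around the routing stay identical for both engines.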

SimpleQueryShardingEngine needs no parameters, so cloneParameters simply returns an empty list, and route delegates directly to the StatementRoutingEngine mentioned earlier. The complete implementation of the SimpleQueryShardingEngine class is as follows:

public final class SimpleQueryShardingEngine extends BaseShardingEngine {

    private final StatementRoutingEngine routingEngine;

    public SimpleQueryShardingEngine(final ShardingRule shardingRule, final ShardingProperties shardingProperties, final ShardingSphereMetaData metaData, final SQLParseEngine sqlParseEngine) {
        super(shardingRule, shardingProperties, metaData);
        routingEngine = new StatementRoutingEngine(shardingRule, metaData, sqlParseEngine);
    }

    @Override
    protected List<Object> cloneParameters(final List<Object> parameters) {
        return Collections.emptyList();
    }

    @Override
    protected SQLRouteResult route(final String sql, final List<Object> parameters) {
        return routingEngine.route(sql);
    }
}

So far, most of the content about the ShardingSphere routing engine has been covered. For the upper-level structure, we have expanded the SimpleQueryShardingEngine as an example, and the handling of the PreparedQueryShardingEngine is similar. As a summary, we can use the following sequence diagram to understand the main process of these routers.

Drawing 8.png

From Source Code Analysis to Daily Development #

Package division principles help in designing and planning the code structure of open-source frameworks. In today’s content, we saw a typical layering and packaging strategy in ShardingSphere: the sharding-core-route project holds the lower-level core class ShardingRouter, while the sharding-core-entry project holds the upper-level PreparedQueryShardingEngine and SimpleQueryShardingEngine classes. ShardingSphere applies this strategy in many other places, and we will see more of them as the course progresses.

Summary and Preview #

As the second core component of the ShardingSphere sharding engine, the routing engine aims to generate the SQLRouteResult target object. The core of the entire routing engine is the ShardingRouter class. In today’s lesson, we discussed the overall execution process of ShardingRouter in detail and also introduced RoutingEngine, the underlying object in the routing engine.

Here’s a thought question for you: What are the steps involved in a complete routing execution process in ShardingSphere? Feel free to discuss with everyone in the comments, and I will provide feedback on the answers.

In today’s lesson, we also mentioned that there are multiple RoutingEngine implementations in ShardingSphere. In the next lesson, we will focus on how each of these RoutingEngine implementations works.