15 Parsing Engine: What Core Stages Should the SQL Parsing Process Include? (Part 1) #

Hello and welcome to Lesson 15. After introducing the infrastructure and implementation mechanisms of the microkernel architecture in ShardingSphere, today we will officially start learning about the sharding engine.

For a database sharding middleware, sharding is its core functionality. The following diagram shows the structure of the entire ShardingSphere sharding engine. We have briefly introduced the components included in the sharding engine in Lesson 12, “From Application to Principle: How to Efficiently Read ShardingSphere Source Code.” We know that for a sharding engine, the first core component is the parsing engine.

Drawing 0.png

For most developers, SQL parsing is an unfamiliar topic, but for a database sharding middleware, it is a fundamental component. Mainstream database sharding middlewares each provide their own implementation of the parsing component. The result produced by the SQL parsing engine is used throughout the entire ShardingSphere framework, so if we do not understand the SQL parsing process well, we will run into obstacles when reading the ShardingSphere source code.

On the other hand, the SQL parsing process itself is complex. When you first open the ShardingSphere source code, a natural question is: what are the core stages of the SQL parsing process? This lesson works through that question in detail.

From DataSource to the Entry Point of the Parsing Engine #

In the overall introduction to the sharding engine, we can see that to complete sharding operations, we first need to introduce the parsing engine. It is somewhat difficult for students who are new to the ShardingSphere source code to find the entry point of the parsing engine. Here, I will use the code example provided in Lesson 04, “Application Integration: How to Use ShardingSphere in Business Systems,” to analyze the entry point of the parsing engine.

Let’s review the following code snippet, which provides an implementation of data sharding based on the Java language:

// Create a sharding rule configuration
ShardingRuleConfiguration shardingRuleConfig = new ShardingRuleConfiguration();

// Create a table rule configuration
TableRuleConfiguration tableRuleConfig = new TableRuleConfiguration("user", "ds${0..1}.user${0..1}");

// Create a distributed key generator configuration
Properties properties = new Properties();
properties.setProperty("worker.id", "33");
KeyGeneratorConfiguration keyGeneratorConfig = new KeyGeneratorConfiguration("SNOWFLAKE", "id", properties);
tableRuleConfig.setKeyGeneratorConfig(keyGeneratorConfig);
shardingRuleConfig.getTableRuleConfigs().add(tableRuleConfig);

// Shard the database based on sex, divided into 2 databases in total
shardingRuleConfig.setDefaultDatabaseShardingStrategyConfig(new InlineShardingStrategyConfiguration("sex", "ds${sex % 2}"));

// Shard the table based on user id, divided into 2 tables in total
// (an inline expression requires InlineShardingStrategyConfiguration; the standard strategy expects a sharding algorithm object)
shardingRuleConfig.setDefaultTableShardingStrategyConfig(new InlineShardingStrategyConfiguration("id", "user${id % 2}"));

// Create the concrete DataSource through the factory class
return ShardingDataSourceFactory.createDataSource(createDataSourceMap(), shardingRuleConfig, new Properties());

As you can see, the code above builds a sharding rule configuration, including the database sharding and table sharding strategies, and then obtains a DataSource through ShardingDataSourceFactory. For application development, DataSource is clearly the entry point for using the ShardingSphere framework. For the internal operation of ShardingSphere, DataSource also serves as the entry point into the sharding engine. Starting from the DataSource and tracing the calling chain of the code, we obtain the following class hierarchy diagram:

Drawing 2.png

The diagram above introduces many core objects in the ShardingSphere kernel, but today we only focus on the bottom-most object in the entire chain, which is the SQLParseEngine as shown in the diagram. On one hand, during the creation process of DataSource, the SQLParseEngine is ultimately initialized; on the other hand, the ShardingRouter responsible for executing the routing function also depends on the SQLParseEngine. This SQLParseEngine is the entry point in ShardingSphere for the entire SQL parsing process.

From the SQL Parsing Engine to the SQL Parsing Kernel #

In ShardingSphere, there is a group of engine classes that end with “Engine”. From the architectural perspective, these classes generally adopt the facade pattern in design and implementation. The intention of the facade pattern can be described as providing a consistent interface for a set of interfaces in a subsystem. The facade pattern defines a high-level interface that makes this subsystem easier to use. The schematic diagram of this pattern is as shown below:

Drawing 4.png

In terms of function, the facade pattern can serve as a separation between the client and the backend services. With changes in business requirements and the passage of time, the partitioning and implementation of subsystems behind the facade may need to be adjusted and upgraded, and this adjustment and upgrade need to be transparent to the client. In designing middleware frameworks such as ShardingSphere, this kind of isolation is particularly important.
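To make the facade idea concrete, here is a minimal sketch in Java. All class names in it (Tokenizer, TokenCounter, WordCountFacade) are hypothetical and chosen only for illustration; they are not ShardingSphere classes. The client calls a single method and never sees how the subsystem classes are wired together.

```java
// Hypothetical subsystem class: splits text on whitespace.
class Tokenizer {
    String[] tokenize(String text) {
        return text.trim().split("\\s+");
    }
}

// Hypothetical subsystem class: counts the tokens it is given.
class TokenCounter {
    int count(String[] tokens) {
        return tokens.length;
    }
}

// The facade: one high-level method hides the subsystem classes and their order of use.
class WordCountFacade {
    private final Tokenizer tokenizer = new Tokenizer();
    private final TokenCounter counter = new TokenCounter();

    int countWords(String text) {
        return counter.count(tokenizer.tokenize(text));
    }
}

class FacadeDemo {
    public static void main(String[] args) {
        // The client depends only on the facade.
        System.out.println(new WordCountFacade().countWords("SELECT * FROM user"));
    }
}
```

If Tokenizer or TokenCounter were later replaced or split up, only WordCountFacade would need to change, which is the kind of isolation described above.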

For the SQL parsing engine, the situation is similar. The difference is that the SQLParseEngine itself does not provide a facade effect, but delegates this part of functionality to another core class, SQLParseKernel. From the naming perspective, this class is the kernel class of SQL parsing, and it is also the so-called facade class. SQLParseKernel shields the creation and management process of complex SQL abstract syntax tree object SQLAST, SQL fragment object SQLSegment, and the final SQL statement object SQLStatement in the backend services. The relationships between these classes are as follows:

Drawing 6.png

1. SQLParseEngine #

From the previous class hierarchy diagram, we can see that AbstractRuntimeContext is the construction entry point of SQLParseEngine. As the name implies, RuntimeContext plays the role of a runtime context in ShardingSphere, which holds the sharding rules, sharding properties, database type, execution engine, and SQL parsing engine related to the runtime environment. As the implementation class of the RuntimeContext interface, AbstractRuntimeContext completes the construction of SQLParseEngine in its constructor, as shown in the following construction process:

protected AbstractRuntimeContext(final T rule, final Properties props, final DatabaseType databaseType) { 
       … 
       parseEngine = SQLParseEngineFactory.getSQLParseEngine(DatabaseTypes.getTrunkDatabaseTypeName(databaseType)); 
       … 
}

Clearly, the creation process of SQLParseEngine is accomplished through the factory class SQLParseEngineFactory. The implementation of the factory class SQLParseEngineFactory is as follows:

public final class SQLParseEngineFactory {

    private static final Map<String, SQLParseEngine> ENGINES = new ConcurrentHashMap<>();

    public static SQLParseEngine getSQLParseEngine(final String databaseTypeName) {
        if (ENGINES.containsKey(databaseTypeName)) {
            return ENGINES.get(databaseTypeName);
        }
        synchronized (ENGINES) {
            if (ENGINES.containsKey(databaseTypeName)) {
                return ENGINES.get(databaseTypeName);
            }
            SQLParseEngine result = new SQLParseEngine(databaseTypeName);
            ENGINES.put(databaseTypeName, result);
            return result;
        }
    }
}

From the above code, it can be seen that a layer of memory-based caching is implemented based on the ConcurrentHashMap object. The implementation of SQLParseEngineFactory is representative in ShardingSphere. To improve access performance, ShardingSphere extensively uses this approach to build content-based caching mechanisms.
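As a point of comparison, the same per-key lazy cache can be sketched with ConcurrentHashMap.computeIfAbsent instead of explicit double-checked locking. This is an alternative formulation for illustration, not the code ShardingSphere uses; DemoEngine and DemoEngineFactory are hypothetical stand-ins.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for SQLParseEngine; it only records its database type.
class DemoEngine {
    final String databaseTypeName;

    DemoEngine(String databaseTypeName) {
        this.databaseTypeName = databaseTypeName;
    }
}

class DemoEngineFactory {
    private static final Map<String, DemoEngine> ENGINES = new ConcurrentHashMap<>();

    // computeIfAbsent runs the constructor at most once per key and does so atomically,
    // so every caller asking for the same database type receives the same instance.
    static DemoEngine getEngine(String databaseTypeName) {
        return ENGINES.computeIfAbsent(databaseTypeName, DemoEngine::new);
    }
}

class EngineCacheDemo {
    public static void main(String[] args) {
        DemoEngine first = DemoEngineFactory.getEngine("MySQL");
        DemoEngine second = DemoEngineFactory.getEngine("MySQL");
        System.out.println(first == second);
    }
}
```

Both forms give one engine per database type; the explicit synchronized block in the original keeps the creation step visible, while computeIfAbsent leaves the locking to the map.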

Next, let’s take a look at the SQLParseEngine class itself. The complete code of this class is as follows:

public final class SQLParseEngine {

    private final String databaseTypeName;

    private final SQLParseResultCache cache = new SQLParseResultCache();

    public SQLStatement parse(final String sql, final boolean useCache) {
        ParsingHook parsingHook = new SPIParsingHook();
        parsingHook.start(sql);
        try {
            SQLStatement result = parse0(sql, useCache);
            parsingHook.finishSuccess(result);
            return result;
        } catch (final Exception ex) {
            parsingHook.finishFailure(ex);
            throw ex;
        }
    }

    private SQLStatement parse0(final String sql, final boolean useCache) {
        if (useCache) {
            Optional<SQLStatement> cachedSQLStatement = cache.getSQLStatement(sql);
            if (cachedSQLStatement.isPresent()) {
                return cachedSQLStatement.get();
            }
        }
        SQLStatement result = new SQLParseKernel(ParseRuleRegistry.getInstance(), databaseTypeName, sql).parse();
        if (useCache) {
            cache.put(sql, result);
        }
        return result;
    }
}

There are several points worth noting about SQLParseEngine:

First, ParsingHook is used here to manage hooks at runtime, that is, callbacks invoked at fixed points in the parsing flow. ShardingSphere provides a series of ParsingHook implementations, and we will look at the Hook mechanism in more detail when we discuss ShardingSphere's distributed tracing later.

Second, the parse method used to parse SQL returns an SQLStatement object. This SQLStatement is the final output of the entire SQL parsing engine. A SQLParseResultCache, built on the Cache class in the Google Guava framework, caches the parsed SQLStatement objects.

Finally, we find that SQLParseEngine delegates the actual parsing work to SQLParseKernel. Let’s take a look at this SQLParseKernel class next.
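The hook calls wrapped around parse0 follow a common start / finishSuccess / finishFailure shape. The sketch below reproduces only that shape with hypothetical types (DemoHook, RecordingHook, HookedParser) and plain strings in place of SQLStatement; it is an illustration of the pattern, not ShardingSphere's ParsingHook API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical hook interface with the same three callbacks used around parsing.
interface DemoHook {
    void start(String sql);

    void finishSuccess(String result);

    void finishFailure(Exception cause);
}

// A hook that simply records which callbacks were invoked, in order.
class RecordingHook implements DemoHook {
    final List<String> events = new ArrayList<>();

    public void start(String sql) {
        events.add("start");
    }

    public void finishSuccess(String result) {
        events.add("success");
    }

    public void finishFailure(Exception cause) {
        events.add("failure");
    }
}

class HookedParser {
    // Notifies the hook before parsing and after either outcome, then rethrows on failure.
    static String parse(String sql, DemoHook hook, Function<String, String> parser) {
        hook.start(sql);
        try {
            String result = parser.apply(sql);
            hook.finishSuccess(result);
            return result;
        } catch (RuntimeException ex) {
            hook.finishFailure(ex);
            throw ex;
        }
    }
}

class HookDemo {
    public static void main(String[] args) {
        RecordingHook hook = new RecordingHook();
        HookedParser.parse("  SELECT 1  ", hook, String::trim);
        System.out.println(hook.events);
    }
}
```

Because the parsing logic never references a concrete hook, tracing or metrics can be attached without touching the parser itself.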

2. SQLParseKernel #

In the SQLParseKernel class, we find the following three engine class definitions: the SQL parser engine SQLParserEngine (note the difference in name from SQLParseEngine), the SQLSegment extractor engine SQLSegmentsExtractorEngine, and the SQLStatement filler engine SQLStatementFillerEngine.

// SQL parser engine
private final SQLParserEngine parserEngine;
// SQLSegment extractor engine
private final SQLSegmentsExtractorEngine extractorEngine;
// SQLStatement filler engine
private final SQLStatementFillerEngine fillerEngine;

As the facade class, SQLParseKernel provides a parse method that completes the entire SQL parsing process using the three engine classes mentioned above, as shown below:

public SQLStatement parse() {
    // Use ANTLR4 to parse the abstract syntax tree of SQL
    SQLAST ast = parserEngine.parse();
    
    // Extract the tokens from the AST and encapsulate them into corresponding Segments such as TableSegment, IndexSegment, etc.
    Collection<SQLSegment> sqlSegments = extractorEngine.extract(ast);
    Map<ParserRuleContext, Integer> parameterMarkerIndexes = ast.getParameterMarkerIndexes();
    
    // Fill the SQLStatement and return it
    return fillerEngine.fill(sqlSegments, parameterMarkerIndexes.size(), ast.getSqlStatementRule());
}

Three Stages of SQL Parsing Engine: How to Generate SQLAST #

The code above is typical of a facade class: it combines the core classes of the internal subsystem through a few simple calls to complete the business flow. Based on the comments in the code, we can answer the question raised at the beginning of this lesson, “What are the core stages of the SQL parsing process?” as follows:

Generate SQL abstract syntax tree (AST) through SQLParserEngine
Extract SQLSegments through SQLSegmentsExtractorEngine
Fill SQLStatement through SQLStatementFillerEngine

These three stages are the core components of ShardingSphere’s new generation SQL parsing engine. The overall architecture is shown in the following figure:

Drawing 8.png

So far, we have completed the overall SQL parsing process composed of the three stages of “parsing, extracting, and filling”. Now we can parse a SQL statement into an SQLStatement object for use by subsequent routing engines such as ShardingRouter.
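The "parse, extract, fill" flow can be illustrated with a deliberately tiny toy. The sketch below is not how ShardingSphere parses SQL: it assumes that whitespace tokenization stands in for building an AST and that the token after FROM is the only segment of interest. ToyStatement and ToyParseKernel are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical final result object, standing in for SQLStatement.
class ToyStatement {
    final List<String> tables = new ArrayList<>();
}

class ToyParseKernel {
    // Stage 1, "parse": a real engine builds an AST; this toy just splits on whitespace.
    static List<String> parse(String sql) {
        return Arrays.asList(sql.trim().split("\\s+"));
    }

    // Stage 2, "extract": collect each token that directly follows FROM as a table segment.
    static List<String> extract(List<String> tokens) {
        List<String> segments = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if ("FROM".equalsIgnoreCase(tokens.get(i))) {
                segments.add(tokens.get(i + 1));
            }
        }
        return segments;
    }

    // Stage 3, "fill": copy the extracted segments into the statement object.
    static ToyStatement fill(List<String> segments) {
        ToyStatement result = new ToyStatement();
        result.tables.addAll(segments);
        return result;
    }

    // Facade-style entry point that chains the three stages.
    static ToyStatement parseStatement(String sql) {
        return fill(extract(parse(sql)));
    }
}

class ToyKernelDemo {
    public static void main(String[] args) {
        System.out.println(ToyParseKernel.parseStatement("SELECT id FROM user WHERE id = 1").tables);
    }
}
```

The point of the sketch is the shape of the pipeline: each stage consumes only the output of the previous one, so any stage can be replaced independently.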

In this lesson, we first focus on the first stage of the process, which is how to generate an SQLAST (the next two stages will be explained in subsequent lessons). The implementation of this part is located in the parse method of SQLParserEngine, as shown below:

public SQLAST parse() {
    SQLParser sqlParser = SQLParserFactory.newInstance(databaseTypeName, sql);

    // Use ANTLR4 to get the parse tree
    ParseTree parseTree;
    try {
        // First attempt: the faster SLL prediction mode, bailing out on the first syntax error
        ((Parser) sqlParser).setErrorHandler(new BailErrorStrategy());
        ((Parser) sqlParser).getInterpreter().setPredictionMode(PredictionMode.SLL);
        parseTree = sqlParser.execute().getChild(0);
    } catch (final ParseCancellationException ex) {
        // Fallback: reset the parser and retry with full LL prediction and the default error strategy
        ((Parser) sqlParser).reset();
        ((Parser) sqlParser).setErrorHandler(new DefaultErrorStrategy());
        ((Parser) sqlParser).getInterpreter().setPredictionMode(PredictionMode.LL);
        parseTree = sqlParser.execute().getChild(0);
    }
    if (parseTree instanceof ErrorNode) {
        throw new SQLParsingException(String.format("Unsupported SQL of `%s`", sql));
    }

    // Get the Statement rule from the configuration file
    SQLStatementRule rule = parseRuleRegistry.getSQLStatementRule(databaseTypeName, parseTree.getClass().getSimpleName());
    if (null == rule) {
        throw new SQLParsingException(String.format("Unsupported SQL of `%s`", sql));
    }

    // Wrap the Abstract Syntax Tree (AST)
    return new SQLAST((ParserRuleContext) parseTree, getParameterMarkerIndexes((ParserRuleContext) parseTree), rule);
}

In the above code, the SQLParser interface is responsible for parsing the SQL into an Abstract Syntax Tree (AST). The creation of the specific SQLParser implementation class is handled by SQLParserFactory. The SQLParserFactory defines the following:

public final class SQLParserFactory {

    public static SQLParser newInstance(final String databaseTypeName, final String sql) {
        // Load all extensions using SPI mechanism
        for (SQLParserEntry each : NewInstanceServiceLoader.newServiceInstances(SQLParserEntry.class)) {

            // Check database type
            if (each.getDatabaseTypeName().equals(databaseTypeName)) {
                return createSQLParser(sql, each);
            }
        }
        throw new UnsupportedOperationException(String.format("Cannot support database type '%s'", databaseTypeName));
    }
    ...
}

Here, another core interface is introduced, namely SQLParserEntry. As can be seen in the code structure, SQLParserEntry is located in the org.apache.shardingsphere.sql.parser.spi package of the shardingsphere-sql-parser-spi project, which further confirms that SQLParserEntry is an SPI interface.

The relationship between SQLParser and SQLParserEntry is that SQLParser is the entry point exposed by the parser, while SQLParserEntry is the underlying implementation of the parser. This follows the API-SPI relationship commonly found in ShardingSphere as a middleware framework.
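The lookup performed in newInstance, iterating over SPI implementations and picking the one whose database type matches, can be sketched with the JDK's own java.util.ServiceLoader. The types below (DemoParserEntry and its two implementations) are hypothetical, and the sketch assumes the plain JDK loader rather than ShardingSphere's NewInstanceServiceLoader. Without a provider registration file under META-INF/services, ServiceLoader.load finds nothing, so the demo passes an explicit list to the selection method.

```java
import java.util.Arrays;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical SPI interface, loosely modeled on the role SQLParserEntry plays.
interface DemoParserEntry {
    String getDatabaseTypeName();
}

class DemoMySQLEntry implements DemoParserEntry {
    public String getDatabaseTypeName() {
        return "MySQL";
    }
}

class DemoPostgreSQLEntry implements DemoParserEntry {
    public String getDatabaseTypeName() {
        return "PostgreSQL";
    }
}

class DemoParserFactory {
    // Returns the first entry whose database type matches, or fails for unsupported types.
    static DemoParserEntry select(Iterable<DemoParserEntry> entries, String databaseTypeName) {
        for (DemoParserEntry each : entries) {
            if (each.getDatabaseTypeName().equals(databaseTypeName)) {
                return each;
            }
        }
        throw new UnsupportedOperationException(String.format("Cannot support database type '%s'", databaseTypeName));
    }

    // In a real deployment the entries come from implementations registered via META-INF/services.
    static DemoParserEntry newInstance(String databaseTypeName) {
        return select(ServiceLoader.load(DemoParserEntry.class), databaseTypeName);
    }
}

class SpiDemo {
    public static void main(String[] args) {
        List<DemoParserEntry> entries = Arrays.asList(new DemoMySQLEntry(), new DemoPostgreSQLEntry());
        System.out.println(DemoParserFactory.select(entries, "PostgreSQL").getClass().getSimpleName());
    }
}
```

Supporting a new database then means adding one implementation and its registration entry, with no change to the factory.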

These two interfaces are tied to the AST generation mechanism based on ANTLR4. ANTLR stands for ANother Tool for Language Recognition. It is an open-source parser generator: from user-written grammar rules it generates a lexer and parser, which in turn build a parse tree for the input. In ShardingSphere, ANTLR4 is used to generate the AST. The SQLParserEngine’s parse method ultimately returns an SQLAST object that contains the ParserRuleContext from ANTLR4 and the SQLStatementRule.

In the SQLAST object, the ParserRuleContext is obtained from ANTLR4, and the SQLStatementRule is a rule object that defines the SQLSegment extractor. This leads us to the next phase, which is extracting SQLSegments, which will be discussed in the next lesson.