
03 JDBC Standard Compatibility and the Relationship with ShardingSphere #

We know that ShardingSphere is a typical client-side sharding solution, and one of the implementation methods of client-side sharding is to rewrite the JDBC specification. In the previous lesson, we also mentioned that ShardingSphere is completely compatible with the JDBC specification from the beginning, and the set of sharding operation interfaces exposed by ShardingSphere is completely consistent with the interfaces provided in the JDBC specification.

Now you might be wondering, how does ShardingSphere achieve API compatibility with the JDBC specification to provide sharding functionality?

This question is very important, and it is worth spending a lesson to analyze and explain. Understanding the JDBC specification and how ShardingSphere rewrites the JDBC specification is a prerequisite for using ShardingSphere to implement data sharding. Today, we will delve into the relationship between the JDBC specification and ShardingSphere to help you uncover the magic from the bottom-up design.

Introduction to the JDBC specification #

ShardingSphere provides a complete implementation process that is fully compatible with the JDBC specification. Before we delve into this process in detail, let’s review the JDBC specification. The original intention of JDBC (Java Database Connectivity) is to provide a unified standard for various databases. Different database vendors comply with this standard and provide their own implementation solutions for application programs to call. As a unified standard, the JDBC specification has a complete architecture, as shown in the following figure:

Drawing 0.png

In the JDBC architecture, the Driver Manager is responsible for loading various driver programs (Drivers) and returning the corresponding database connection (Connection) to the caller based on different requests. The application program accesses the database by calling the JDBC API. For developers, the JDBC API is the main way for us to access the database, and it is also the entry point for ShardingSphere to rewrite the JDBC specification and add sharding functionality. If we develop a data access processing flow using JDBC, the code would typically look like the following:

// Create a pooled data source
PooledDataSource dataSource = new PooledDataSource();
// Set the MySQL driver class
dataSource.setDriver("com.mysql.jdbc.Driver");
// Set the database URL, username, and password
dataSource.setUrl("jdbc:mysql://localhost:3306/test");
dataSource.setUsername("root");
dataSource.setPassword("root");
// Get a connection
Connection connection = dataSource.getConnection();
// Execute a query
PreparedStatement statement = connection.prepareStatement("select * from user");
// Process the query result
ResultSet resultSet = statement.executeQuery();
while (resultSet.next()) {
  ...
}
// Close resources in reverse order of acquisition
resultSet.close();
statement.close();
connection.close();

This code contains the core interfaces of the JDBC API. Using these core interfaces is the basic way we use JDBC for data access. It is important to expand on the roles and usage methods of these interfaces. In fact, as the course content evolves, you will find that these interfaces are also used in ShardingSphere for daily development.

DataSource #

In the JDBC specification, DataSource represents a data source, and its core function is to obtain a database connection object, Connection. In the JDBC specification, Connection can be directly obtained through the DriverManager. We know that the process of acquiring a Connection involves establishing a connection with the database, which incurs significant system overhead.

To improve performance, an intermediate layer is usually established. This intermediate layer stores the Connection generated by the DriverManager in a connection pool, and then retrieves the Connection from the pool. DataSource can be considered as such an intermediate layer. In the daily development process, we usually use DataSource to obtain Connection. Similarly, in ShardingSphere, the DataSource object is also an enhanced object provided to business developers. The definition of the DataSource interface is as follows:

public interface DataSource extends CommonDataSource, Wrapper {
  Connection getConnection() throws SQLException; 
  Connection getConnection(String username, String password) throws SQLException;
}

It can be seen that the DataSource interface provides two overloaded methods for obtaining a Connection, and that it inherits from the CommonDataSource interface, the root interface for defining data sources in JDBC. Besides DataSource, CommonDataSource has two other sub-interfaces:

Drawing 1.png

Among them, DataSource is the officially defined basic interface for obtaining Connection, ConnectionPoolDataSource is the interface for obtaining Connection from the connection pool, and XADataSource is used to obtain Connection in a distributed transaction environment. We will encounter this interface when discussing ShardingSphere’s distributed transaction.
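The "intermediate layer" role of DataSource described above can be made concrete with a minimal, self-contained sketch of a pooling data source. Everything here (SimplePoolingDataSource and the proxy-based fake connection factory) is hypothetical illustration code, not a real connection pool:

```java
import java.io.PrintWriter;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;
import java.util.logging.Logger;
import javax.sql.DataSource;

// Toy "intermediate layer": caches Connection objects instead of creating one per request.
class SimplePoolingDataSource implements DataSource {
    private final Supplier<Connection> factory;
    private final Deque<Connection> pool = new ArrayDeque<>();

    SimplePoolingDataSource(Supplier<Connection> factory) { this.factory = factory; }

    @Override
    public Connection getConnection() {
        // Reuse a pooled connection when available; otherwise create a new one
        Connection cached = pool.poll();
        return cached != null ? cached : factory.get();
    }

    // Return a connection to the pool instead of physically closing it
    void release(Connection connection) { pool.push(connection); }

    @Override
    public Connection getConnection(String username, String password) { return getConnection(); }

    // Remaining DataSource/CommonDataSource/Wrapper methods are irrelevant to this sketch
    @Override public <T> T unwrap(Class<T> iface) throws SQLException { throw new SQLException(); }
    @Override public boolean isWrapperFor(Class<?> iface) { return false; }
    @Override public PrintWriter getLogWriter() { return null; }
    @Override public void setLogWriter(PrintWriter out) { }
    @Override public void setLoginTimeout(int seconds) { }
    @Override public int getLoginTimeout() { return 0; }
    @Override public Logger getParentLogger() { return Logger.getGlobal(); }
}

public class PoolDemo {
    public static void main(String[] args) {
        // A dynamic proxy stands in for a real driver Connection in this demo
        Supplier<Connection> fakeFactory = () -> (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(), new Class<?>[]{Connection.class},
                (proxy, method, methodArgs) -> null);
        SimplePoolingDataSource dataSource = new SimplePoolingDataSource(fakeFactory);
        Connection first = dataSource.getConnection();
        dataSource.release(first);
        // The second request is served from the pool: same underlying object
        System.out.println(first == dataSource.getConnection()); // prints "true"
    }
}
```

Running the demo prints true, because the second getConnection call is served from the cache rather than from the factory.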

Please note that the DataSource interface also inherits a Wrapper interface. From the naming of the interface, it can be inferred that this interface should serve as a wrapper. In fact, since many database vendors provide extension features beyond the standard JDBC API, the Wrapper interface can wrap a non-JDBC standard interface provided by a third-party vendor into a standard interface. Taking the DataSource interface as an example, if we want to implement our own data source MyDataSource, we can provide a MyDataSourceWrapper class that implements the Wrapper interface to wrap and adapt it as follows:

Drawing 2.png
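A minimal sketch of such a wrapper might look like the following. VendorDataSource and MyDataSourceWrapper are hypothetical names invented for illustration, not real vendor or ShardingSphere classes:

```java
import java.sql.SQLException;
import java.sql.Wrapper;

// Hypothetical vendor-specific extension that is not part of the standard JDBC API
class VendorDataSource {
    String vendorStats() { return "ok"; }
}

// Adapts the non-standard object behind the standard JDBC Wrapper interface
class MyDataSourceWrapper implements Wrapper {
    private final VendorDataSource delegate = new VendorDataSource();

    @Override
    public <T> T unwrap(Class<T> iface) throws SQLException {
        if (iface.isInstance(delegate)) {
            return iface.cast(delegate);
        }
        throw new SQLException("Cannot unwrap to " + iface.getName());
    }

    @Override
    public boolean isWrapperFor(Class<?> iface) {
        return iface.isInstance(delegate);
    }
}

public class WrapperDemo {
    public static void main(String[] args) throws SQLException {
        MyDataSourceWrapper wrapper = new MyDataSourceWrapper();
        // Callers check for the extension first, then unwrap to reach it
        if (wrapper.isWrapperFor(VendorDataSource.class)) {
            VendorDataSource vendor = wrapper.unwrap(VendorDataSource.class);
            System.out.println(vendor.vendorStats()); // prints "ok"
        }
    }
}
```

The isWrapperFor/unwrap pair is the standard JDBC contract for reaching vendor extensions through a standard interface.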

In the JDBC specification, in addition to DataSource, core objects like Connection, Statement, ResultSet, and others also inherit this interface. Obviously, ShardingSphere provides non-JDBC standard interfaces, so it should also use this Wrapper interface and provide similar implementation schemes.

Connection #

The purpose of a DataSource is to obtain a Connection object. A Connection represents a database connection, which can be understood as a session, and is responsible for communicating with the database. All SQL execution is performed within the context of a specific Connection, and Connection also provides a set of overloaded methods for creating Statement and PreparedStatement objects. In addition, Connection is involved in transaction-related operations. To implement sharding operations, ShardingSphere provides a customized Connection class, ShardingConnection.

Statement #

There are two types of Statement in the JDBC specification: the regular Statement and the PreparedStatement, which supports precompilation. Precompilation means that the database compiles the SQL statement in advance and caches the precompiled result. When the statement is executed again with new parameter values, the precompiled form can be used directly, improving SQL execution efficiency. Of course, precompilation comes with its own cost, so in daily development it is more appropriate to use a plain Statement for one-off reads and writes, and a PreparedStatement when the same SQL statement will be executed multiple times.

If you need to query the data in the database, you only need to call the executeQuery method of the Statement or PreparedStatement object, which takes the SQL statement as a parameter and returns a JDBC ResultSet object after execution. Of course, Statement or PreparedStatement also provides a large set of overloaded methods for executing SQL update and query. In ShardingSphere, it also provides ShardingStatement and ShardingPreparedStatement, which are Statement objects that support sharding operations.

ResultSet #

Once an SQL statement is executed and a ResultSet object is obtained through Statement or PreparedStatement, the entire result set can be traversed by calling the next() method of the ResultSet object. If the next() method returns true, it means that there is data in the result set, and the corresponding result value can be obtained by calling the getXXX() methods of the ResultSet object. For sharding operations, because it involves obtaining target data from multiple databases or tables, the obtained results need to be merged. Therefore, ShardingSphere also provides the ShardingResultSet object in the sharding environment.

To summarize, we have outlined the development process of database access based on the JDBC specification, as shown in the following diagram:

Drawing 3.png

ShardingSphere provides an API that is fully compatible with the JDBC specification. In other words, developers can follow this familiar development process and use the core JDBC interfaces to perform sharding, data masking (desensitization), and other operations. Let's take a look.

JDBC Rewrite Implementation based on Adapter Pattern #

In ShardingSphere, the basic strategy for implementing compatibility with the JDBC specification is to use the Adapter Pattern. The Adapter Pattern in design patterns is usually used as a bridge between two incompatible interfaces, involving the addition of independent or incompatible functionality to a certain interface.

As a set of implementation solutions that adapt to the JDBC specification, ShardingSphere needs to rewrite the core JDBC objects introduced above: DataSource, Connection, Statement, and ResultSet. Although these objects carry different functions, the rewriting mechanism should be common to all of them; otherwise, customized development would be required for each object, which clearly violates our design principles. Therefore, ShardingSphere abstracts and develops a set of implementation solutions based on the Adapter Pattern, whose overall structure is shown in the following diagram:

Drawing 4.png

Firstly, we can see that there is a JdbcObject interface, which generally refers to the DataSource, Connection, Statement, and other core interfaces in the JDBC API. As mentioned earlier, these interfaces inherit from the Wrapper interface. ShardingSphere provides an implementation class WrapperAdapter for this Wrapper interface, which is shown in the diagram. The org.apache.shardingsphere.shardingjdbc.jdbc.adapter package in the ShardingSphere code project sharding-jdbc-core contains all the implementation classes related to the Adapter:

Drawing 5.png

At the bottom of the diagram showing the implementation solution based on the Adapter pattern in ShardingSphere, there is a definition of the ShardingJdbcObject class. This class is also a generic term and represents the ShardingDataSource, ShardingConnection, ShardingStatement, and other objects used for sharding in ShardingSphere.

Finally, ShardingJdbcObject inherits from an AbstractJdbcObjectAdapter, which in turn inherits from AbstractUnsupportedOperationJdbcObject. Both are abstract classes and, again, generic terms for groups of classes. The division of labor between them is as follows: AbstractJdbcObjectAdapter implements only the subset of methods in the JdbcObject interface that sharding operations require, while AbstractUnsupportedOperationJdbcObject covers the methods that are not needed by throwing exceptions. Together, the methods of these two classes cover the full definition of the original JdbcObject interface.
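This division of labor can be illustrated with a deliberately simplified sketch. The names below mirror the generic terms used in this section (JdbcObject, ShardingJdbcObject, and so on) and are not real ShardingSphere classes:

```java
// Generic stand-in for a JDBC core interface such as Connection or Statement
interface JdbcObject {
    String executeForSharding(String sql);   // needed for sharding
    void obscureVendorFeature();             // not needed for sharding
}

// Layer 1: methods not needed for sharding fail fast with a clear exception
abstract class AbstractUnsupportedOperationJdbcObject implements JdbcObject {
    @Override
    public final void obscureVendorFeature() {
        throw new UnsupportedOperationException("obscureVendorFeature");
    }
}

// Layer 2: shared adapter logic for the methods sharding does need
abstract class AbstractJdbcObjectAdapter extends AbstractUnsupportedOperationJdbcObject {
    @Override
    public String executeForSharding(String sql) {
        // Common pre-processing shared by all sharding-aware objects
        return "routed:" + sql;
    }
}

// Layer 3: the concrete sharding-aware object exposed to callers
class ShardingJdbcObject extends AbstractJdbcObjectAdapter { }

public class AdapterDemo {
    public static void main(String[] args) {
        JdbcObject jdbcObject = new ShardingJdbcObject();
        System.out.println(jdbcObject.executeForSharding("select * from user")); // prints "routed:select * from user"
        // jdbcObject.obscureVendorFeature() would throw UnsupportedOperationException
    }
}
```

The caller programs against the standard-looking JdbcObject interface, while the two abstract layers decide which methods are genuinely implemented and which are declared unsupported.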

In this way, we have a rough understanding of the rewriting mechanism of the core interfaces in the JDBC specification by ShardingSphere. This rewriting mechanism is very important and widely used in ShardingSphere. We can further understand this mechanism through examples.

ShardingSphere Rewriting JDBC Specification Example: ShardingConnection #

In the previous introduction, we learned that ShardingSphere's sharding engine provides a series of ShardingJdbcObjects to support sharding operations, including ShardingDataSource, ShardingConnection, ShardingStatement, ShardingPreparedStatement, etc. In this example, we will explain the implementation process of the most representative one, ShardingConnection. Please note that today we are focusing on the rewriting mechanism and will not go into too much detail about the specific functions of ShardingConnection and its interaction with other classes.

Class Hierarchy of ShardingConnection #

ShardingConnection is an adapter and wrapper for the Connection interface in JDBC, so it needs to provide the methods defined in that interface, including getMetaData, the various overloads of prepareStatement and createStatement, as well as transaction-related methods such as setAutoCommit, commit, and rollback. ShardingConnection overrides these methods, as shown in the following diagram:

Drawing 6.png Method list diagram of ShardingConnection

One sub-branch of the class hierarchy of ShardingConnection is a specific application of the adapter pattern. This part of the class hierarchy is exactly the same as the class hierarchy of the rewriting mechanism introduced earlier, as shown in the following diagram:

111.jpeg

AbstractConnectionAdapter #

First, let's take a look at the AbstractConnectionAdapter abstract class, which ShardingConnection directly inherits from. In AbstractConnectionAdapter we find a cachedConnections member, a Map that caches the real Connection objects sitting behind the encapsulating ShardingConnection. As long as the same AbstractConnectionAdapter is in use, these Connections remain cached until the close method is called. You can understand the specific operation process from the getConnections method of AbstractConnectionAdapter:

public final List<Connection> getConnections(final ConnectionMode connectionMode, final String dataSourceName, final int connectionSize) throws SQLException {
    // Get the DataSource
    DataSource dataSource = getDataSourceMap().get(dataSourceName);
    Preconditions.checkState(null != dataSource, "Missing the data source name: '%s'", dataSourceName);
    Collection<Connection> connections;
    // Get connections from cachedConnections based on the data source
    synchronized (cachedConnections) {
        connections = cachedConnections.get(dataSourceName);
    }
    List<Connection> result;
    if (connections.size() >= connectionSize) {
        // Enough cached connections: take only the required number
        result = new ArrayList<>(connections).subList(0, connectionSize);
    } else if (!connections.isEmpty()) {
        // Some cached connections, but not enough: reuse them and create the rest
        result = new ArrayList<>(connectionSize);
        result.addAll(connections);
        List<Connection> newConnections = createConnections(dataSourceName, connectionMode, dataSource, connectionSize - connections.size());
        result.addAll(newConnections);
        synchronized (cachedConnections) {
            // Put the newly created connections into the cache for management
            cachedConnections.putAll(dataSourceName, newConnections);
        }
    } else {
        // No cached connections for this data source: create them and cache them
        result = new ArrayList<>(createConnections(dataSourceName, connectionMode, dataSource, connectionSize));
        synchronized (cachedConnections) {
            cachedConnections.putAll(dataSourceName, result);
        }
    }
    return result;
}

This code covers three branches. The process is relatively simple, and the comments explain each case. What deserves closer attention is the createConnections method:

private List<Connection> createConnections(final String dataSourceName, final ConnectionMode connectionMode, final DataSource dataSource, final int connectionSize) throws SQLException {
    // A single connection can be created directly
    if (1 == connectionSize) {
        Connection connection = createConnection(dataSourceName, dataSource);
        replayMethodsInvocation(connection);
        return Collections.singletonList(connection);
    }
    // In CONNECTION_STRICTLY mode, create connections without locking the DataSource
    if (ConnectionMode.CONNECTION_STRICTLY == connectionMode) {
        return createConnections(dataSourceName, dataSource, connectionSize);
    }
    // Otherwise, serialize creation on the DataSource
    synchronized (dataSource) {
        return createConnections(dataSourceName, dataSource, connectionSize);
    }
}

This code involves the ConnectionMode, which is an important concept in the ShardingSphere execution engine. We will explain it in detail in Lesson 21 “Execution Engine: How to abstract the overall flow of SQL execution in a sharded environment?”. Here, we can see that the createConnections method calls a createConnection abstract method, which needs to be implemented by the subclasses of AbstractConnectionAdapter:

protected abstract Connection createConnection(String dataSourceName, DataSource dataSource) throws SQLException;

At the same time, we see that for the created Connection objects, they all need to execute this statement:

replayMethodsInvocation(connection);

This line of code is not easy to understand at first glance. Let's go to where it is defined: the WrapperAdapter class.

WrapperAdapter #

From the name, WrapperAdapter is an adapter class for a wrapper, which implements the Wrapper interface in JDBC. In this class, we find a pair of method definitions:

// Record a method invocation for later replay
public final void recordMethodInvocation(final Class<?> targetClass, final String methodName, final Class<?>[] argumentTypes, final Object[] arguments) {
    try {
        jdbcMethodInvocations.add(new JdbcMethodInvocation(targetClass.getMethod(methodName, argumentTypes), arguments));
    } catch (final NoSuchMethodException ex) {
        // getMethod throws a checked exception; the actual source wraps it in a runtime exception
        throw new RuntimeException(ex);
    }
}

// Replay the recorded method invocations on a target object
public final void replayMethodsInvocation(final Object target) {
    for (JdbcMethodInvocation each : jdbcMethodInvocations) {
        each.invoke(target);
    }
}

Both of these methods use the JdbcMethodInvocation class:

@RequiredArgsConstructor
public class JdbcMethodInvocation {

    @Getter
    private final Method method;

    @Getter
    private final Object[] arguments;

    // Lombok's @SneakyThrows covers the checked exceptions thrown by Method.invoke
    @SneakyThrows
    public void invoke(final Object target) {
        method.invoke(target, arguments);
    }
}

Obviously, the JdbcMethodInvocation class uses reflection to perform the corresponding method execution based on the passed method and arguments.

After understanding the principle of the JdbcMethodInvocation class, it is not difficult to understand the purpose of the recordMethodInvocation and replayMethodsInvocation methods. Among them, recordMethodInvocation is used to record the methods and parameters to be executed, and replayMethodsInvocation performs the execution based on these methods and parameters using reflection.
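The record/replay idea can be demonstrated with a small self-contained sketch. The Recorder and RecordedInvocation classes below are hypothetical stand-ins that mimic WrapperAdapter and JdbcMethodInvocation, not ShardingSphere's actual code:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for JdbcMethodInvocation: a method plus its arguments
class RecordedInvocation {
    private final Method method;
    private final Object[] arguments;

    RecordedInvocation(Method method, Object[] arguments) {
        this.method = method;
        this.arguments = arguments;
    }

    void invoke(Object target) throws Exception {
        method.invoke(target, arguments);
    }
}

// Mimics WrapperAdapter: record calls now, replay them later on real targets
class Recorder {
    private final List<RecordedInvocation> invocations = new ArrayList<>();

    void record(Class<?> targetClass, String methodName, Class<?>[] argumentTypes, Object[] arguments) throws Exception {
        invocations.add(new RecordedInvocation(targetClass.getMethod(methodName, argumentTypes), arguments));
    }

    void replay(Object target) throws Exception {
        for (RecordedInvocation each : invocations) {
            each.invoke(target);
        }
    }
}

public class ReplayDemo {
    public static void main(String[] args) throws Exception {
        Recorder recorder = new Recorder();
        // Record a call before any real target exists...
        recorder.record(StringBuilder.class, "append", new Class<?>[]{String.class}, new Object[]{"hello"});
        // ...then replay it onto a target created later
        StringBuilder target = new StringBuilder();
        recorder.replay(target);
        System.out.println(target); // prints "hello"
    }
}
```

This is exactly why ShardingConnection needs the mechanism: settings such as setReadOnly are applied before the real Connections behind it exist, so they are recorded and replayed onto each Connection as it is created.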

To execute replayMethodsInvocation, we must first find the entry point of recordMethodInvocation. From the invocation relationship in the code, we can see that it is called in AbstractConnectionAdapter, specifically in the setAutoCommit, setReadOnly, and setTransactionIsolation methods. Let’s take the setReadOnly method as an example to show its implementation:

@Override
public final void setReadOnly(final boolean readOnly) throws SQLException {
    this.readOnly = readOnly; 
    // Call the `recordMethodInvocation` method to record metadata of the method invocation
    recordMethodInvocation(Connection.class, "setReadOnly", new Class[]{boolean.class}, new Object[]{readOnly});
    // Perform the callback
    forceExecuteTemplate.execute(cachedConnections.values(), new ForceExecuteCallback<Connection>() {
        @Override
        public void execute(final Connection connection) throws SQLException {
            connection.setReadOnly(readOnly);
        }
    });
}

AbstractUnsupportedOperationConnection #

On the other hand, from the class hierarchy, it can be seen that AbstractConnectionAdapter directly inherits from AbstractUnsupportedOperationConnection instead of WrapperAdapter. In AbstractUnsupportedOperationConnection, a set of methods that directly throw exceptions is provided. Here are some parts of the code:

public abstract class AbstractUnsupportedOperationConnection extends WrapperAdapter implements Connection {

    @Override
    public final CallableStatement prepareCall(final String sql) throws SQLException {
        throw new SQLFeatureNotSupportedException("prepareCall");
    }

    @Override
    public final CallableStatement prepareCall(final String sql, final int resultSetType, final int resultSetConcurrency) throws SQLException {
        throw new SQLFeatureNotSupportedException("prepareCall");
    }

    ...
}

The purpose of AbstractUnsupportedOperationConnection is to state explicitly which operations are not supported by AbstractConnectionAdapter and its subclass ShardingConnection. It is a concrete application of the principle of separating responsibilities.

Summary #

The JDBC specification and ShardingSphere's compatibility with it are important concepts. In this lesson, we first introduced the core interfaces of the JDBC specification. Building on these interfaces, ShardingSphere rewrites the JDBC specification using the adapter pattern, and we walked through the implementation of the most representative class, ShardingConnection, which rewrites the JDBC Connection interface.

Understanding how the JDBC specification and ShardingSphere fit together is a prerequisite for what comes next: in the next lesson we will move on to application integration, that is, how to use ShardingSphere in a business system.

Here is a question for you to think about: how does ShardingSphere use the adapter pattern to rewrite the core classes of the JDBC specification?


Answer to the question: Through abstraction and the adapter pattern, ShardingSphere integrates its sharding-aware objects into the JDBC specification and hides the implementation details of the underlying databases. For methods it does not implement, it throws exceptions directly, and it provides the WrapperAdapter class to record and replay method invocations. As a result, users work directly with objects such as ShardingConnection, while the concrete behavior is completed by the adapters and related classes.