02 What Kind of Apache Open-Source Software is the Top-Level Project ShardingSphere #

In this lesson, I will explain what kind of Apache open source software ShardingSphere is.

In the previous lesson, I analyzed in detail the forms of sharding, the solutions available for a sharding architecture, and the representative frameworks. As we saw, ShardingSphere implements both the client-side sharding component and the proxy server component, and also provides features related to distributed databases. As with any successful open-source project, ShardingSphere’s achievements did not come overnight. Let’s first review how ShardingSphere developed.

Development Process of ShardingSphere: From Sharding-JDBC to Apache Top-Level Project #

Any account of ShardingSphere’s origin has to start with the Sharding-JDBC framework. This framework originated from an internal application framework at Dangdang.com and was officially open-sourced in early 2017. On the way from Sharding-JDBC to an Apache top-level project, ShardingSphere has gone through several stages. Looking across its whole history, we can trace the timeline and milestones of its evolution:

From the perspective of version releases, we can further outline the relationship between main versions and core functionalities:

The growth of its star count on GitHub reflects the development of ShardingSphere from another angle:

Design Philosophy of ShardingSphere: Compatibility rather than Disruption #

For open-source middleware, sustained growth depends on community contributions and, to a large extent, on the project’s own design and development philosophy.

ShardingSphere’s positioning is very clear: it is a relational database middleware, not a completely new relational database. ShardingSphere believes that in the current landscape, relational databases still occupy a huge market share. Whenever it involves data persistence, relational databases are still the standard configuration of systems and the cornerstone of core businesses for various companies. It is hard to shake this reality in the foreseeable future. Therefore, ShardingSphere focuses more on compatibility and extension on the existing basis, rather than disruption. So, how does ShardingSphere achieve this?

ShardingSphere has built an ecosystem made up of a set of open-source distributed database middleware solutions. According to the current plan, ShardingSphere consists of three independent products: Sharding-JDBC, Sharding-Proxy, and Sharding-Sidecar. The first two have been officially released, while Sharding-Sidecar is still in the planning stage. We can analyze the design philosophy of ShardingSphere through these three products.

Sharding-JDBC #

ShardingSphere’s predecessor is Sharding-JDBC, which is the most mature component in the entire framework. Sharding-JDBC is positioned as a lightweight Java framework that provides additional services at the JDBC layer. JDBC is a development specification that defines a series of interfaces such as DataSource, Connection, Statement, PreparedStatement, and ResultSet. Major database vendors implement these interfaces in their own JDBC drivers, which has made the JDBC specification the widely adopted standard for database access in the Java world.

For this reason, Sharding-JDBC was designed from the start to be fully compatible with the JDBC specification. The set of interfaces that Sharding-JDBC exposes for sharding operations is identical to the interfaces defined in the JDBC specification. Developers can use Sharding-JDBC to achieve sharding without needing to understand the complexity of the sharding rules and processing logic, because Sharding-JDBC hides all of it internally. This makes it a highly compatible solution that gives developers the simplest and most direct form of support. We will discuss the compatibility between Sharding-JDBC and the JDBC specification in detail in the next lesson.

Diagram illustrating compatibility between Sharding-JDBC and the JDBC specification

In actual development, Sharding-JDBC provides services in the form of a JAR package. Developers can use this JAR package to directly connect to the database without the need for additional deployment and dependency management. When applying Sharding-JDBC, it’s important to note that Sharding-JDBC relies on a complete and powerful sharding engine:

Because Sharding-JDBC provides an API that is fully compatible with the JDBC specification, it integrates easily with components and frameworks that follow that specification. For example, database connection pools such as DBCP and C3P0, as well as ORM frameworks such as Hibernate and MyBatis, can be used with Sharding-JDBC without modification. As an open-source framework that supports multiple databases, Sharding-JDBC also works with mainstream relational databases such as MySQL, Oracle, and SQL Server.
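The compatibility argument can be made concrete with a small sketch. The application method below depends only on the standard `javax.sql.DataSource`, `Connection`, and `PreparedStatement` interfaces, so any implementation of them can be supplied. The class name `JdbcCompatSketch`, the `stub` helper, and the `t_order` table are assumptions made for illustration; the stub is a recording dynamic proxy standing in for a real driver, pool, or sharding-aware data source, and none of this is ShardingSphere's own code.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;

final class JdbcCompatSketch {

    // Application code: it depends only on standard JDBC interfaces, so it does not
    // care whether the DataSource is a vendor driver, a pool, or a sharding layer.
    static void insertOrder(DataSource dataSource, long orderId) throws SQLException {
        String sql = "INSERT INTO t_order (order_id) VALUES (?)";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, orderId);
            ps.executeUpdate();
        }
    }

    // Hypothetical stand-in for any JDBC implementation: a dynamic proxy that records
    // each call and returns further stubs for interface-typed results.
    static <T> T stub(Class<T> type, List<String> calls) {
        Object proxy = Proxy.newProxyInstance(
                JdbcCompatSketch.class.getClassLoader(),
                new Class<?>[] {type},
                (self, method, args) -> {
                    calls.add(type.getSimpleName() + "." + method.getName());
                    Class<?> returnType = method.getReturnType();
                    if (returnType == int.class) return 0;
                    if (returnType == boolean.class) return false;
                    if (returnType.isInterface()) return stub(returnType, calls);
                    return null;
                });
        return type.cast(proxy);
    }
}
```

Calling `JdbcCompatSketch.insertOrder(JdbcCompatSketch.stub(DataSource.class, calls), 42L)` runs the application code unchanged against the stub. Swapping in a different `DataSource` implementation would require no change to `insertOrder`, which is the property the text attributes to Sharding-JDBC.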

Sharding-Proxy #

The Sharding-Proxy component in ShardingSphere serves as a transparent database proxy server, making it one specific implementation of the proxy server sharding scheme. In terms of design and implementation, Sharding-Proxy also takes compatibility into account.

The compatibility provided by Sharding-Proxy shows first in its support for heterogeneous languages. To achieve this, Sharding-Proxy encapsulates the database binary protocol and exposes it through a proxy server component, so any language with a suitable database client can connect to it. Second, on the client side, Sharding-Proxy is compatible with access clients that follow the MySQL and PostgreSQL protocols, such as Navicat and the MySQL command-line client. Like Sharding-JDBC, Sharding-Proxy also supports multiple databases, including MySQL and PostgreSQL.

Next, let’s take a look at the overall architecture of Sharding-Proxy. For applications, this proxy mechanism is completely transparent, and the proxy can be used as if it were a MySQL or PostgreSQL server:

To summarize, we can directly view Sharding-Proxy as a database that serves as a proxy for multiple databases behind it, shielding the complexity of multiple databases. At the same time, it can be seen that the operation of Sharding-Proxy also depends on a sharding engine that performs sharding operations and a governance component for managing databases.

Although Sharding-JDBC and Sharding-Proxy have different focuses, in fact, they can be used together, which means there is compatibility between these two components.

As mentioned earlier, we embed the Sharding-JDBC JAR package directly in the application program, which is suitable for business developers. On the other hand, Sharding-Proxy provides a static entry point and supports heterogeneous languages, making it suitable for middleware developers and operations personnel who need to manage sharded databases. With a shared underlying sharding engine and database governance functionality, Sharding-JDBC and Sharding-Proxy can be used together to accommodate different application scenarios and different developers:

Sharding-Sidecar #

The Sidecar design pattern has been receiving more and more attention and adoption. Its goal is to connect the heterogeneous service components in a system and provide efficient service governance. ShardingSphere has designed the Sharding-Sidecar component around this pattern. So far, ShardingSphere has announced plans for Sharding-Sidecar but has not published a concrete implementation, so we will not discuss it in detail here. As an implementation of the Sidecar pattern, Sharding-Sidecar can be expected to act as a Sidecar proxy for all database access. This is again a compatibility-oriented design: it connects distributed data access applications with databases through a decentralized, non-intrusive solution.

ShardingSphere’s Core Features: From Data Sharding to Governance and Integration #

Having introduced the design philosophy of ShardingSphere, let’s turn to its core features and implementation mechanisms. Here, we divide ShardingSphere’s overall functionality into four parts: Infrastructure, Sharding Engine, Distributed Transaction, and Governance & Integration. These four parts also form the overall structure of this course. Let’s look at each part in turn:

Infrastructure #

As an open-source framework, ShardingSphere also provides many components related to infrastructure in its architecture. These components are more closely related to its internal implementation mechanisms, and we will explain them in detail in the subsequent source code analysis. However, for developers, it can be understood that the micro-kernel architecture and distributed primary key are the core features of infrastructure components provided by the framework.

Micro-kernel Architecture

ShardingSphere adopts the micro-kernel architecture pattern to keep the system highly extensible. A micro-kernel architecture consists of two kinds of components: the core system and plugins. Upgrading such a system only requires replacing old plugins with new ones, without changing the overall system architecture:

In ShardingSphere, a large number of plugin interfaces have been abstracted, including the SQLParserEntry for SQL parsing, the ConfigCenter for configuration management, the ShardingEncryptor for data encryption, and the RegistryCenter interface for database governance. Developers can provide customized implementations based on these plugin definitions according to their needs and dynamically load them into the ShardingSphere runtime environment.
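The core-plus-plugins idea can be sketched in a few lines. Everything named here (`EncryptPlugin`, `PluginKernelSketch`, `ReversePlugin`, the `"REVERSE"` type key) is hypothetical and chosen only for illustration; the sketch uses a plain in-memory map and does not reproduce how ShardingSphere itself discovers or loads its plugins.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plugin contract: the core only knows this interface.
interface EncryptPlugin {
    String getType();
    String encrypt(String plain);
}

// Hypothetical core system: it looks plugins up by type and delegates to them.
final class PluginKernelSketch {
    private final Map<String, EncryptPlugin> plugins = new HashMap<>();

    // Registering a plugin with an existing type replaces the old implementation,
    // which is how an "upgrade" happens without touching the core.
    void register(EncryptPlugin plugin) {
        plugins.put(plugin.getType(), plugin);
    }

    String encrypt(String type, String plain) {
        EncryptPlugin plugin = plugins.get(type);
        if (plugin == null) {
            throw new IllegalArgumentException("No plugin registered for type: " + type);
        }
        return plugin.encrypt(plain);
    }
}

// A toy plugin that reverses the input; a real one would apply a proper cipher.
final class ReversePlugin implements EncryptPlugin {
    @Override
    public String getType() {
        return "REVERSE";
    }

    @Override
    public String encrypt(String plain) {
        return new StringBuilder(plain).reverse().toString();
    }
}
```

After `kernel.register(new ReversePlugin())`, a call such as `kernel.encrypt("REVERSE", "abc")` is served entirely by the plugin. The core class never changes when a new plugin type is added.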

Distributed Primary Key

In the case of a local database, we can use the built-in auto-increment sequence provided by the database to generate primary keys. However, in a sharding scenario, when we need to migrate from a local database to a distributed database, we need to consider the global uniqueness of the primary keys across all databases. For this reason, we need to introduce a mechanism for distributed primary keys. ShardingSphere also provides an implementation mechanism for distributed primary keys, and the default algorithm used is SnowFlake.
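To show why a Snowflake-style key is globally unique without coordination between nodes, here is a minimal sketch of the general technique. The bit layout (timestamp, 10-bit worker ID, 12-bit sequence) follows the commonly described Snowflake scheme; the class name, the custom epoch, and the error handling are assumptions and are not taken from ShardingSphere's implementation.

```java
final class SnowflakeSketch {
    // Assumed custom epoch: 2020-01-01T00:00:00Z, chosen only for this example.
    private static final long EPOCH_MILLIS = 1577836800000L;
    private static final long WORKER_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastMillis = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId out of range: " + workerId);
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMillis) {
            throw new IllegalStateException("Clock moved backwards");
        }
        if (now == lastMillis) {
            // Same millisecond: bump the sequence; if it wraps, wait for the next one.
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                while (now <= lastMillis) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastMillis = now;
        // Layout: [timestamp delta][worker id][sequence]
        return ((now - EPOCH_MILLIS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Each node is given a distinct worker ID, so two nodes can never produce the same value even within the same millisecond, and within one node the IDs are strictly increasing over time.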

Sharding Engine #

Regarding the sharding engine, ShardingSphere supports both data sharding and read-write splitting mechanisms.

Data Sharding

Data sharding is the core functionality of ShardingSphere, supporting standard operations for both vertical and horizontal sharding. Additionally, ShardingSphere provides extension points for sharding, allowing developers to customize their own sharding strategies based on their needs.
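As an illustration of what a horizontal sharding strategy computes, the sketch below routes a row to a database and a table by taking sharding keys modulo the number of shards. The names `ModuloShardingSketch`, `ds_N`, and `t_order_N`, and the choice of `userId` and `orderId` as sharding keys, are assumptions for this example; ShardingSphere's actual strategy interfaces are covered later in the course and are not shown here.

```java
final class ModuloShardingSketch {
    private final int databaseCount;
    private final int tableCount;

    ModuloShardingSketch(int databaseCount, int tableCount) {
        if (databaseCount <= 0 || tableCount <= 0) {
            throw new IllegalArgumentException("shard counts must be positive");
        }
        this.databaseCount = databaseCount;
        this.tableCount = tableCount;
    }

    // Database is chosen by userId, table by orderId. floorMod keeps the result
    // non-negative even if a key happens to be negative.
    String route(long userId, long orderId) {
        long databaseIndex = Math.floorMod(userId, (long) databaseCount);
        long tableIndex = Math.floorMod(orderId, (long) tableCount);
        return "ds_" + databaseIndex + ".t_order_" + tableIndex;
    }
}
```

With two databases and four tables per database, `new ModuloShardingSketch(2, 4).route(7L, 10L)` yields `ds_1.t_order_2`, since 7 mod 2 is 1 and 10 mod 4 is 2. A custom strategy would replace the modulo rule with whatever mapping the business requires.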

Read-write Splitting

On top of data sharding, ShardingSphere also implements a read-write splitting mechanism based on the database master-slave architecture. This read-write splitting mechanism can be seamlessly combined with data sharding.
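The routing decision behind read-write splitting can be sketched as follows: writes go to the master, and reads are spread across the slaves in round-robin order. The class name and the data source names are hypothetical, and the `startsWith("SELECT")` check is a deliberate simplification; a real implementation would rely on SQL parsing and would also have to consider transactions and replication lag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

final class ReadWriteRouterSketch {
    private final String master;
    private final List<String> slaves;
    private final AtomicInteger counter = new AtomicInteger();

    ReadWriteRouterSketch(String master, List<String> slaves) {
        this.master = master;
        this.slaves = new ArrayList<>(slaves);
    }

    // Returns the name of the data source that should execute the statement.
    String route(String sql) {
        boolean isRead = sql.trim().toUpperCase(Locale.ROOT).startsWith("SELECT");
        if (!isRead || slaves.isEmpty()) {
            return master;
        }
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), slaves.size());
        return slaves.get(index);
    }
}
```

Given a master named `master_ds` and slaves `slave_0` and `slave_1`, an `INSERT` is routed to `master_ds`, while successive `SELECT` statements alternate between the two slaves.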

Distributed Transaction #

Distributed transactions are a fundamental mechanism for ensuring data consistency in a distributed environment. As part of the distributed database ecosystem, ShardingSphere provides comprehensive support for distributed transactions.

Standardized Transaction Processing Interface

ShardingSphere supports local transactions, strong consistency transactions based on XA two-phase commit, and flexible eventual consistency transactions based on BASE. Additionally, ShardingSphere abstracts a set of standardized transaction processing interfaces and manages them uniformly through the sharding transaction manager, ShardingTransactionManager. We can also extend the distributed transactions by implementing our own ShardingTransactionManager based on our needs.
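For readers unfamiliar with two-phase commit, the following sketch shows the control flow that underlies XA-style strong consistency: every participant is first asked to prepare, and the coordinator commits only if all of them vote yes. The `TxParticipant` interface and the classes below are hypothetical teaching aids; they are not the XA API and not ShardingSphere's ShardingTransactionManager, and they omit logging, timeouts, and recovery.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant contract, loosely mirroring the prepare/commit/rollback phases.
interface TxParticipant {
    boolean prepare();
    void commit();
    void rollback();
}

final class TwoPhaseCoordinatorSketch {
    // Returns true if the global transaction committed, false if it was rolled back.
    static boolean execute(List<TxParticipant> participants) {
        List<TxParticipant> prepared = new ArrayList<>();
        // Phase 1: collect votes.
        for (TxParticipant participant : participants) {
            if (participant.prepare()) {
                prepared.add(participant);
            } else {
                // A "no" vote aborts the transaction. The voter is assumed to have
                // aborted locally, so only the already prepared ones are rolled back.
                for (TxParticipant done : prepared) {
                    done.rollback();
                }
                return false;
            }
        }
        // Phase 2: everyone voted yes, so commit everywhere.
        for (TxParticipant participant : participants) {
            participant.commit();
        }
        return true;
    }
}

// Test double that records what the coordinator asked it to do.
final class RecordingParticipant implements TxParticipant {
    private final String name;
    private final boolean vote;
    private final List<String> log;

    RecordingParticipant(String name, boolean vote, List<String> log) {
        this.name = name;
        this.vote = vote;
        this.log = log;
    }

    @Override
    public boolean prepare() {
        log.add(name + ".prepare");
        return vote;
    }

    @Override
    public void commit() {
        log.add(name + ".commit");
    }

    @Override
    public void rollback() {
        log.add(name + ".rollback");
    }
}
```

The cost of this protocol is that every participant holds its resources between the two phases, which is one reason BASE-style flexible transactions trade immediate consistency for better throughput.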

Strong Consistency Transactions and Flexible Transactions

ShardingSphere provides a set of built-in solutions for distributed transactions. The strong consistency transactions integrate technologies such as Atomikos, Narayana, and Bitronix to implement XA transaction managers. On the other hand, ShardingSphere internally integrates Seata to provide flexible transaction functionality.

Governance and Integration #

For distributed databases, governance encompasses a wide range of functions, and ShardingSphere provides a series of features such as registry center and configuration center to support database governance. Additionally, as an open-source framework for rapid development, ShardingSphere seamlessly integrates with other mainstream frameworks.

Data Desensitization

Data desensitization is a common requirement for securing access to data. The usual practice is to rewrite the original SQL so that the original data is encrypted before it is stored, and then to decrypt the ciphertext stored in the database whenever the original data needs to be read. We could build such an encryption and decryption mechanism ourselves. The strength of ShardingSphere is that it embeds this mechanism into the SQL execution process, so business developers do not need to deal with the details of encryption and decryption and can achieve automatic data desensitization through simple configuration.
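To make the encrypt-on-write, decrypt-on-read idea concrete, here is a minimal sketch of the kind of column cipher an application would otherwise have to write by hand. It uses AES-GCM from the standard `javax.crypto` package; the class name, the storage format (Base64 of IV plus ciphertext), and the key handling are assumptions for illustration and do not describe ShardingSphere's built-in encryptors.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class ColumnCipherSketch {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();

    // rawKey must be 16, 24, or 32 bytes long for AES.
    ColumnCipherSketch(byte[] rawKey) {
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    // Called before the value is written: returns Base64(IV || ciphertext).
    String encrypt(String plain) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] cipherText = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] stored = new byte[iv.length + cipherText.length];
            System.arraycopy(iv, 0, stored, 0, iv.length);
            System.arraycopy(cipherText, 0, stored, iv.length, cipherText.length);
            return Base64.getEncoder().encodeToString(stored);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    // Called after the value is read: reverses encrypt().
    String decrypt(String storedValue) {
        try {
            byte[] stored = Base64.getDecoder().decode(storedValue);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }
}
```

Without middleware support, every data access method that touches a sensitive column would need to call `encrypt` before writing and `decrypt` after reading. Moving those calls into the SQL execution path is what removes that burden from business code.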

Configuration Center

For managing configuration information, we can maintain configuration information based on YAML or XML format configuration files, which are supported in ShardingSphere. Furthermore, ShardingSphere provides a mechanism for dynamic management of configuration information, supporting dynamic switching of data sources, tables, sharding, and read-write splitting strategies.

Registry Center

Compared to the configuration center, the registry center has a wider range of applications in ShardingSphere. The registry center in ShardingSphere provides two implementation methods based on Nacos and ZooKeeper. In terms of application scenarios, we can use the registry center for database instance management, database circuit breakers, and other governance functions.

Link Tracking

SQL parsing and SQL execution are the most critical steps in data sharding. ShardingSphere not only completes these two steps but also submits runtime data to the link tracking system through standard protocols. ShardingSphere uses the OpenTracing API to send performance tracking data. Specific products based on the OpenTracing protocol, such as SkyWalking, Zipkin, and Jaeger, can automatically integrate with ShardingSphere.

System Integration

System integration refers to the integration between ShardingSphere and Spring series frameworks. So far, ShardingSphere has implemented two mechanisms for system integration. One is the namespace mechanism, which integrates with the Spring framework by extending the Spring Schema; the other is to write custom starter components to integrate with Spring Boot. In this way, regardless of which Spring framework developers use, there is zero learning cost for using ShardingSphere.

Summary #

With this lesson, we have formally begun our introduction to ShardingSphere. The lesson reviewed ShardingSphere’s development history and explained its design philosophy through its three core products: Sharding-JDBC, Sharding-Proxy, and Sharding-Sidecar. We also outlined the core functionality of ShardingSphere as distributed database middleware, covering infrastructure, the sharding engine, distributed transactions, and governance and integration. We will explain each of these in detail in the subsequent lessons.

Here’s a question for you to ponder: ShardingSphere is a highly compatible open-source framework. In what specific aspects does its compatibility manifest?

From the introduction above, we know that ShardingSphere implements its sharding engine by re-implementing the interfaces of the JDBC specification, which gives applications a usage model that is fully compatible with JDBC. The next lesson will discuss the relationship between ShardingSphere and the JDBC specification in more detail.