35 Conclusion: Summary and Prospects of ShardingSphere #

We have finally reached the last lesson of this column. Today, we will summarize the entire ShardingSphere course and look ahead to what comes next. As a leading distributed database middleware, ShardingSphere is increasingly popular. It provides several core functions that help us build complete solutions for database sharding and table splitting.

First, let’s summarize the core functions of ShardingSphere covered in this column, and then review some of my thoughts and insights from the writing process. Finally, I will explain how ShardingSphere is evolving from the 4.X version to the upcoming 5.X version.

Core Functions of ShardingSphere #

The ShardingSphere official website presents three core functions: data sharding, distributed transactions, and database governance. I explained these functions in detail in Part 4, Part 5, and Part 6 of this column, which you can review.

1. Data Sharding #

Data sharding is the basic function of ShardingSphere. ShardingSphere supports general database sharding and table splitting based on vertical and horizontal splits. On top of that, ShardingSphere implements a read-write splitting mechanism based on the database master-slave architecture, and this mechanism integrates seamlessly with data sharding.

In addition, as a highly extensible open-source framework, ShardingSphere provides sharding extension points that allow developers to implement custom sharding policies as needed.
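To make the idea of a custom sharding policy concrete, here is a minimal sketch in plain Java. It does not use ShardingSphere's actual extension interfaces; `ShardingPolicySketch`, `ModuloShardingPolicySketch`, and the `t_order_N` table names are hypothetical names chosen for illustration, and modulo routing is just one assumed policy.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical contract for illustration only; NOT ShardingSphere's real extension point.
interface ShardingPolicySketch {
    // Picks one physical target (database or table name) for a sharding value.
    String route(Collection<String> targetNames, long shardingValue);
}

// Assumed policy: route by "sharding value modulo number of targets".
final class ModuloShardingPolicySketch implements ShardingPolicySketch {

    @Override
    public String route(Collection<String> targetNames, long shardingValue) {
        // Math.floorMod keeps the suffix non-negative even for negative keys.
        String suffix = "_" + Math.floorMod(shardingValue, (long) targetNames.size());
        for (String name : targetNames) {
            if (name.endsWith(suffix)) {
                return name;
            }
        }
        throw new IllegalStateException("No target ends with " + suffix);
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("t_order_0", "t_order_1");
        ShardingPolicySketch policy = new ModuloShardingPolicySketch();
        System.out.println(policy.route(tables, 7L));  // t_order_1
        System.out.println(policy.route(tables, 10L)); // t_order_0
    }
}
```

A real custom policy would implement the sharding algorithm interfaces described in Part 4 of this column, but the shape is the same: given the available targets and a sharding value, return the target to use.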

2. Distributed Transactions #

Distributed transactions are used to ensure data consistency in distributed environments. This is the key function that sets ShardingSphere apart from ordinary database sharding and table splitting frameworks, and it is what allows ShardingSphere to be regarded as a distributed database middleware.

ShardingSphere’s support for distributed transactions is first reflected at the abstraction level. ShardingSphere abstracts a set of standardized transaction handling interfaces and manages them uniformly through the sharding transaction manager, ShardingTransactionManager. We can also implement our own ShardingTransactionManager to extend distributed transaction support according to our needs. In terms of transaction types, ShardingSphere supports both strongly consistent transactions and flexible transactions.

Once the data sharding and distributed transaction functions are available, we can carry out day-to-day database sharding and table splitting with ShardingSphere. However, this is not enough, because we also need to track and monitor the database resources and the runtime state of the services in the system. Therefore, ShardingSphere also provides a range of mechanisms to help us with database governance.
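The abstraction described above can be sketched roughly as follows. This is a simplified illustration in plain Java, not the real ShardingTransactionManager interface: the names `TxTypeSketch`, `TxManagerSketch`, `RecordingTxManager`, and `TxManagerRegistrySketch` are hypothetical, and the real interface has more responsibilities (such as initialization and connection handling) than the three lifecycle methods assumed here.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical transaction types for illustration.
enum TxTypeSketch { LOCAL, XA, BASE }

// Hypothetical, simplified stand-in for a standardized transaction interface.
interface TxManagerSketch {
    TxTypeSketch type();
    void begin();
    void commit();
    void rollback();
}

// A trivial implementation that only records which lifecycle calls it received.
final class RecordingTxManager implements TxManagerSketch {
    final List<String> events = new ArrayList<>();
    private final TxTypeSketch type;

    RecordingTxManager(TxTypeSketch type) { this.type = type; }

    @Override public TxTypeSketch type() { return type; }
    @Override public void begin() { events.add("begin"); }
    @Override public void commit() { events.add("commit"); }
    @Override public void rollback() { events.add("rollback"); }
}

// Manages all registered implementations uniformly, keyed by transaction type.
final class TxManagerRegistrySketch {
    private final Map<TxTypeSketch, TxManagerSketch> managers = new EnumMap<>(TxTypeSketch.class);

    void register(TxManagerSketch manager) {
        managers.put(manager.type(), manager);
    }

    void runInTransaction(TxTypeSketch type, Runnable work) {
        TxManagerSketch manager = managers.get(type);
        if (manager == null) {
            throw new IllegalArgumentException("No transaction manager registered for " + type);
        }
        manager.begin();
        try {
            work.run();
            manager.commit();
        } catch (RuntimeException e) {
            // Any failure in the work (or in commit) triggers a rollback.
            manager.rollback();
            throw e;
        }
    }
}
```

The design point is that calling code depends only on the transaction type it asks for. Adding a new transaction mechanism means registering one more implementation, without touching the callers.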

3. Database Governance #

If you have been following this column, you already know that the main way to use ShardingSphere is through its configuration system. ShardingSphere supports maintaining this configuration information in configuration files.

ShardingSphere also provides a dynamic management mechanism for configuration information, which supports dynamic switching of data sources, tables, sharding strategies, and read-write separation strategies. The database instances currently running in the system need to be managed dynamically as well. In specific application scenarios, we can use a registry center to implement database instance management, database circuit-breaking, and other governance functions.

Once ShardingSphere is deployed in a production environment, both developers and operations personnel need to pay attention to the execution status of the SQL statements that pass through ShardingSphere, as well as the runtime state of the ShardingSphere kernel. ShardingSphere uses the OpenTracing API to send performance trace data. In the core stages of SQL parsing and execution, ShardingSphere collects runtime data and submits it to a distributed tracing system through standard protocols for analysis and monitoring.

The last core function of database governance is data desensitization. Strictly speaking, data desensitization is more of a business-oriented function than a database governance function. It is a common requirement for ensuring data access security in business systems. The original data must be encrypted before it is stored in the database. When users query the data, ShardingSphere retrieves the ciphertext from the database, decrypts it, and returns the original plaintext to the users.

ShardingSphere makes this data desensitization process automatic and transparent, so developers do not need to deal with its implementation details.
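The encrypt-on-write, decrypt-on-read flow described above can be illustrated with a small, self-contained sketch that uses only the JDK's `javax.crypto` classes. It is not ShardingSphere's encryptor implementation: the class `TransparentEncryptionSketch`, the in-memory map standing in for a cipher column, and the choice of AES-GCM are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: callers see only plaintext; the "database" stores only ciphertext.
final class TransparentEncryptionSketch {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();
    // In-memory stand-in for a cipher column keyed by primary key.
    private final Map<Long, String> cipherColumn = new HashMap<>();

    // rawKey must be a valid AES key length (16, 24, or 32 bytes).
    TransparentEncryptionSketch(byte[] rawKey) {
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    void save(long id, String plaintext) {
        cipherColumn.put(id, encrypt(plaintext));
    }

    String load(long id) {
        return decrypt(cipherColumn.get(id));
    }

    // Exposes what is physically stored, to show that it is not the plaintext.
    String storedValue(long id) {
        return cipherColumn.get(id);
    }

    private String encrypt(String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            // Store the IV in front of the ciphertext so decryption can recover it.
            byte[] out = new byte[iv.length + ciphertext.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    private String decrypt(String stored) {
        try {
            byte[] in = Base64.getDecoder().decode(stored);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plaintext = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plaintext, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }
}
```

In this sketch the business code calls only `save` and `load` with plaintext, which mirrors the transparency that ShardingSphere provides at the SQL level through configuration rather than hand-written code.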

Summary of the ShardingSphere Course #

Having summarized the core functions of ShardingSphere, let’s now summarize the features of this column and what distinguishes it from other columns. I group them into the following three highlights.

The first highlight is that this column provides complete sample code to introduce the above functions in ShardingSphere.

The case system used in these examples is simple enough for you to understand and master each knowledge point from scratch. At the same time, it is comprehensive enough to cover the various core functions. We provide the relevant configuration items and sample code for reference in your daily development.

The core content of this column is the source code analysis of ShardingSphere, which makes up about two-thirds of the entire column and can be said to be the essence of the course.

On one hand, we present the microkernel architecture, as well as the design concept and implementation method of distributed primary keys. More importantly, we provide detailed design ideas and implementation mechanisms based on the source code for each core function introduced in ShardingSphere.

On the other hand, for data sharding, we analyze the parsing engine, routing engine, rewriting engine, execution engine, and merge engine, as well as read-write separation. For distributed transactions and database governance, we also analyze the underlying principles of the various technical components in the context of their application scenarios, so that you know not only what happens but also why.

The third highlight is a design feature of the column: the “from source code analysis to daily development” part included in each source code analysis lesson.

A major goal of this course is to help you gain a deep understanding of the implementation principles of ShardingSphere by systematically explaining the framework source code. However, this is not the only goal. I hope you can also gain practical skills from it and make good use of what you have learned.

Therefore, in the “from source code analysis to daily development” part of each lesson, I provide several engineering practices based on the content of that lesson. Some of these practices distill design ideas, some are techniques for applying tools and frameworks, and some are template code that can be applied directly in business development.

By studying ShardingSphere, an excellent open-source framework, I hope that you can master the methods and techniques of system architecture design and implementation, and apply these engineering practices to your daily development work.

From ShardingSphere 4.X to 5.X #

Finally, let’s look forward to the development of ShardingSphere.

This course uses ShardingSphere 4.X, while the ShardingSphere development team is currently working intensively on the 5.X version. The 5.X version is a major release in the development of ShardingSphere, and both its internal functions and its external API will undergo large-scale optimization and adjustment.

At the same time, the 5.X version adds several new core functions that make the ShardingSphere ecosystem richer. So far, the functions designed and implemented for 5.X include elastic scaling and shadow library pressure testing, among others. Let’s take a look at these two functions.

1. Elastic Scaling #

The first function introduced in the 5.X version is elastic scaling, and the corresponding module is named ShardingSphere-Scaling. As business scale changes rapidly, we may need to dynamically scale an existing sharding cluster up or down. This process may seem simple, but it is actually very complex to implement.

Therefore, ShardingSphere provides a one-stop general solution. This solution supports various types of user-defined sharding strategies and reduces the repetitive work and business impact of data scaling and migration. The elastic scaling function was actually introduced in version 4.1.0 and has been in the alpha development stage since then, providing only basic scaling functions. ShardingSphere plans to complete the full function set through several milestones, including semi-automatic scaling, resumable transfer from breakpoints, and fully automatic scaling.
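One reason scaling is harder than it looks is that changing the number of shards changes where existing rows belong. The sketch below counts how many rows would have to move under a plain modulo sharding rule. The class `RescalingCostSketch` and the modulo rule are assumptions for illustration only; this is not how ShardingSphere-Scaling is implemented.

```java
// Hypothetical sketch: estimates data movement when the shard count changes,
// assuming rows are placed by "id modulo shard count".
final class RescalingCostSketch {

    // Counts ids in [0, idCount) whose target shard differs before and after scaling.
    static long countMovedRows(long idCount, int oldShards, int newShards) {
        long moved = 0;
        for (long id = 0; id < idCount; id++) {
            if (id % oldShards != id % newShards) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // Doubling from 2 to 4 shards relocates half of 1000 sequential ids.
        System.out.println(countMovedRows(1000, 2, 4)); // 500
    }
}
```

Under these assumptions, doubling the cluster moves half of the existing rows, and all of that data has to be migrated while the system keeps serving traffic. This is the kind of repetitive, risky work that a general scaling solution aims to take over.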

2. Shadow Library Pressure Testing #

The second function introduced in the 5.X version is shadow library (shadow database) pressure testing, which is designed for full-link pressure testing of a system. At the database level, to ensure the reliability and integrity of production data, pressure-testing requests must be isolated and directed to the shadow library, so that test data is not written into the production database and does not contaminate real data.

In ShardingSphere, we can use the data routing function to route the SQL executed during pressure testing to the corresponding data source. As with data desensitization, ShardingSphere implements shadow library pressure testing through configuration, in this case a shadow rule.
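The routing decision behind a shadow rule can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not ShardingSphere's shadow rule implementation: `ShadowRoutingSketch`, the data source names, and the boolean pressure-test flag are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: maps each production data source to its shadow counterpart
// and picks one of them per request.
final class ShadowRoutingSketch {
    private final Map<String, String> shadowMappings = new HashMap<>();

    void addMapping(String productionDataSource, String shadowDataSource) {
        shadowMappings.put(productionDataSource, shadowDataSource);
    }

    // Returns the data source that should execute the SQL.
    String route(String productionDataSource, boolean pressureTestRequest) {
        if (!pressureTestRequest) {
            return productionDataSource;
        }
        String shadow = shadowMappings.get(productionDataSource);
        if (shadow == null) {
            // Fail closed: never let pressure-test traffic fall through to production.
            throw new IllegalStateException("No shadow data source configured for " + productionDataSource);
        }
        return shadow;
    }

    public static void main(String[] args) {
        ShadowRoutingSketch router = new ShadowRoutingSketch();
        router.addMapping("ds_order", "shadow_ds_order");
        System.out.println(router.route("ds_order", false)); // ds_order
        System.out.println(router.route("ds_order", true));  // shadow_ds_order
    }
}
```

The sketch deliberately throws when no shadow mapping exists, because silently falling back to the production data source is exactly the contamination that shadow library pressure testing is meant to prevent.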

In addition, ShardingSphere is also planning and implementing strong consistency and multi-replica functions. Let’s look forward to the release of these functions.

As the first systematic column about ShardingSphere in the industry, “In-depth Explanation of ShardingSphere Core Principles” condenses my years of practical experience in data sharding and governance with ShardingSphere. The column took half a year to go through preparation, production, and launch. During this process, I systematically reviewed the source code of ShardingSphere and summarized its internal design ideas and implementation principles in detail.

Overall, ShardingSphere is a high-quality open-source framework. In particular, its compatibility with the JDBC specification, the phased execution process of its sharding engine, and its various auxiliary orchestration and governance functions have benefited my work for a long time.

I believe that these valuable “knowledge assets” can stay with you and help your career go further. Finally, I wish everyone success in their respective positions!