00 How to Correctly Learn an Open-Source Framework for Database Sharding and Table Splitting


Hello, I’m Xiaoran. I have long worked on building and optimizing distributed systems and have been responsible for designing and developing large-scale e-commerce and Internet of Things (IoT) systems. I led a team that built a leading IoT data platform, and I have extensive hands-on experience with data sharding and governance using ShardingSphere.

The rapid growth of the Internet has brought massive amounts of data and, with them, new technical challenges. Take the IoT industry, where I have worked for many years: smart terminal devices such as cameras and in-vehicle units report millions of business records every day, to say nothing of Internet sectors such as e-commerce and social media. A traditional relational database with a single-database, single-table architecture can no longer handle data at this scale, so storing and accessing this data efficiently has become a real and pressing problem.

Even so, thanks to their mature ecosystems, relational databases remain the cornerstone of the core business systems on data platforms and still command a huge market. Although a number of NoSQL databases support features such as distributed sharding natively, many of them lack core capabilities such as full transaction management.

To cope with ever-growing data volumes, the common industry practice is to adopt a database sharding and table splitting architecture, combining vertical database sharding (splitting databases by business domain) with horizontal table splitting (spreading the rows of one table across multiple tables) to store and access massive amounts of data.
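To make horizontal splitting concrete, here is a minimal Java sketch of how sharding keys can be mapped to a physical database and table. It assumes a hypothetical `t_order` table split across two databases (`ds_0`, `ds_1`) by `user_id` and into two tables per database by `order_id`. The class names are illustrative and are not ShardingSphere's API, although the modulo rule resembles inline expressions such as `t_order_${order_id % 2}` that appear in ShardingSphere configuration examples.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical modulo-based router: user_id picks the database,
 * order_id picks the table inside that database.
 */
final class ModuloShardingRouter {
    private final int databaseCount;
    private final int tableCount;

    ModuloShardingRouter(int databaseCount, int tableCount) {
        this.databaseCount = databaseCount;
        this.tableCount = tableCount;
    }

    /** Physical database chosen from the user_id sharding key. */
    String routeDatabase(long userId) {
        return "ds_" + Math.floorMod(userId, databaseCount);
    }

    /** Physical table chosen from the order_id sharding key. */
    String routeTable(long orderId) {
        return "t_order_" + Math.floorMod(orderId, tableCount);
    }

    /** Full physical location, for example "ds_1.t_order_0". */
    String route(long userId, long orderId) {
        return routeDatabase(userId) + "." + routeTable(orderId);
    }
}

class ModuloShardingDemo {
    public static void main(String[] args) {
        ModuloShardingRouter router = new ModuloShardingRouter(2, 2);
        // Route every combination of 2 users x 4 orders and count rows per physical table.
        Map<String, Integer> rowsPerTable = new TreeMap<>();
        for (long userId = 1; userId <= 2; userId++) {
            for (long orderId = 1; orderId <= 4; orderId++) {
                rowsPerTable.merge(router.route(userId, orderId), 1, Integer::sum);
            }
        }
        // prints {ds_0.t_order_0=2, ds_0.t_order_1=2, ds_1.t_order_0=2, ds_1.t_order_1=2}
        System.out.println(rowsPerTable);
    }
}
```

Because the database is chosen by `user_id`, all orders of one user land in the same database, so single-user queries touch only one node; a query that lacks the sharding key would have to be sent to every shard.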

ShardingSphere: Making Sharding and Splitting Architecture a Reality

Building a sharding and splitting architecture that can store and serve massive data takes more than business-level planning and design; developers also face a series of technical implementation problems, such as:

Data sharding: how can a relational database be sharded and split at minimal cost?
Proxy mechanism: how can ordinary client tools access data in a sharded and split architecture?
Distributed transactions: how can the same business data stay consistent when it is spread across different databases and tables?
Database governance: how can scattered database resources, such as data sources and configuration information, be kept consistent across environments?

As distributed database middleware, ShardingSphere is a powerful tool for building a sharding and splitting architecture that addresses these pain points. Compared with other sharding frameworks such as Cobar and MyCat, it offers several advantages:

Technical authority: it is the first distributed database middleware project in the history of the Apache Software Foundation and represents the latest technical direction in this field.
Solution completeness: it combines client-side sharding, a proxy server, and the core capabilities of a distributed database, offering a complete open-source distributed database middleware solution and ecosystem suited to both Internet application architectures and cloud service architectures.
Developer-friendly: integration is lightweight. Business developers only need to add a JAR dependency to embed data sharding, read-write splitting, distributed transactions, and database governance into their business code.
Pluggable extensibility: many core functions are provided as plugins, so developers can assemble their own customized system by selecting and combining them.
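The plugin idea above can be sketched in a few lines of Java. ShardingSphere's plugin loading builds on Java's SPI mechanism (`java.util.ServiceLoader`); the sketch below calls `ServiceLoader` but also registers implementations explicitly, so it runs as a single file without any `META-INF/services` entries. The `KeyGenerator` interface, its two implementations, and `PluginRegistry` are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;

/** Hypothetical extension point: each plugin is identified by a type name. */
interface KeyGenerator {
    String getType();

    long nextKey();
}

/** Illustrative plugin that hands out increasing numbers. */
final class IncrementKeyGenerator implements KeyGenerator {
    private long current;

    @Override
    public String getType() {
        return "INCREMENT";
    }

    @Override
    public synchronized long nextKey() {
        return ++current;
    }
}

/** Second illustrative plugin; a fixed value keeps the demo deterministic. */
final class ConstantKeyGenerator implements KeyGenerator {
    @Override
    public String getType() {
        return "CONSTANT";
    }

    @Override
    public long nextKey() {
        return 42L;
    }
}

/** The "kernel" depends only on the KeyGenerator interface and looks plugins up by type. */
final class PluginRegistry {
    private final Map<String, KeyGenerator> plugins = new LinkedHashMap<>();

    void register(KeyGenerator plugin) {
        plugins.put(plugin.getType(), plugin);
    }

    /** Loads implementations declared in META-INF/services files; finds none in this sketch. */
    void loadFromServiceLoader() {
        for (KeyGenerator plugin : ServiceLoader.load(KeyGenerator.class)) {
            register(plugin);
        }
    }

    Optional<KeyGenerator> find(String type) {
        return Optional.ofNullable(plugins.get(type));
    }
}

class PluginDemo {
    public static void main(String[] args) {
        PluginRegistry registry = new PluginRegistry();
        registry.loadFromServiceLoader();
        registry.register(new IncrementKeyGenerator());
        registry.register(new ConstantKeyGenerator());
        // Changing the configured type name changes behavior without touching kernel code.
        String configuredType = "INCREMENT";
        KeyGenerator generator = registry.find(configuredType)
                .orElseThrow(() -> new IllegalStateException("No plugin: " + configuredType));
        System.out.println(generator.nextKey()); // prints 1
    }
}
```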

These features have made ShardingSphere a leader among sharding and splitting middleware, and a growing number of well-known companies, such as JD.com, Dangdang, China Telecom, ZTO Express, and Bilibili, use it to build powerful and robust data platforms. If you are looking for mature and stable sharding and splitting middleware, ShardingSphere can solve this problem for you.

Why Should You Take This Course?

Almost any enterprise that processes massive data eventually has to adopt sharding and splitting. Designing sharding schemes and migrations for massive data, and storing and accessing massive business data efficiently, have become major topics that many architects and developers must plan and implement, and expertise in them is in high demand at many well-regarded companies such as Pinduoduo, Qutoutiao, and Aikucun.

Yet skilled engineers in this area are scarce. First, working with massive data requires suitable application scenarios and has a high technical barrier. Second, the industry has long lacked mature frameworks that meet practical requirements. As a result, engineers who have mastered mainstream sharding and distributed database middleware such as ShardingSphere are in strong demand among major companies.

There are few systematic introductions to ShardingSphere available, and I hope to fill this gap. Moreover, although the concept of sharding and splitting is relatively simple, putting it into practice is not easy and calls for systematic, step-by-step learning.

Course Design

This course has six parts and is built around the open-source ShardingSphere framework. It introduces mainstream solutions and engineering practices for sharding and splitting, and it is the first comprehensive course to systematically cover the core features and implementation principles of ShardingSphere.

Part 1: Introduction to ShardingSphere. This part starts by explaining how to correctly understand the sharding and splitting architecture, then introduces the relationship between the JDBC specification and ShardingSphere, and finally explains, based on ShardingSphere's configuration system, the various ways to use ShardingSphere in business systems.
Part 2: Core Features of ShardingSphere. ShardingSphere offers a wide range of capabilities. This part presents concrete usage and development techniques for core features such as data sharding, read-write splitting, distributed transactions, data desensitization, and orchestration and governance.

Parts 3 to 6 are the focus of the course. They explore ShardingSphere's core architecture from different angles, reveal the design and implementation mechanisms of sharding and splitting at the source-code level, and help you get better at reading and understanding source code.

Part 3: ShardingSphere Source Code Analysis - Infrastructure. This part covers ShardingSphere's basic infrastructure. Starting with an efficient way to read source code, it introduces the design concepts behind the micro-kernel architecture and distributed primary keys, and how ShardingSphere implements them.
Part 4: ShardingSphere Source Code Analysis - Sharding Engine. This part focuses on the core implementation principles of ShardingSphere's sharding engine. Starting from the SQL parsing engine, it analyzes the source code of each core component of the sharding engine: the routing engine, the rewriting engine, the execution engine, and the merging engine.
Part 5: ShardingSphere Source Code Analysis - Distributed Transactions. Distributed transactions are an essential feature of distributed database middleware, and ShardingSphere provides its own abstraction for them. I will analyze this abstraction in detail, along with how ShardingSphere implements strongly consistent transactions and flexible transactions.
Part 6: ShardingSphere Source Code Analysis - Governance and Integration. The final part discusses database governance, such as implementing non-intrusive data desensitization on top of the rewriting engine, dynamically managing configuration through a configuration center, implementing circuit breaking for database access through a registry center, and tracing data access through the Hook mechanism and the OpenTracing specification. I will answer each of these questions in detail.
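Of the engines listed above, merging is the easiest to picture in isolation: when a query such as `SELECT ... ORDER BY order_id` is routed to several shards, each shard returns rows already sorted, and those results must be combined into one ordered stream. A common technique for this is a k-way merge with a priority queue. The sketch below assumes each shard's result is a sorted list of `long` keys; it illustrates the technique and is not ShardingSphere's merge engine code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative k-way merge of per-shard results that are each sorted ascending. */
final class SortedShardMerger {

    /** A shard's iterator plus the row it currently exposes. */
    private static final class ShardCursor {
        private final Iterator<Long> rows;
        private long current;

        ShardCursor(Iterator<Long> rows) {
            this.rows = rows;
            this.current = rows.next();
        }

        /** Moves to the next row; returns false when the shard is exhausted. */
        boolean advance() {
            if (!rows.hasNext()) {
                return false;
            }
            current = rows.next();
            return true;
        }
    }

    static List<Long> merge(List<List<Long>> sortedShardResults) {
        // The heap holds one cursor per non-exhausted shard, ordered by its current row.
        PriorityQueue<ShardCursor> heap =
                new PriorityQueue<>((a, b) -> Long.compare(a.current, b.current));
        for (List<Long> shard : sortedShardResults) {
            if (!shard.isEmpty()) {
                heap.add(new ShardCursor(shard.iterator()));
            }
        }
        List<Long> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            ShardCursor smallest = heap.poll();
            merged.add(smallest.current);
            if (smallest.advance()) {
                heap.add(smallest); // re-insert so the heap sees its new current row
            }
        }
        return merged;
    }
}

class SortedShardMergerDemo {
    public static void main(String[] args) {
        List<List<Long>> shardResults = Arrays.asList(
                Arrays.asList(1L, 4L, 9L),   // rows from a hypothetical t_order_0
                Arrays.asList(2L, 3L, 10L),  // rows from a hypothetical t_order_1
                Arrays.<Long>asList());      // a shard that matched nothing
        System.out.println(SortedShardMerger.merge(shardResults)); // prints [1, 2, 3, 4, 9, 10]
    }
}
```

The heap never holds more than one row per shard, so the working set grows with the number of shards rather than with the total result size; this is what makes a streaming style of merging possible.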

In addition, for the core-feature parts of the course, I analyze concrete cases and provide detailed code and configuration for you to study and adapt. You can download the course code from https://github.com/tianyilan12/shardingsphere-demo.

What You Will Get

Application methods and implementation principles of sharding and splitting. This course will help you master ShardingSphere's core features for day-to-day development and explain, from the source code, the design principles and implementation mechanisms behind them.
Learning from excellent open-source frameworks to improve your technical understanding and practical skills. Technical principles are often shared across frameworks. Take ZooKeeper, a distributed coordination framework: both ShardingSphere and Dubbo use it to build a registry center.

In ShardingSphere, we can use ZooKeeper's dynamic listening mechanism to determine whether a database instance is available and whether access to it should be circuit-broken, and to manage configuration information dynamically in a distributed environment.
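The listening-based availability check described above can be sketched without a real ZooKeeper cluster. Below, a hypothetical `InMemoryRegistry` stands in for the registry center and pushes change events to watchers, while a hypothetical `AvailabilityAwareRouter` keeps a local view of which data sources are enabled. The node path convention (`/status/<data source>`) and the `DISABLED` value are made up for illustration; they do not reflect ShardingSphere's actual registry layout or ZooKeeper's client API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

/** Hypothetical stand-in for a registry center: stores node values and notifies watchers. */
final class InMemoryRegistry {
    private final Map<String, String> nodes = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, String>> watchers = new CopyOnWriteArrayList<>();

    void watch(BiConsumer<String, String> watcher) {
        watchers.add(watcher);
    }

    /** Updates a node and pushes the change to every watcher, like a data-change event. */
    void put(String path, String value) {
        nodes.put(path, value);
        for (BiConsumer<String, String> watcher : watchers) {
            watcher.accept(path, value);
        }
    }
}

/** Routes only to data sources that the registry has not marked as DISABLED. */
final class AvailabilityAwareRouter {
    private final Map<String, Boolean> enabled = new ConcurrentHashMap<>();

    AvailabilityAwareRouter(InMemoryRegistry registry, List<String> dataSources) {
        for (String dataSource : dataSources) {
            enabled.put(dataSource, true);
        }
        registry.watch((path, value) -> {
            String dataSource = path.substring(path.lastIndexOf('/') + 1);
            // Only data sources this router knows about are updated; others are ignored.
            enabled.computeIfPresent(dataSource, (key, old) -> !"DISABLED".equals(value));
        });
    }

    List<String> availableDataSources() {
        List<String> result = new ArrayList<>();
        enabled.forEach((dataSource, ok) -> {
            if (ok) {
                result.add(dataSource);
            }
        });
        Collections.sort(result); // ConcurrentHashMap has no defined iteration order
        return result;
    }
}

class AvailabilityDemo {
    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();
        AvailabilityAwareRouter router =
                new AvailabilityAwareRouter(registry, Arrays.asList("ds_0", "ds_1"));
        registry.put("/status/ds_1", "DISABLED"); // e.g. an operator or health check trips ds_1
        System.out.println(router.availableDataSources()); // prints [ds_0]
        registry.put("/status/ds_1", "ENABLED");
        System.out.println(router.availableDataSources()); // prints [ds_0, ds_1]
    }
}
```

The point of the push model is that routers react to a state change as soon as it is written, instead of each one polling every database instance on its own.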

As you delve into ShardingSphere, you will find many similar examples, including micro-kernel architecture based on the SPI mechanism, distributed primary keys based on the snowflake algorithm, configuration center based on Apollo, registry center based on Nacos, flexible transactions based on Seata, and data access tracing based on the OpenTracing specification.
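Of the techniques in that list, the snowflake algorithm is compact enough to sketch. The version below uses the commonly described layout of a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit per-millisecond sequence. The custom epoch is an arbitrary assumption, and clock-rollback handling is deliberately simplified (it just throws), so treat this as an illustration rather than ShardingSphere's own key generator.

```java
/**
 * Minimal snowflake-style ID generator (illustrative layout):
 * unused sign bit | 41-bit ms since a custom epoch | 10-bit worker ID | 12-bit sequence.
 */
final class SnowflakeIdGenerator {
    // Arbitrary custom epoch (2020-09-13T12:26:40Z); an assumption, not ShardingSphere's value.
    private static final long EPOCH_MILLIS = 1_600_000_000_000L;
    private static final int WORKER_ID_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER_ID = (1L << WORKER_ID_BITS) - 1; // 1023
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;  // 4095

    private final long workerId;
    private long lastMillis = -1L;
    private long sequence;

    SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER_ID + "]");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMillis) {
            // Simplification: real generators may wait out or tolerate small clock rollbacks.
            throw new IllegalStateException("Clock moved backwards by " + (lastMillis - now) + " ms");
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // All 4096 sequence values used in this millisecond: spin until the next one.
                while (now <= lastMillis) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return ((now - EPOCH_MILLIS) << (WORKER_ID_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    /** Extracts the worker ID bits from a generated ID. */
    static long workerIdOf(long id) {
        return (id >>> SEQUENCE_BITS) & MAX_WORKER_ID;
    }
}

class SnowflakeDemo {
    public static void main(String[] args) {
        SnowflakeIdGenerator generator = new SnowflakeIdGenerator(7);
        long first = generator.nextId();
        long second = generator.nextId();
        System.out.println(second > first);                          // prints true
        System.out.println(SnowflakeIdGenerator.workerIdOf(second)); // prints 7
    }
}
```

Because the timestamp occupies the high bits, IDs from one generator increase over time, which keeps index inserts mostly sequential; each node needs a distinct worker ID for IDs to stay unique across nodes.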

These technical systems also appear in mainstream frameworks such as Dubbo and Spring Cloud. So besides deepening your systematic understanding of them, this course will show you their concrete application scenarios and implementation methods, so that what you learn here transfers to other frameworks.

Development techniques that carry over from source-code analysis to daily work. Moving from source-code analysis to everyday application is a core goal of this course. ShardingSphere yields a range of development techniques, including design patterns (such as the factory, strategy, and template method patterns), the micro-kernel and other architectural patterns, component design and class-structure division, common cache usage and custom caching mechanisms, and integration with the Spring framework. You can apply these techniques directly in everyday development.
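As a small illustration of how the factory, strategy, and template method patterns can fit together in sharding-style code, here is a sketch with hypothetical names: `StrategyFactory` builds a table-sharding strategy from a configured type name, the strategies are interchangeable, and `AbstractStatementBuilder` fixes the overall flow while a subclass supplies one step. None of these classes come from ShardingSphere.

```java
/** Strategy pattern: interchangeable rules for choosing a physical table. */
interface TableShardingStrategy {
    String targetTable(String logicTable, long shardingValue);
}

/** Spreads rows evenly by modulo. */
final class ModStrategy implements TableShardingStrategy {
    private final int tableCount;

    ModStrategy(int tableCount) {
        this.tableCount = tableCount;
    }

    @Override
    public String targetTable(String logicTable, long shardingValue) {
        return logicTable + "_" + Math.floorMod(shardingValue, tableCount);
    }
}

/** Groups consecutive values into fixed-size ranges (assumes non-negative values). */
final class RangeStrategy implements TableShardingStrategy {
    private final long rangeSize;

    RangeStrategy(long rangeSize) {
        this.rangeSize = rangeSize;
    }

    @Override
    public String targetTable(String logicTable, long shardingValue) {
        return logicTable + "_" + (shardingValue / rangeSize);
    }
}

/** Simple factory: turns a configured type name into a strategy instance. */
final class StrategyFactory {
    private StrategyFactory() {
    }

    static TableShardingStrategy create(String type, long parameter) {
        switch (type) {
            case "MOD":
                return new ModStrategy((int) parameter);
            case "RANGE":
                return new RangeStrategy(parameter);
            default:
                throw new IllegalArgumentException("Unknown strategy type: " + type);
        }
    }
}

/** Template method: build() fixes the flow, subclasses supply the SQL-rendering step. */
abstract class AbstractStatementBuilder {
    final String build(String logicTable, long shardingValue, TableShardingStrategy strategy) {
        String physicalTable = strategy.targetTable(logicTable, shardingValue); // step 1: route
        return renderSql(physicalTable, shardingValue);                        // step 2: render
    }

    protected abstract String renderSql(String physicalTable, long shardingValue);
}

/** One concrete step; string concatenation is for display only (use PreparedStatement in real code). */
final class SelectByOrderIdBuilder extends AbstractStatementBuilder {
    @Override
    protected String renderSql(String physicalTable, long orderId) {
        return "SELECT * FROM " + physicalTable + " WHERE order_id = " + orderId;
    }
}

class PatternDemo {
    public static void main(String[] args) {
        TableShardingStrategy strategy = StrategyFactory.create("MOD", 4);
        // prints SELECT * FROM t_order_3 WHERE order_id = 7
        System.out.println(new SelectByOrderIdBuilder().build("t_order", 7, strategy));
    }
}
```

Keeping the routing rule behind an interface means a new rule only needs a new class and a factory entry, while the fixed `build()` skeleton guarantees that every statement is routed before it is rendered.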

Instructor’s Message

Technology is advancing rapidly. With the popularity of architectural concepts such as data hubs and the spread of AI applications, growing data volumes have become a major challenge for most software systems. Few sharding and splitting frameworks are both mature and actively developed, so companies have limited choices. ShardingSphere is currently the only Apache top-level project in this field and offers the richest set of core features, reflecting where the field is heading. I hope this course helps you learn ShardingSphere well and master a way of learning that transfers to other technologies.

Finally, feel free to share your experience with data processing and architecture design in the comments. I hope this course gives you what you are looking for. Let’s work together!