
24 Current Situation - Understanding the Latest Developments in Distributed Databases #

Hello, congratulations on making it to the last lecture of the course.

In the previous lecture, we discussed several techniques for implementing database middleware, including globally unique primary keys, sharding strategies, and cross-shard queries. The most important of these is distributed transactions, which I hope you have mastered. In this final lecture, I will introduce NewSQL databases.

First, let me try to define NewSQL. It is a modern type of relational database that also offers the scalability of NoSQL. It excels at providing high-performance read and write services in OLTP scenarios while guaranteeing transaction isolation and data consistency. We can think of NewSQL as combining the scalability represented by NoSQL, which rose to prominence around the year 2000, with the relational model and ACID transactions developed in the 1970s, yielding a highly concurrent relational distributed database.

If we use a NewSQL database, we can interact with it using familiar SQL queries. I already gave a detailed introduction to the advantages of SQL in Module 1. Using SQL allows existing SQL-based applications to switch directly from traditional relational databases to NewSQL databases with little or no modification. In contrast, NoSQL databases generally use SQL-variant languages or custom APIs, which makes switching to them far more costly for users.

There has been ongoing debate about the definition and scope of NewSQL. Some argue that databases like Vertica and Greenplum, which are targeted towards OLAP and have distributed characteristics, should also be included in the NewSQL category. However, the more widely accepted standards for NewSQL include:

  1. Executing short read-write transactions, i.e. no long-blocking transaction operations.
  2. Using indexes to query a subset of the data, rather than scanning an entire loaded table.
  3. Adopting a Shared-Nothing architecture.
  4. Running highly concurrent transactions without heavyweight locking.

Based on these characteristics, I would summarize a NewSQL database as a relational SQL database that adopts an innovative architecture to transparently support sharding and high-concurrency transactions.

Please note that DistributedSQL is a special type of NewSQL, which can be deployed globally.

Now, I will provide a detailed introduction to NewSQL databases according to the key points in the definition I provided.

Innovative Architecture #

The use of an innovative database architecture is a particularly notable feature of NewSQL databases. This new architecture generally does not rely on any legacy code, which is very different from the approach of building on traditional databases as storage and compute nodes that I discussed in “Module 22 | Development and Limitations: Exploring Traditional Databases in the Distributed Field”. Let’s take TiDB, a typical NewSQL database, as an example.

[Figure: TiDB architecture diagram]

We can observe several innovative points in this diagram:

  1. The storage engine does not use traditional databases. Instead, it uses a new type of LSM-based KV distributed storage engine. Some databases use completely in-memory storage engines, such as NuoDB.
  2. Shared-Nothing architecture. The underlying storage and the upper-level workloads are deployed independently.
  3. High-performance concurrent transactions. TiDB implements high-performance optimistic transactions based on the Percolator algorithm.
  4. Transparent sharding. TiDB implements automatic range-based sharding, allowing for elastic node scaling.
  5. Automatic fault tolerance based on replication. TiDB uses the Raft algorithm to achieve highly available data replication and automatic failover.

It can be said that the intersection of NewSQL and traditional relational databases lies in SQL and ACID transactions, which lets users carry over their existing habits.

The innovative points described above have been detailed in the previous two modules. Here, I wonder if you have noticed something: compared to building distributed databases on top of traditional databases, the most obvious difference in NewSQL comes from the storage engine. In particular, a group of NewSQL databases led by Spanner, such as YugaByte DB, CockroachDB, and TiDB, all use LSM trees as storage engines, and all of them expose a KV structure. What is the rationale behind this choice?

When I introduced LSM, I mentioned that it can achieve high-performance writing and reading but sacrifices space. You may conclude from this that NewSQL databases, facing OLTP scenarios, choose LSM tree storage engines instead of B-tree-like storage engines in order to improve throughput. However, I have also mentioned that there are many methods to improve the throughput of B-trees. So this is not the key point.

I used to be confused about this issue too. After extensive research and discussions with project teams, I came to a “conclusion”: open-source NewSQL databases do not choose the LSM tree as such; they choose RocksDB. And is RocksDB chosen because it is an LSM structure? The answer is no. Most open-source NewSQL databases that use RocksDB as their storage engine are drawn to its performance and functionality. A reasonable inference is that if a B-tree storage engine existed that beat RocksDB in performance and functionality, the landscape of modern open-source NewSQL storage engines would look different.

Don’t be surprised: this trend actually reflects the pragmatic streak of IT technology. From the popularity of the TCP/IP protocol to the replacement of EJB by Spring, the Java enterprise framework, the pattern is the same: truly successful technology must deliver practical value, and that value outweighs any perfect theory. This also reminds us not to rush to classify a distributed database when we first encounter it, because the criteria for judging NewSQL that I introduce today are a summary of existing database features and cannot anticipate future developments. What we need to learn is to grasp the core features of each type of database.

Now, I want to mention a rather special database: OceanBase. One major difference between OceanBase’s reads and writes and those of traditional databases is that OceanBase does not modify data blocks in place when writing. Instead, it allocates a separate piece of incremental memory to store the changes. After multiple changes to the same record, the incremental blocks are organized as a linked list, and these incremental modifications stay in memory without being written to disk. When reading, OceanBase merges the baseline data block read into memory with the related incremental blocks to produce the current row. This mechanism is quite similar to the MemTable and SSTable of an LSM tree, but OceanBase applies it at a higher level of the system.
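To make this read path concrete, here is a minimal Python sketch of the delta-chain idea; all names are hypothetical, and real OceanBase adds versioning, compaction, and persistence on top:

```python
class DeltaRecord:
    """One incremental modification; records form a linked list per row."""
    def __init__(self, changes, next_delta=None):
        self.changes = changes   # dict of column -> new value
        self.next = next_delta   # older delta, or None

class Store:
    def __init__(self):
        self.baseline = {}   # row_key -> full row dict (the "data block")
        self.deltas = {}     # row_key -> newest DeltaRecord (in memory)

    def write(self, key, changes):
        # Writes never touch the baseline block; they prepend a delta.
        self.deltas[key] = DeltaRecord(changes, self.deltas.get(key))

    def read(self, key):
        # Read the baseline block, then apply deltas oldest-first.
        row = dict(self.baseline.get(key, {}))
        chain = []
        d = self.deltas.get(key)
        while d:
            chain.append(d)
            d = d.next
        for d in reversed(chain):   # oldest delta first
            row.update(d.changes)
        return row

store = Store()
store.baseline["r1"] = {"id": 1, "name": "a", "qty": 10}
store.write("r1", {"qty": 11})
store.write("r1", {"name": "b"})
print(store.read("r1"))  # {'id': 1, 'name': 'b', 'qty': 11}
```

Note how the baseline block is never mutated: the merged view exists only in the reader, which mirrors the lecture's point that incremental modifications live in memory until a later flush.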

After summarizing the architectural innovations, I will now introduce the management of sharding in NewSQL.

Transparent Sharding #

As we mentioned earlier, database middleware performs sharding by applying a calculation to a logical sharding key and writing different rows into different target databases. Users need to be deeply involved in this process: they must specify which key serves as the sharding key, define the mapping rules between logical and physical tables, and sometimes even modify their SQL. This is why middleware sharding is generally called explicit sharding, in contrast to the transparent sharding provided by NewSQL databases.

As the name suggests, in Transparent Sharding, users do not need to specify any rules, and the data can be distributed across the entire cluster and automatically backed up. So how does NewSQL determine the sharding key? Let’s take TiDB as an example.

In TiDB, the data in a table is sharded into regions. Each region has multiple replicas, ensuring high data availability. Different tables are stored in different regions, rather than every database holding the same set of tables as in traditional sharding. So, how does TiDB determine which region a row of data should be stored in?

The first thing to determine is how rows are mapped to the KV structure: the key is composed of the table ID and the row ID (the primary key). For example:

t[table_id]_r[row_id]

The value stores the entire row of data. We use this key to calculate which region this row of data should fall into.

TiDB uses a range strategy to divide the data. For an index, the regions responsible for that index are likewise split by ranges of the indexed fields. After the key is transformed into an ordered value, the key space is divided into multiple intervals, with each interval managed by one region. The benefits of range sharding have been discussed before: balanced storage and balanced access pressure. The reason is that range sharding leaves more room for dynamic scheduling, which can adapt to changing workloads in real time.
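The key-to-region lookup can be sketched as a search over sorted split points. The key format follows the `t[table_id]_r[row_id]` pattern above, while the zero-padding of the row ID and the concrete boundaries are illustrative assumptions, not TiDB's actual binary encoding:

```python
import bisect

def row_key(table_id, row_id):
    # Encode a row as t[table_id]_r[row_id]; zero-pad the row ID so
    # lexicographic order matches numeric order within a table.
    return f"t{table_id}_r{row_id:010d}"

# Hypothetical split points: region i owns keys in
# [boundaries[i-1], boundaries[i]), half-open intervals.
boundaries = [row_key(10, 5000), row_key(10, 20000)]
regions = ["region-1", "region-2", "region-3"]

def locate(key):
    # bisect_right finds the first boundary strictly greater than the key.
    return regions[bisect.bisect_right(boundaries, key)]

print(locate(row_key(10, 42)))     # region-1
print(locate(row_key(10, 5000)))   # region-2
print(locate(row_key(10, 99999)))  # region-3
```

Because regions are defined by boundaries rather than by a fixed function of the key, the scheduler can split a hot interval or move a boundary at runtime, which is exactly the dynamic-scheduling advantage described above.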

Although transparent sharding brings convenience to users, the astute reader may have noticed that this implicit sharding method is functionally incomplete compared to the middleware approach introduced in the previous lecture. The most obvious gap is cross-shard join functionality: you can imagine that this type of NewSQL database and middleware with ER sharding will differ in performance here. For this situation, TiDB introduces Placement Rules to place related tables in the same KV storage, achieving an effect similar to ER sharding.

Of course, as NewSQL has matured, hash sharding has gradually been introduced as well. We know that range sharding has hot-spot issues. Although these can be mitigated by dynamically splitting and merging shards, that is ultimately not a fundamental solution. The problem becomes particularly prominent for strongly ordered data, such as time series. Hash sharding is an effective means of addressing it, which is why CockroachDB introduces hash-sharded indexes to achieve scalability for sequential data.

Let's summarize. Transparent sharding usually targets the primary key, and the row ID is commonly chosen as that key. However, to achieve complete sharding functionality, some user configuration is still necessary.
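The hash-sharded index idea mentioned above can be sketched as follows. The `b[bucket]_` string prefix and the bucket count are hypothetical (CockroachDB actually realizes this with a computed shard column), but the effect is the same: monotonically increasing values are spread across buckets instead of piling onto one hot range:

```python
import hashlib

BUCKETS = 8  # hypothetical bucket count

def hash_sharded_index_key(index_value):
    # Prefix the key with a stable hash bucket so that sequential values
    # (e.g. timestamps) are scattered across BUCKETS separate key ranges.
    digest = hashlib.sha1(str(index_value).encode()).digest()
    bucket = digest[0] % BUCKETS
    return f"b{bucket}_i{index_value}"

# Sequential timestamps no longer land in a single hot range:
for ts in range(1609459200, 1609459204):
    print(hash_sharded_index_key(ts))
```

The trade-off is that range scans over the original value must now fan out to all buckets, which is why hash sharding complements rather than replaces range sharding.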

Now that we’ve addressed the sharding issue, let’s move on to the SQL layer challenges.

Distributed SQL #

Compared to NoSQL, NewSQL databases excel in their support for SQL. As I discussed with you in Module 1, SQL is an essential feature for a successful distributed database. We have already discussed the importance of SQL, but its implementation has always been considered challenging, primarily due to the following two aspects:

  1. Non-standardization of SQL: Although we have standards like SQL99, no popular database in the industry fully adheres to the standard. Each database has its own dialect and unique features. Therefore, most NewSQL databases implement SQL semantics by following an existing database dialect. For example, TiDB implements MySQL syntax, while CockroachDB implements PostgreSQL syntax.

  2. Declarative language: SQL is a higher-level language than Java, Go, and the other languages we are familiar with. It mainly expresses the desired result rather than instructing the database how to compute it. This gives rise to concepts such as execution-plan optimization and makes it challenging to implement an efficient query engine.

The above challenges are specific to traditional databases implementing SQL. However, for distributed databases, additional considerations are required to address the issues arising from data distribution. Additionally, network latency becomes an important factor to consider in query optimization.

For NewSQL databases, if the innovative architecture mentioned above is used, especially when using KV storage at the low level, one needs to consider how data and indexes are mapped to the underlying KV storage. I have already explained how data is mapped to KV: the key consists of the table ID and the row ID of the primary key, and the value stores the row data:

t[table_id]_r[row_id] => row_data

Regarding indexes, we need to differentiate between unique indexes and non-unique indexes.

For a unique index, we combine the table ID, index ID, and index value to form the key, and the value is the aforementioned row ID:

t[table_id]_i[index_id][index_value] => row_id

For a non-unique index, we need to include the row ID in the key, and the value is empty:

t[table_id]_i[index_id][index_value]_r[row_id] => null
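Putting the three mappings together, a toy Python encoding might look like this; the string keys and underscore separators are simplifications of the real binary encodings:

```python
def row_key(table_id, row_id):
    # t[table_id]_r[row_id] => row_data
    return f"t{table_id}_r{row_id}"

def unique_index_key(table_id, index_id, index_value):
    # t[table_id]_i[index_id][index_value] => row_id
    return f"t{table_id}_i{index_id}_{index_value}"

def nonunique_index_key(table_id, index_id, index_value, row_id):
    # t[table_id]_i[index_id][index_value]_r[row_id] => null
    return f"t{table_id}_i{index_id}_{index_value}_r{row_id}"

kv = {}
# Insert row (id=7) into table 10, with a unique index 1 on email
# and a non-unique index 2 on city.
kv[row_key(10, 7)] = {"id": 7, "email": "a@x", "city": "nyc"}
kv[unique_index_key(10, 1, "a@x")] = 7           # value: the row ID
kv[nonunique_index_key(10, 2, "nyc", 7)] = None  # value: empty

# A point lookup through the unique index costs two KV reads:
rid = kv[unique_index_key(10, 1, "a@x")]
print(kv[row_key(10, rid)])  # {'id': 7, 'email': 'a@x', 'city': 'nyc'}
```

Notice why the non-unique index embeds the row ID in the key: many rows may share the same index value, so appending `_r[row_id]` keeps every KV key distinct, and a scan over the `t10_i2_nyc` prefix recovers all matching row IDs.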

Once the mapping problem at the low level is resolved, we need to focus on the execution layer. This involves a significant amount of technical details, which I won’t delve into. Instead, I will show you some challenges that NewSQL databases might face.

Firstly, there is the issue of correctness. Indeed, implementing SQL semantics correctly is itself a challenge. SQL accepts user-defined tables and indexes; queries refer to them by name, while execution uses IDs, and this mapping introduces correctness pitfalls. One of the trickiest aspects is handling NULL. This is not just a challenge for NewSQL databases; traditional databases also spent a long time grappling with NULL. It is why a SQL database needs considerable iteration to reach a stable state.

Next, let's talk about performance. Performance issues concern not only database developers but also DBAs and end users. If you work with databases frequently, you must know something about SQL optimization. The reason it exists at all is the declarative nature of SQL.
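Returning to NULL for a moment: its difficulty comes from SQL's three-valued logic, which a few lines of Python can illustrate (`None` stands in for both NULL and UNKNOWN):

```python
def sql_eq(a, b):
    # SQL equality is three-valued: any comparison with NULL yields
    # UNKNOWN, modeled here as None.
    if a is None or b is None:
        return None
    return a == b

def sql_not(v):
    # NOT UNKNOWN is still UNKNOWN.
    return None if v is None else (not v)

rows = [{"x": 1}, {"x": None}, {"x": 2}]

# WHERE x = 1 keeps only rows where the predicate is TRUE,
# so the NULL row is dropped ...
matched = [r for r in rows if sql_eq(r["x"], 1) is True]
# ... and WHERE NOT (x = 1) drops the NULL row as well, which surprises
# users who expect the two queries to partition the table.
not_matched = [r for r in rows if sql_not(sql_eq(r["x"], 1)) is True]

print(matched)      # [{'x': 1}]
print(not_matched)  # [{'x': 2}]
```

An engine that models NULL with a two-valued boolean type will get one of these two queries wrong, which is exactly the kind of subtle bug that takes iterations to flush out.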

Most NewSQL databases need to implement traditional database optimization techniques, which are generally divided into two categories: rule-based and cost-based. The former, also known as logical optimization, is based on static analysis of SQL semantics and database characteristics; common techniques include column pruning, rewriting subqueries into joins, predicate pushdown, TopN pushdown, and so on. Cost-based optimization requires the database to maintain up-to-date statistics in order to decide whether to use an index, in what order to execute parts of a SQL statement, and so on.
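As an illustration of a rule-based rewrite, here is a toy predicate-pushdown pass; the tuple-based plan representation is entirely hypothetical:

```python
def table_of(plan):
    # The table a plan node produces rows from (scans only, in this sketch).
    return plan[1] if plan[0] == "scan" else None

def push_down_filters(plan):
    # Rule-based rewrite: move a filter below a join when its predicate
    # references only one side of the join.
    kind = plan[0]
    if kind == "filter":
        _, pred, child = plan
        child = push_down_filters(child)
        if child[0] == "join":
            _, left, right, on = child
            if pred["table"] == table_of(left):
                return ("join", ("filter", pred, left), right, on)
            if pred["table"] == table_of(right):
                return ("join", left, ("filter", pred, right), on)
        return ("filter", pred, child)
    if kind == "join":
        return ("join", push_down_filters(plan[1]),
                push_down_filters(plan[2]), plan[3])
    return plan

# SELECT ... FROM users JOIN orders ON user_id WHERE orders.amount > 100
plan = ("filter", {"table": "orders", "expr": "amount > 100"},
        ("join", ("scan", "users"), ("scan", "orders"), "user_id"))
pushed = push_down_filters(plan)
print(pushed)  # the filter now sits directly above the orders scan
```

In a distributed database this rewrite matters even more than in a single-node one: pushing the filter below the join means it can run on the storage nodes, shrinking the data shipped across the network.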

In addition, optimization for distributed databases includes parallel execution, stream aggregation, and consideration of network factors for cost-based optimization. Therefore, optimizing SQL for distributed databases is more complex.

From the above discussion, it can be seen that the challenges NewSQL implementations face in the field of SQL are much greater than those of traditional databases. However, many years of accumulated SQL execution and optimization techniques have formed a complete body of knowledge, and the new generation of NewSQL databases can build on this foundation. Standing on the shoulders of giants, they can implement complete SQL functionality very efficiently.

Now that we have introduced the exploration of NewSQL databases in the field of SQL, let’s finally discuss the characteristics of high-performance transactions.

High-Performance Transactions #

The ability to handle transactions is the key to the widespread application of NewSQL. In the previous module, we discussed Spanner and Calvin, which are typical innovative transaction models. In this lecture, I will summarize several common patterns for transaction processing in NewSQL databases and provide examples to illustrate them.

The first classification is centralized versus decentralized transaction management. The advantage of centralized transaction management is that it is easy to implement transaction isolation, especially serializable isolation, but this often comes at the cost of lower throughput. Decentralized transaction management, on the other hand, is suitable for building high-concurrency transactions, but it requires a logical clock to provide timestamps and to guarantee correctness under resource contention.

So far, this is a fairly conventional understanding. However, we have also introduced the Calvin transaction model. It is actually a centralized transaction processing pattern, yet it has a significant throughput advantage in high-contention environments. The key is that it reschedules transaction execution to eliminate contention and thereby improve throughput. Besides Calvin, VoltDB adopts a similar principle; if you are interested, you can study it on your own.

Decentralized transaction solutions, on the other hand, generally combine techniques such as MVCC, 2PC, and PaxosGroup. Together, these make it possible to achieve serializable snapshot isolation while keeping every component highly available during execution. Beyond providing high availability, the use of PaxosGroup plays an important role in partitioning the data, thereby reducing lock contention and improving the efficiency of concurrent transactions. This pattern was first introduced by Spanner, so it is generally referred to as the Spanner-like transaction model.
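The MVCC building block of such a scheme can be sketched minimally. This toy store omits 2PC, replication, and conflict detection, and its `clock` counter merely stands in for a real timestamp service:

```python
class MVCCStore:
    """Each key maps to versions ordered by commit timestamp; a reader
    sees the newest version no newer than its snapshot timestamp."""
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value)
        self.clock = 0       # stand-in for a timestamp oracle

    def next_ts(self):
        self.clock += 1
        return self.clock

    def commit(self, writes):
        # Atomically stamp every write in the transaction with one ts.
        ts = self.next_ts()
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((ts, value))
        return ts

    def read(self, key, snapshot_ts):
        # Scan versions newest-first for the first visible one.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

store = MVCCStore()
store.commit({"x": 1})
snap = store.clock          # a reader takes its snapshot here
store.commit({"x": 2})      # a later writer does not disturb the snapshot
print(store.read("x", snap))         # 1
print(store.read("x", store.clock))  # 2
```

The essential property shown here is that readers never block writers: the snapshot reader keeps seeing `1` even after a newer version is committed, which is what lets these decentralized designs sustain high concurrency.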

In the previous lecture, we introduced JDTX, developed by JD for ShardingSphere, which presents a different scenario. Because database middleware cannot operate at the underlying database level, its transaction options are limited. JDTX, however, adopts an approach similar to OceanBase's: it first constructs an independent MVCC engine outside the database nodes, and querying the latest data requires combining the database nodes with the modified records held in the MVCC engine. Data in the MVCC engine is asynchronously flushed to disk so that its memory can be released. The JDTX model dispels the notion that middleware cannot achieve a high-performance transaction model, opening up new possibilities.

As you can see, current NewSQL concurrent transaction processing typically combines schemes that have each been extensively validated. It can be likened to a modern aircraft carrier: although no individual component is novel, their combination achieves what predecessors could not.

Summary #

In this lecture, I introduced the definition of NewSQL databases and analyzed four key points for a NewSQL database in detail.

  1. Innovative architecture: both the distributed system and the storage engine need innovation.
  2. Transparent sharding: automatic management that frees users from manual rules and aligns with cloud-native practice.
  3. Distributed SQL: the key to building commercially viable distributed databases.
  4. High-performance transactions: the innovation base of NewSQL and the key differentiator among NewSQL databases.

With this, we have completed the main content of our course.

NewSQL, and the globally deployable DistributedSQL, represent the direction of development for contemporary distributed databases. It can be said that all the knowledge we have introduced revolves around the NewSQL ecosystem. In the next extra section, I will introduce other types of distributed databases to help you broaden your thinking.