14 Distributed Primary Keys: What Are the Methods for Implementing Distributed Primary Keys in ShardingSphere? #

In this lesson, I will explain the implementation methods of distributed primary keys in ShardingSphere.

In traditional database-backed development, automatic primary key generation is a basic requirement, and databases provide corresponding support, such as MySQL’s auto-increment columns and Oracle’s sequences. In a sharding scenario, however, the problem becomes more complex: we cannot rely on the auto-increment keys of a single instance to achieve globally unique primary keys across different data nodes. This is where the need for distributed primary keys arises. As an excellent open-source database sharding middleware, ShardingSphere also provides a mechanism for implementing distributed primary keys. Today, we will discuss the basic principles and implementation methods of this mechanism.

GeneratedKey Scheme in ShardingSphere #

Before introducing the specific implementation methods of distributed primary keys provided by ShardingSphere, let’s first discuss the abstract GeneratedKey scheme in the framework. This will help you understand the specific use cases and usage methods of distributed primary keys.

GeneratedKey in ShardingSphere #

GeneratedKey is not a concept created by ShardingSphere. If you have used an ORM framework such as MyBatis, you will recognize it. In fact, we already introduced how to use generated keys in MyBatis in the article “Data Sharding: How to Implement Database Sharding, Table Sharding, Database Sharding + Table Sharding, and Forced Routing (Part 1)?”. Usually, we set the useGeneratedKeys and keyProperty attributes in a MyBatis Mapper file as follows:

<insert id="addEntity" useGeneratedKeys="true" keyProperty="recordId">
    INSERT INTO health_record (user_id, level_id, remark)
    VALUES (#{userId,jdbcType=INTEGER}, #{levelId,jdbcType=INTEGER},
         #{remark,jdbcType=VARCHAR})
</insert>

When executing this insert statement, the returned object automatically includes the generated primary key value. Of course, this method can only work if the corresponding database itself supports auto-increment primary keys.
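
For illustration, here is a minimal usage sketch of the mapper above; the HealthRecordMapper interface and the HealthRecord entity (with a recordId property matching keyProperty) are assumed names, not part of the original example:

// HealthRecord and HealthRecordMapper are hypothetical names used only for this sketch
HealthRecord record = new HealthRecord();
record.setUserId(1);
record.setLevelId(2);
record.setRemark("test");

// Execute the insert through the assumed mapper interface
healthRecordMapper.addEntity(record);

// Because useGeneratedKeys="true" and keyProperty="recordId",
// MyBatis writes the database-generated key back into the entity
Long generatedRecordId = record.getRecordId();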

When we use the auto-generated key scheme provided by ShardingSphere, the development process and effects are exactly the same as described above. In ShardingSphere, a GeneratedKey class is also implemented. Please note that this class is located in the sharding-core-route project. Let’s first look at the getGenerateKey method provided by this class:

public static Optional<GeneratedKey> getGenerateKey(final ShardingRule shardingRule, final TableMetas tableMetas, final List<Object> parameters, final InsertStatement insertStatement) {
    // Find the auto-increment column
    Optional<String> generateKeyColumnName = shardingRule.findGenerateKeyColumnName(insertStatement.getTable().getTableName());
    if (!generateKeyColumnName.isPresent()) {
        return Optional.absent();
    }

    // Check if the auto-increment class has generated the primary key value
    return Optional.of(containsGenerateKey(tableMetas, insertStatement, generateKeyColumnName.get())
            ? findGeneratedKey(tableMetas, parameters, insertStatement, generateKeyColumnName.get()) : createGeneratedKey(shardingRule, insertStatement, generateKeyColumnName.get()));
}

The logic here is to first find the column corresponding to the generated key from the ShardingRule, and then determine whether a key value already exists: if it does, find the existing key; otherwise, create a new one. Since today’s focus is the generation of distributed primary keys, let’s go directly to the createGeneratedKey method:

private static GeneratedKey createGeneratedKey(final ShardingRule shardingRule, final InsertStatement insertStatement, final String generateKeyColumnName) {
    GeneratedKey result = new GeneratedKey(generateKeyColumnName, true);
    for (int i = 0; i < insertStatement.getValueListCount(); i++) {
        result.getGeneratedValues().add(shardingRule.generateKey(insertStatement.getTable().getTableName()));
    }
    return result;
}

GeneratedKey holds a LinkedList named generatedValues that stores the generated key values, while the actual key generation is delegated to the generateKey method of ShardingRule. Let’s go to the ShardingRule class and find this generateKey method:

public Comparable<?> generateKey(final String logicTableName) {
    Optional<TableRule> tableRule = findTableRule(logicTableName);
    if (!tableRule.isPresent()) {
        throw new ShardingConfigurationException("Cannot find strategy for generate keys.");
    }

    // Get ShardingKeyGenerator from TableRule and generate the distributed primary key
    ShardingKeyGenerator shardingKeyGenerator = null == tableRule.get().getShardingKeyGenerator() ? defaultShardingKeyGenerator : tableRule.get().getShardingKeyGenerator();
    return shardingKeyGenerator.generateKey();
}

The method first finds the TableRule corresponding to the logicTableName passed in, obtains the ShardingKeyGenerator held by that TableRule (falling back to the default generator if none is configured), and then generates the key through ShardingKeyGenerator.generateKey. From a design-pattern perspective, ShardingRule is just a facade class: the creation of the ShardingKeyGenerator takes place in TableRule, and ShardingKeyGenerator is clearly the real entry point for generating distributed primary keys. Let’s take a look at it.

ShardingKeyGenerator #

Next, let’s analyze the ShardingKeyGenerator interface. As its definition shows, the interface extends the TypeBasedSPI interface:

public interface ShardingKeyGenerator extends TypeBasedSPI {
    Comparable<?> generateKey();
}
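
Since the interface only adds generateKey on top of TypeBasedSPI, which contributes the type name and a Properties holder, writing a new generator is mostly a matter of implementing these methods and then registering the class through the SPI mechanism discussed below. The following is a purely hypothetical illustration of the extension point, not a generator shipped with ShardingSphere, and it assumes the usual TypeBasedSPI property accessors:

// Hypothetical generator shown only to illustrate the extension point
// (requires java.util.Properties and java.util.concurrent.atomic.AtomicLong)
public final class IncrementShardingKeyGenerator implements ShardingKeyGenerator {

    private final AtomicLong counter = new AtomicLong();

    private Properties properties = new Properties();

    @Override
    public String getType() {
        // The value referenced by the key-generator "type" configuration
        return "INCREMENT";
    }

    @Override
    public Comparable<?> generateKey() {
        return counter.incrementAndGet();
    }

    @Override
    public Properties getProperties() {
        return properties;
    }

    @Override
    public void setProperties(final Properties properties) {
        this.properties = properties;
    }
}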

In one of the constructors of the TableRule class, we can find the creation of the ShardingKeyGenerator:

shardingKeyGenerator = containsKeyGeneratorConfiguration(tableRuleConfig)
                ? new ShardingKeyGeneratorServiceLoader().newService(tableRuleConfig.getKeyGeneratorConfig().getType(), tableRuleConfig.getKeyGeneratorConfig().getProperties()) : null;

There is a ShardingKeyGeneratorServiceLoader class here, which is defined as follows:

public final class ShardingKeyGeneratorServiceLoader extends TypeBasedSPIServiceLoader<ShardingKeyGenerator> {

    static {
        NewInstanceServiceLoader.register(ShardingKeyGenerator.class);
    }

    public ShardingKeyGeneratorServiceLoader() {
        super(ShardingKeyGenerator.class);
    }
}

Recalling the previous lesson, the purpose of the ShardingKeyGeneratorServiceLoader class is not hard to understand: it extends TypeBasedSPIServiceLoader and, in its static block, registers all ShardingKeyGenerator implementations on the classpath through NewInstanceServiceLoader.

Creating a dedicated ServiceLoader class by extending TypeBasedSPIServiceLoader and registering the corresponding SPI interface in a static block is a common practice in ShardingSphere’s application of the microkernel pattern; similar handling can be seen in many places.
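
Putting the two classes together, obtaining a generator by its type looks roughly like the following sketch, assuming the sharding-core-common SPI files are on the classpath; the newService signature matches the call in the TableRule constructor shown above:

// Load a generator by type through the SPI loader shown above
ShardingKeyGeneratorServiceLoader serviceLoader = new ShardingKeyGeneratorServiceLoader();

// "SNOWFLAKE" and "UUID" are the types registered through META-INF/services (see below)
ShardingKeyGenerator keyGenerator = serviceLoader.newService("SNOWFLAKE", new Properties());

Comparable<?> key = keyGenerator.generateKey();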

In the META-INF/services directory of the sharding-core-common project, we see the specific SPI definitions:

[Figure: Distributed primary key SPI configuration]

It can be seen that there are two ShardingKeyGenerators here, namely SnowflakeShardingKeyGenerator and UUIDShardingKeyGenerator, both of which are located in the org.apache.shardingsphere.core.strategy.keygen package.
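
In other words, the SPI file, which is named after the fully qualified ShardingKeyGenerator interface, simply lists one implementation class per line; based on the package mentioned above, its content is essentially:

org.apache.shardingsphere.core.strategy.keygen.SnowflakeShardingKeyGenerator
org.apache.shardingsphere.core.strategy.keygen.UUIDShardingKeyGenerator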

Distributed Primary Key Implementation in ShardingSphere #

In ShardingSphere, there are several implementations of the ShardingKeyGenerator interface. In addition to the above-mentioned SnowflakeShardingKeyGenerator and UUIDShardingKeyGenerator, there are also the LeafSegmentKeyGenerator and LeafSnowflakeKeyGenerator classes, but the implementation process of these two classes is somewhat special, which we will discuss later.

UUIDShardingKeyGenerator #

Let’s first take a look at the simplest ShardingKeyGenerator, which is the UUIDShardingKeyGenerator. The implementation of UUIDShardingKeyGenerator is very easy to understand. It directly uses the UUID.randomUUID() method to generate the distributed primary key:

public final class UUIDShardingKeyGenerator implements ShardingKeyGenerator {

    private Properties properties = new Properties();

    @Override
    public String getType() {
        return "UUID";
    }

    @Override
    public synchronized Comparable<?> generateKey() {
        return UUID.randomUUID().toString().replaceAll("-", "");
    }
}

SnowflakeShardingKeyGenerator #

Now let’s look at the Snowflake algorithm, the default distributed primary key generation strategy in ShardingSphere. Snowflake is a distributed ID generation algorithm open-sourced by Twitter. Its core idea is to use a 64-bit long value as the globally unique ID; because the ID embeds a timestamp, the generated values are roughly increasing over time. The Snowflake algorithm is widely used in distributed systems, and the 64-bit ID follows a fixed structure:

[Figure: 64-bit ID structure diagram]

In the diagram above, the 64 bits are divided into four parts (the corresponding bit constants are sketched after the list):

  • Sign Bit

The first part is the sign bit: a single bit that is always 0 and has no practical significance.

  • Timestamp Bit

The second part is the 41-bit timestamp. 41 bits can represent 2^41 milliseconds; dividing by the number of milliseconds in a year (365 × 24 × 60 × 60 × 1000) gives about 69.73 years. ShardingSphere’s Snowflake epoch starts on November 1, 2016, so the algorithm can be used until roughly the year 2086, which should satisfy the needs of most systems.

  • Worker Process Bit

The third part is the 10-bit worker process ID; the first 5 bits can be used as the data center (machine room) ID and the last 5 bits as the machine ID.

  • Sequence Bit

The fourth part is the 12-bit sequence number, which distinguishes IDs generated within the same millisecond on the same machine in the same room. If more than 4096 (2 to the power of 12) IDs are requested within a single millisecond, the generator waits until the next millisecond before continuing.
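
The 41/10/12 split above maps directly onto the shift and mask constants that appear in the generateKey code later in this lesson (TIMESTAMP_LEFT_SHIFT_BITS, WORKER_ID_LEFT_SHIFT_BITS, SEQUENCE_MASK). The sketch below shows how those values follow from the bit layout; the helper names SEQUENCE_BITS and WORKER_ID_BITS are used here for clarity:

// 12 bits for the per-millisecond sequence, 10 bits for the worker process ID
private static final long SEQUENCE_BITS = 12L;
private static final long WORKER_ID_BITS = 10L;

// Keeps the sequence within 0–4095
private static final long SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1;

// The worker ID sits just above the 12 sequence bits
private static final long WORKER_ID_LEFT_SHIFT_BITS = SEQUENCE_BITS;

// The timestamp sits above the worker ID, i.e. shifted left by 22 bits
private static final long TIMESTAMP_LEFT_SHIFT_BITS = WORKER_ID_LEFT_SHIFT_BITS + WORKER_ID_BITS;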

Because the Snowflake algorithm relies on timestamps, it also needs to handle clock rollback. Clock rollback refers to a server’s clock being moved back to an earlier time, for example by time synchronization; if the timestamp rolls back, the generator could produce an ID that has already been used. Therefore, ShardingSphere’s default key generator defines a maximum tolerable clock rollback in milliseconds. If the rollback exceeds this threshold, the generator throws an error; if it falls within the tolerable range, the generator waits until the clock catches up with the time of the last key generation before continuing. The tolerance can be adjusted through the generator’s properties (the max.tolerate.time.difference.milliseconds property).

After understanding the basic idea of the Snowflake algorithm, let’s look at the concrete implementation of the SnowflakeShardingKeyGenerator class. The class defines a set of constants that maintain the relationships between the different bit segments of the Snowflake ID, as well as a TimeService used to obtain the current timestamp. The core method, generateKey, is responsible for generating the actual ID; here is the code with explanatory comments:

@Override
public synchronized Comparable<?> generateKey() {
    // Get the current timestamp
    long currentMilliseconds = timeService.getCurrentMillis();

    // Handle clock rollback: beyond the tolerance an exception is thrown; within it the
    // generator waits for the clock to catch up, so the timestamp is read again afterwards
    if (waitTolerateTimeDifferenceIfNeed(currentMilliseconds)) {
        currentMilliseconds = timeService.getCurrentMillis();
    }

    // The current millisecond is the same as that of the last generated key
    if (lastMilliseconds == currentMilliseconds) {
        // The mask keeps the sequence within 0–4095; wrapping back to 0 means the
        // sequence for this millisecond is exhausted
        if (0L == (sequence = (sequence + 1) & SEQUENCE_MASK)) {
            // Wait for the next millisecond before continuing to generate
            currentMilliseconds = waitUntilNextTime(currentMilliseconds);
        }
    } else { // A new millisecond: reset the sequence to a vibrated starting offset
        vibrateSequenceOffset();
        sequence = sequenceOffset;
    }
    lastMilliseconds = currentMilliseconds;

    // Compose the 64-bit ID: the timestamp (relative to EPOCH) occupies the high 41 bits,
    // followed by the 10-bit worker ID and the 12-bit sequence
    return ((currentMilliseconds - EPOCH) << TIMESTAMP_LEFT_SHIFT_BITS) | (getWorkerId() << WORKER_ID_LEFT_SHIFT_BITS) | sequence;
}

As we can see, the implementation carefully handles clock rollback and multiple requests within the same millisecond to complete the Snowflake algorithm.
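
To make the clock-rollback handling concrete, here is a simplified sketch of what a method like waitTolerateTimeDifferenceIfNeed does. The field lastMilliseconds comes from the code above, while maxTolerateTimeDifferenceMilliseconds stands in for the configurable tolerance; the real ShardingSphere source differs in details such as how the error is reported:

// Simplified sketch of clock-rollback tolerance, not the exact ShardingSphere source
private boolean waitTolerateTimeDifferenceIfNeed(final long currentMilliseconds) {
    // No rollback: the current time is not behind the last generation time
    if (lastMilliseconds <= currentMilliseconds) {
        return false;
    }
    long timeDifferenceMilliseconds = lastMilliseconds - currentMilliseconds;
    if (timeDifferenceMilliseconds >= maxTolerateTimeDifferenceMilliseconds) {
        // The rollback exceeds the configured tolerance: fail fast instead of risking duplicate IDs
        throw new IllegalStateException(String.format(
                "Clock moved backwards, last time is %d milliseconds, current time is %d milliseconds",
                lastMilliseconds, currentMilliseconds));
    }
    try {
        // Within tolerance: wait until the clock catches up with the last generation time
        Thread.sleep(timeDifferenceMilliseconds);
    } catch (final InterruptedException ex) {
        Thread.currentThread().interrupt();
    }
    return true;
}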

LeafSegmentKeyGenerator and LeafSnowflakeKeyGenerator #

In fact, implementing a production-grade ShardingKeyGenerator like SnowflakeShardingKeyGenerator from scratch is quite difficult, and doing so is largely reinventing the wheel. Therefore, although ShardingSphere provided complete ShardingKeyGenerator implementations named LeafSegmentKeyGenerator and LeafSnowflakeKeyGenerator in version 4.x, these two classes have been removed during the ongoing development of version 5.x.

Currently, ShardingSphere provides a dedicated repository, OpenSharding, to host the new versions of LeafSegmentKeyGenerator and LeafSnowflakeKeyGenerator. These new implementations are built directly on Leaf, the open-source distributed ID generation framework from the third-party company Meituan.

Leaf provides two ways to generate IDs: segment mode and the Snowflake mode introduced earlier. Whichever mode is used, a leaf.properties file with the corresponding configuration options must be provided; at a minimum, the application needs to set a leaf.key (the ZooKeeper address shown below is only needed by LeafSnowflake):

# for keyGenerator key
leaf.key=sstest

# for LeafSnowflake
leaf.zk.list=localhost:2181 

If the segment mode is used, it relies on a database table to store runtime data, so you need to add the relevant database configuration to the leaf.properties file:

# for LeafSegment 
leaf.jdbc.url=jdbc:mysql://127.0.0.1:3306/test?serverTimezone=UTC&useSSL=false 
leaf.jdbc.username=root 
leaf.jdbc.password=123456 

Based on these configurations, we can create the corresponding DataSource and further create an IDGen implementation class for generating distributed IDs. Here, we create the SegmentIDGenImpl class based on the segment mode:

// Build a data source and set its properties through DruidDataSource
DruidDataSource dataSource = new DruidDataSource();
dataSource.setUrl(properties.getProperty(LeafPropertiesConstant.LEAF_JDBC_URL));
dataSource.setUsername(properties.getProperty(LeafPropertiesConstant.LEAF_JDBC_USERNAME));
dataSource.setPassword(properties.getProperty(LeafPropertiesConstant.LEAF_JDBC_PASSWORD));
dataSource.init();

// Build the database access Dao component
IDAllocDao dao = new IDAllocDaoImpl(dataSource);
// Create the IDGen implementation
this.idGen = new SegmentIDGenImpl();
// Bind the Dao component to the IDGen implementation
((SegmentIDGenImpl) this.idGen).setDao(dao);
this.idGen.init();
this.dataSource = dataSource;

Once we have successfully created the IDGen implementation class, we can generate the target ID using this class. The LeafSegmentKeyGenerator class contains all the implementation details:

Result result = this.idGen.get(properties.getProperty(LeafPropertiesConstant.LEAF_KEY));
return result.getId();

After introducing LeafSegmentKeyGenerator, let’s take a look at LeafSnowflakeKeyGenerator. The implementation of LeafSnowflakeKeyGenerator relies on the distributed coordination framework ZooKeeper, so the target address of ZooKeeper needs to be specified in the configuration file:

# for LeafSnowflake
leaf.zk.list=localhost:2181

Creating the IDGen implementation class SnowflakeIDGenImpl for LeafSnowflake is relatively simple. We can directly set the ZooKeeper address in the constructor:

IDGen idGen = new SnowflakeIDGenImpl(properties.getProperty(LeafPropertiesConstant.LEAF_ZK_LIST), 8089);

Similarly, obtaining the target ID through IDGen works in the same way:

idGen.get(properties.getProperty(LeafPropertiesConstant.LEAF_KEY)).getId();

Clearly, generating distributed IDs with Leaf is very simple in both segment mode and Snowflake mode: the Leaf framework shields us from the complexity of the internal implementation.
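
Combining this with the ShardingKeyGenerator interface from earlier, a Leaf-based key generator essentially just delegates generateKey to the IDGen instance built above. The following is a simplified sketch of that delegation; the class name LeafBasedKeyGenerator and its type value are hypothetical, and the real LeafSegmentKeyGenerator and LeafSnowflakeKeyGenerator additionally handle configuration loading and error cases:

// Hypothetical sketch of delegating ShardingKeyGenerator to Leaf's IDGen
public final class LeafBasedKeyGenerator implements ShardingKeyGenerator {

    private Properties properties = new Properties();

    // Initialized as shown above, either SegmentIDGenImpl or SnowflakeIDGenImpl
    private IDGen idGen;

    @Override
    public String getType() {
        // Hypothetical type name used only for this sketch
        return "LEAF";
    }

    @Override
    public synchronized Comparable<?> generateKey() {
        // Delegate to Leaf: the returned Result carries the generated ID
        return idGen.get(properties.getProperty(LeafPropertiesConstant.LEAF_KEY)).getId();
    }

    @Override
    public Properties getProperties() {
        return properties;
    }

    @Override
    public void setProperties(final Properties properties) {
        this.properties = properties;
    }
}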

From Source Code Analysis to Daily Development #

Compared to other architectural design ideas and implementation solutions in ShardingSphere, distributed primary keys are quite independent. Therefore, the various implementation methods of distributed primary keys introduced today can be directly applied to daily development processes. Whether it’s the SnowflakeShardingKeyGenerator implemented by ShardingSphere itself, or the LeafSegmentKeyGenerator and LeafSnowflakeKeyGenerator implemented based on third-party frameworks, they provide us with direct solutions for using distributed primary keys. Of course, we can further explore other similar solutions based on these implementation methods.

Summary #

In distributed system development, distributed primary keys are a basic requirement. When working with databases, we often need to connect distributed primary keys with the database’s own primary key auto-generation mechanism. In today’s lesson, starting from ShardingSphere’s automatic key generation scheme, we introduced various ways of implementing distributed primary keys, including the simplest UUID approach, the classic Snowflake algorithm, and the LeafSegment and LeafSnowflake generators built on Meituan’s Leaf framework.

Here’s a question for you to think about: how does ShardingSphere use Leaf’s segment mode and Snowflake mode, respectively, to generate distributed IDs?

Starting from the next lesson, we will move on to the implementation principles of ShardingSphere’s sharding engine, beginning with the execution flow of the parsing engine. Remember to attend the class on time.