33 Chain Tracking How to Implement Data Access Chain Tracking Based on Hook Mechanism and OpenTracing Protocol #

Today, let’s discuss another topic related to orchestration and governance in ShardingSphere, which is link tracking. In the development process of distributed systems, link tracking is a basic infrastructure feature. As a middleware for distributed databases, ShardingSphere also provides a built-in, simple yet complete link tracking mechanism.

Before diving into the specific implementation process, let’s first understand some theoretical knowledge about link tracking.

The principle of service tracking in a distributed environment is not complicated. We first need to introduce two basic concepts: TraceId and SpanId.

  • TraceId

TraceId refers to the tracking ID. In a microservice architecture, a globally unique ID is generated for each request, and this ID can be used to link the entire call chain. In other words, when a request is circulating within a distributed system, the system needs to ensure the uniqueness of the ID being passed along with the request until the request returns. This unique ID is called TraceId.

  • SpanId

In addition to TraceId, we also need SpanId, generally referred to as the span ID. As a request passes through each service component, a SpanId identifies one unit of work there: its start, its execution, and its end. Every span must have both a start and an end, and recording the timestamps of the two lets us calculate how long the span took.

During the entire call process, each request must carry both the TraceId and the SpanId. Each service records the SpanId it receives as its parent SpanId and generates a new SpanId of its own. A Span without a parent SpanId is the root Span and can be viewed as the entry point of the call chain. Therefore, to view a complete call, we only need to query all the call records that share a TraceId and then reconstruct the call hierarchy from the parent SpanId and SpanId of each record. The industry has a common link tracking specification, the OpenTracing protocol, that standardizes how the relationship between Trace and Span is modeled.
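The reconstruction step described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the SpanRecord class, the render method, and the service names in the sample data are hypothetical and are not part of ShardingSphere or OpenTracing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TraceTreeSketch {

    /** One recorded call (hypothetical type); parentSpanId is null for the root span. */
    public static final class SpanRecord {
        final String traceId;
        final String spanId;
        final String parentSpanId;
        final String operation;

        public SpanRecord(String traceId, String spanId, String parentSpanId, String operation) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
            this.operation = operation;
        }
    }

    /** Renders the call hierarchy of one trace as indented lines, one operation per line. */
    public static String render(List<SpanRecord> records, String traceId) {
        SpanRecord root = null;
        Map<String, List<SpanRecord>> childrenByParent = new HashMap<>();
        for (SpanRecord each : records) {
            if (!each.traceId.equals(traceId)) {
                continue; // belongs to a different request
            }
            if (each.parentSpanId == null) {
                root = each; // no parent: entry point of the call chain
            } else {
                childrenByParent.computeIfAbsent(each.parentSpanId, key -> new ArrayList<>()).add(each);
            }
        }
        StringBuilder out = new StringBuilder();
        if (root != null) {
            append(out, root, childrenByParent, 0);
        }
        return out.toString();
    }

    private static void append(StringBuilder out, SpanRecord span,
                               Map<String, List<SpanRecord>> childrenByParent, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(span.operation).append('\n');
        List<SpanRecord> children =
                childrenByParent.getOrDefault(span.spanId, Collections.<SpanRecord>emptyList());
        for (SpanRecord child : children) {
            append(out, child, childrenByParent, depth + 1);
        }
    }

    /** Sample records: two traces mixed together, as they might sit in storage. */
    public static List<SpanRecord> sample() {
        return Arrays.asList(
                new SpanRecord("t1", "s1", null, "gateway"),
                new SpanRecord("t2", "x1", null, "unrelated-request"),
                new SpanRecord("t1", "s2", "s1", "order-service"),
                new SpanRecord("t1", "s3", "s2", "database"),
                new SpanRecord("t1", "s4", "s1", "user-service"));
    }

    public static void main(String[] args) {
        System.out.print(render(sample(), "t1"));
    }
}
```

Filtering by TraceId selects the records of one request, and the parent SpanId of each record is the only information needed to place it in the tree.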

2. OpenTracing Protocol and Application Methods #

OpenTracing is a protocol that uses similar terms as mentioned above to represent the process of link tracking. By providing a platform-independent, vendor-independent API, OpenTracing enables developers to easily add or replace the implementation of link tracking systems. Currently, mainstream programming languages such as Java, Go, and Python provide support for the OpenTracing protocol.

Let’s take Java as an example to introduce how the OpenTracing protocol is used. The two most important, closely related types in the OpenTracing API are the Tracer and Span interfaces.

For the Tracer interface, the most important method is buildSpan, which is used to create a Span object based on a certain operation:

SpanBuilder buildSpan(String operationName);

From the above code, we can see that the buildSpan method returns a SpanBuilder object. SpanBuilder offers a group of overloaded withTag methods for adding tags to the Span being built. Tags are user-defined key-value pairs, typically used as markers for later retrieval and querying. One of the withTag overloads is defined as follows:

SpanBuilder withTag(String key, String value);

We can add multiple tags to a Span, and once all the tags are added, we can call the start method to start the Span:

Span start();

Note that this method returns a Span object, and once we obtain the Span object, we can call the finish method to end the Span, which will automatically fill in the end time of the Span:

void finish();
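To make the build, tag, start, and finish sequence concrete without depending on any tracing library, the following sketch mimics its shape with stand-in types. SketchSpan and SketchSpanBuilder are hypothetical classes written for this illustration; they are not the io.opentracing interfaces, and a real tracer does considerably more (context propagation, reporting, and so on).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class SpanLifecycleSketch {

    /** Stand-in for a started span: remembers its tags and its start/finish timestamps. */
    public static final class SketchSpan {
        private final String operationName;
        private final Map<String, String> tags;
        private final long startNanos;
        private long finishNanos;
        private boolean finished;

        SketchSpan(String operationName, Map<String, String> tags) {
            this.operationName = operationName;
            this.tags = new LinkedHashMap<>(tags);
            this.startNanos = System.nanoTime(); // start time is taken when the span is started
        }

        /** Ends the span and fills in its end time. */
        public void finish() {
            finishNanos = System.nanoTime();
            finished = true;
        }

        public boolean isFinished() {
            return finished;
        }

        public String operationName() {
            return operationName;
        }

        public Map<String, String> tags() {
            return Collections.unmodifiableMap(tags);
        }

        /** Elapsed time between start and finish; only meaningful once finished. */
        public long durationNanos() {
            if (!finished) {
                throw new IllegalStateException("span is not finished yet");
            }
            return finishNanos - startNanos;
        }
    }

    /** Stand-in for a span builder: collects tags, then starts the span. */
    public static final class SketchSpanBuilder {
        private final String operationName;
        private final Map<String, String> tags = new LinkedHashMap<>();

        SketchSpanBuilder(String operationName) {
            this.operationName = operationName;
        }

        public SketchSpanBuilder withTag(String key, String value) {
            tags.put(key, value);
            return this; // returning the builder allows chained calls
        }

        public SketchSpan start() {
            return new SketchSpan(operationName, tags);
        }
    }

    public static SketchSpanBuilder buildSpan(String operationName) {
        return new SketchSpanBuilder(operationName);
    }

    public static void main(String[] args) {
        SketchSpan span = buildSpan("query").withTag("component", "demo").start();
        span.finish();
        System.out.println(span.operationName() + " finished: " + span.isFinished());
    }
}
```

Because every withTag call returns the builder itself, any number of tags can be chained before start is called, which is the usage style seen in the snippets that follow.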

Based on the introduction of the OpenTracing API above, in the daily development process, the common implementation method of embedding link tracking in business code can be abstracted by the following code snippet:

// Get the Tracer object from the OpenTracing-compliant implementation framework
Tracer tracer = new XXXTracer();

// Create a Span and start it
Span span = tracer.buildSpan("test").start();

// Add tags to the Span, using the tag's String key
span.setTag(Tags.COMPONENT.getKey(), "test-application");

// Perform business logic

// Finish the Span
span.finish();

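// Retrieve relevant information from the Span as needed.
// Note: these accessors are not part of the Span interface. They are available
// when the tracer is a MockTracer (opentracing-mock), whose spans are MockSpan objects.
MockSpan mockSpan = (MockSpan) span;
System.out.println("Operation name = " + mockSpan.operationName());
System.out.println("Start = " + mockSpan.startMicros());
System.out.println("Finish = " + mockSpan.finishMicros());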

In fact, the approach of integrating the OpenTracing API in ShardingSphere is basically similar to the above method. Let’s take a look together.

For ShardingSphere, the framework itself is not responsible for how to collect, store, and display application performance monitoring data. It only sends the most core SQL parsing and execution information to the application performance monitoring system and lets the system handle it.

In other words, ShardingSphere is only responsible for generating valuable data and passing it to third-party systems through standard protocols, without further processing these data.

ShardingSphere uses the OpenTracing API to send performance tracking data. Any specific product that supports the OpenTracing protocol can automatically integrate with ShardingSphere, such as popular products like SkyWalking, Zipkin, and Jaeger. In ShardingSphere, using these specific products only requires configuring the implementor of the OpenTracing protocol during startup.

1. Get the Tracer Class through ShardingTracer #

In ShardingSphere, all the code related to link tracking is located in the sharding-opentracing project. Let’s take a look at the ShardingTracer class. The init method of this class initializes the implementation class of the OpenTracing protocol, as shown below:

public static void init() {
    // Read the class name of the OpenTracing implementation from a JVM system property
    String tracerClassName = System.getProperty(OPENTRACING_TRACER_CLASS_NAME);
    Preconditions.checkNotNull(tracerClassName, "Can not find opentracing tracer implementation class via system property `%s`", OPENTRACING_TRACER_CLASS_NAME);
    try {
        // Instantiate the implementation class by reflection and register it
        init((Tracer) Class.forName(tracerClassName).newInstance());
    } catch (final ReflectiveOperationException ex) {
        throw new ShardingException("Initialize opentracing tracer class failure.", ex);
    }
}

The system property named by the OPENTRACING_TRACER_CLASS_NAME constant supplies the class name of the OpenTracing implementation, and an instance is then created by reflection. Because the value is read with System.getProperty, it is passed to the JVM as a system property, for example with a -D option. For instance, we can point it at the SkywalkingTracer class of the SkyWalking framework as follows:

org.apache.shardingsphere.opentracing.tracer.class=org.apache.skywalking.apm.toolkit.opentracing.SkywalkingTracer
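The "class name from a system property, instance by reflection" technique can be tried in isolation with the JDK alone. In the sketch below, the property key sketch.supplier.class and the HelloSupplier class are hypothetical names chosen for illustration. It uses getDeclaredConstructor().newInstance(), which is the replacement for Class.newInstance() (deprecated since Java 9).

```java
import java.util.function.Supplier;

public class ReflectiveLoaderSketch {

    /** Hypothetical property key used only in this sketch. */
    public static final String PROPERTY_KEY = "sketch.supplier.class";

    /** Hypothetical implementation class that the property points at. */
    public static final class HelloSupplier implements Supplier<String> {
        @Override
        public String get() {
            return "hello";
        }
    }

    /** Reads the class name from the system property and instantiates it by reflection. */
    @SuppressWarnings("unchecked")
    public static Supplier<String> load() {
        String className = System.getProperty(PROPERTY_KEY);
        if (className == null) {
            throw new IllegalStateException("System property `" + PROPERTY_KEY + "` is not set");
        }
        try {
            // Requires an accessible no-argument constructor on the configured class
            return (Supplier<String>) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("Cannot instantiate " + className, ex);
        }
    }

    public static void main(String[] args) {
        // Equivalent to starting the JVM with -Dsketch.supplier.class=<class name>
        System.setProperty(PROPERTY_KEY, HelloSupplier.class.getName());
        System.out.println(load().get());
    }
}
```

The design keeps the calling code free of any compile-time dependency on the concrete implementation; only the configured name decides which class is loaded.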

Of course, the ShardingTracer class also provides a method to initialize the OpenTracing protocol implementation class through direct injection. In fact, the init method above ultimately calls the overloaded init method as follows:

public static void init(final Tracer tracer) {
    if (!GlobalTracer.isRegistered()) {
        GlobalTracer.register(tracer);
    }
}

This method puts the Tracer object into the global GlobalTracer. GlobalTracer is a utility class provided by the OpenTracing API, which uses the singleton design pattern to store a global Tracer object. Its variable definition, register method, and get method are defined as follows:

private static final GlobalTracer INSTANCE = new GlobalTracer();

public static synchronized void register(final Tracer tracer) {
    if (tracer == null) {
        throw new NullPointerException("Cannot register GlobalTracer <null>.");
    }
    if (tracer instanceof GlobalTracer) {
        LOGGER.log(Level.FINE, "Attempted to register the GlobalTracer as delegate of itself.");
        return; // no-op
    }
    if (isRegistered() && !GlobalTracer.tracer.equals(tracer)) {
        throw new IllegalStateException("There is already a current global Tracer registered.");
    }
    GlobalTracer.tracer = tracer;
}

public static Tracer get() {
    return INSTANCE;
}

With this approach, initialization can be done using the following method:

ShardingTracer.init(new SkywalkingTracer());

The specific Tracer object can then be obtained through ShardingTracer’s get method, which simply delegates to the method of the same name on GlobalTracer, as shown below:

public static Tracer get() {
    return GlobalTracer.get();
}
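The register-once behaviour seen in this section (a guarded init on top of a register method that rejects a second, different delegate) can be reproduced with a small JDK-only holder. RegisterOnceSketch and its use of Supplier<String> as the delegate type are assumptions made for illustration; this is not the GlobalTracer source.

```java
import java.util.function.Supplier;

public final class RegisterOnceSketch {

    // The single globally shared delegate; null means nothing has been registered yet
    private static Supplier<String> delegate;

    private RegisterOnceSketch() {
    }

    public static synchronized boolean isRegistered() {
        return delegate != null;
    }

    /** Strict registration: a second, different delegate is rejected. */
    public static synchronized void register(Supplier<String> candidate) {
        if (candidate == null) {
            throw new NullPointerException("Cannot register <null>.");
        }
        if (isRegistered() && !delegate.equals(candidate)) {
            throw new IllegalStateException("There is already a delegate registered.");
        }
        delegate = candidate;
    }

    /** Lenient initialization: registers only when nothing is registered, otherwise does nothing. */
    public static synchronized void init(Supplier<String> candidate) {
        if (!isRegistered()) {
            register(candidate);
        }
    }

    /** Uses the registered delegate, or a no-op default when none is registered. */
    public static synchronized String describe() {
        return delegate == null ? "noop" : delegate.get();
    }

    public static void main(String[] args) {
        init(() -> "first");
        init(() -> "second"); // ignored: something is already registered
        System.out.println(describe());
    }
}
```

Checking isRegistered before calling register is what lets an init-style method be called repeatedly without triggering the IllegalStateException of the strict path.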

2. Populating Spans Based on Hook Mechanism #

Once the Tracer object is obtained, we can use it to build various Spans. ShardingSphere uses the Hook mechanism to populate Spans. We have met the Hook mechanism before, in “Chapter 15 | Parsing Engine: What Core Stages Should SQL Parsing Process Include (Part 1)?” In the parse method of the SQLParseEngine class, ParsingHook is used as follows:

public SQLStatement parse(final String sql, final boolean useCache) {
    // Monitor and trace based on the Hook mechanism
    ParsingHook parsingHook = new SPIParsingHook();
    parsingHook.start(sql);
    try {
        // Complete the parsing of the SQL and return an SQLStatement object
        SQLStatement result = parse0(sql, useCache);
        parsingHook.finishSuccess(result);
        return result;
    } catch (final Exception ex) {
        parsingHook.finishFailure(ex);
        throw ex;
    }
}

Notice that the above code creates an SPIParsingHook, which implements the ParsingHook interface. The definition of the ParsingHook interface is as follows:

public interface ParsingHook {

    // Hook when starting parsing
    void start(String sql);

    // Hook when parsing is completed successfully
    void finishSuccess(SQLStatement sqlStatement);

    // Hook when parsing fails
    void finishFailure(Exception cause);
}

The SPIParsingHook class is actually a container class that instantiates and calls all hooks of the same type through the SPI mechanism. The implementation of SPIParsingHook is as follows:

public final class SPIParsingHook implements ParsingHook {

    private final Collection<ParsingHook> parsingHooks = NewInstanceServiceLoader.newServiceInstances(ParsingHook.class);

    static {
        NewInstanceServiceLoader.register(ParsingHook.class);
    }

    @Override
    public void start(final String sql) {
        for (ParsingHook each : parsingHooks) {
            each.start(sql);
        }
    }

    @Override
    public void finishSuccess(final SQLStatement sqlStatement) {
        for (ParsingHook each : parsingHooks) {
            each.finishSuccess(sqlStatement);
        }
        }
    }

    @Override
    public void finishFailure(final Exception cause) {
        for (ParsingHook each : parsingHooks) {
            each.finishFailure(cause);
        }
    }

}

Here, we see the familiar NewInstanceServiceLoader utility class. This way, once we implement ParsingHook, the hook-related functionality will be embedded in the system’s execution flow when executing the parse method of the SQLParseEngine class.
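The container idea can be sketched with the JDK's own java.util.ServiceLoader in place of ShardingSphere's NewInstanceServiceLoader. Everything below is an assumption for illustration: SketchHook, CompositeHook, and RecordingHook are hypothetical types, and fromServiceLoader finds implementations only if they are listed in a META-INF/services file on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.ServiceLoader;

public class CompositeHookSketch {

    /** Hypothetical hook contract with the same three lifecycle callbacks. */
    public interface SketchHook {
        void start(String sql);

        void finishSuccess();

        void finishFailure(Exception cause);
    }

    /** Container hook: forwards every callback to all hooks it holds. */
    public static final class CompositeHook implements SketchHook {
        private final Collection<SketchHook> hooks;

        public CompositeHook(Collection<SketchHook> hooks) {
            this.hooks = new ArrayList<>(hooks);
        }

        /** Discovers implementations registered under META-INF/services (may find none). */
        public static CompositeHook fromServiceLoader() {
            List<SketchHook> discovered = new ArrayList<>();
            for (SketchHook each : ServiceLoader.load(SketchHook.class)) {
                discovered.add(each);
            }
            return new CompositeHook(discovered);
        }

        @Override
        public void start(String sql) {
            for (SketchHook each : hooks) {
                each.start(sql);
            }
        }

        @Override
        public void finishSuccess() {
            for (SketchHook each : hooks) {
                each.finishSuccess();
            }
        }

        @Override
        public void finishFailure(Exception cause) {
            for (SketchHook each : hooks) {
                each.finishFailure(cause);
            }
        }
    }

    /** Simple hook that records the callbacks it receives, in order. */
    public static final class RecordingHook implements SketchHook {
        public final List<String> events = new ArrayList<>();

        @Override
        public void start(String sql) {
            events.add("start:" + sql);
        }

        @Override
        public void finishSuccess() {
            events.add("success");
        }

        @Override
        public void finishFailure(Exception cause) {
            events.add("failure:" + cause.getMessage());
        }
    }

    public static void main(String[] args) {
        RecordingHook recorder = new RecordingHook();
        SketchHook hook = new CompositeHook(Arrays.<SketchHook>asList(recorder));
        hook.start("SELECT 1");
        hook.finishSuccess();
        System.out.println(recorder.events);
    }
}
```

The calling code only ever talks to the container, so adding a tracing hook, a logging hook, or none at all requires no change at the call site.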

Additionally, OpenTracingParsingHook also implements the ParsingHook interface, as shown below:

public final class OpenTracingParsingHook implements ParsingHook {

    private static final String OPERATION_NAME = "/" + ShardingTags.COMPONENT_NAME + "/parseSQL/";

    private Span span;

    @Override
    public void start(final String sql) {
         // Create span and set tags
        span = ShardingTracer.get().buildSpan(OPERATION_NAME)
                .withTag(Tags.COMPONENT.getKey(), ShardingTags.COMPONENT_NAME)
                .withTag(Tags.SPAN_KIND.getKey(), Tags.SPAN_KIND_CLIENT)
                .withTag(Tags.DB_STATEMENT.getKey(), sql).startManual();
    }

    @Override
    public void finishSuccess(final SQLStatement sqlStatement) {
         // Finish span when successful
        span.finish();
    }

    @Override
    public void finishFailure(final Exception cause) {
         // Finish span when failed
        ShardingErrorSpan.setError(span, cause);
        span.finish();
    }

}

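We know that the Tracer interface provides the buildSpan method to create custom spans, that tags can be added through the withTag method, and that a span is closed with the finish method. OpenTracingParsingHook shows all three in use. On failure, it additionally marks the span as an error through ShardingErrorSpan before finishing it. Note that the span here is started with startManual rather than the start method introduced earlier.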

Similarly, in Lesson 21, we also saw the use cases of the SQLExecutionHook interface in the execute0 method of the SQLExecuteCallback abstract class. The definition of the SQLExecutionHook interface is as follows:

public interface SQLExecutionHook {

    // Hook when starting SQL execution
    void start(String dataSourceName, String sql, List<Object> parameters, DataSourceMetaData dataSourceMetaData, boolean isTrunkThread, Map<String, Object> shardingExecuteDataMap);

    // Hook when SQL execution is completed successfully
    void finishSuccess();

    // Hook when SQL execution fails
    void finishFailure(Exception cause);

}

In ShardingSphere, there is also a complete system to implement this interface, including the SPISQLExecutionHook, which also acts as a container class similar to SPIParsingHook, and the OpenTracingSQLExecutionHook based on the OpenTracing protocol. The implementation process is the same as that of OpenTracingParsingHook, and we will not go into further details here.

From Source Code Analysis to Daily Development #

Here are two development tips drawn from today’s content. First, if you need to implement your own link monitoring and analysis system for a distributed environment, the OpenTracing specification and its implementation classes are a natural first choice.

Based on the OpenTracing specification, there are many excellent tools in the industry, such as SkyWalking, Zipkin, and Jaeger, which can be easily integrated into your business system.

Second, we also saw how the hook mechanism is used. Hooks are essentially a callback mechanism: we can define the hooks we need and load their implementations dynamically through SPI to meet different needs in different scenarios. ShardingSphere provides a good reference for how to implement and manage hook implementation classes in a system.

Summary and Preview #

Today’s content revolves around the link tracking implementation process in ShardingSphere. We found that there is not much code about link tracking in ShardingSphere, so in order to better understand the implementation mechanism of link tracking, we also spent some space introducing the basic principles of link tracking and the core classes behind the OpenTracing specification.

Then, we found that ShardingSphere has a set of hooks built into the execution flow. These hooks help the system collect various monitoring information, which can be managed uniformly through the various implementations of the OpenTracing specification.

Here’s a question for you to think about: How does ShardingSphere integrate with the OpenTracing protocol?

In the next lesson, we will introduce the final topic of the ShardingSphere source code analysis section, which is how the ShardingSphere kernel seamlessly integrates with the Spring framework to reduce the learning curve for developers.