16 Dubbo Serialize Layer: Multiple Serialization Algorithms, There Is Always One for You

In the previous lessons, we learned that an RPC framework needs to achieve cross-JVM invocation through network communication. Since network communication is involved, serialization and deserialization techniques are inevitably used, and Dubbo is no exception. Now let’s start with the basics of Java serialization and introduce some common serialization algorithms. Finally, we will analyze how Dubbo supports these serialization algorithms.

Java Serialization Basics #

In Java, serialization operations generally involve the following four steps.

The first step is to make the class of the object to be serialized implement the Serializable interface. Here is an example:

import java.io.Serializable;

public class Student implements Serializable {

    private static final long serialVersionUID = 1L;

    private String name;

    private int age;

    // transient fields are skipped during serialization
    private transient StudentUtil studentUtil;

}

In this example, you can see the transient keyword: a member variable marked transient is skipped during object serialization. It is generally used for non-data fields and for values that can be computed from other fields. Using transient sensibly reduces the size of the serialized data and improves network transmission efficiency.

The second step is to declare a serialVersionUID. It is not required, but it is still recommended. serialVersionUID literally means the serialization version number: deserialization succeeds only when the serialVersionUID of the class that wrote the stream matches that of the class reading it; otherwise an InvalidClassException is thrown. If the class does not declare a serialVersionUID, the JDK computes a default one from the structure of the class (its name, interfaces, fields, and methods), so even a small change to the class, or a different compiler, can produce a different value. If you want different versions of a class to remain compatible during serialization and deserialization, declare the same serialVersionUID explicitly in each version.

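The third step is to decide, according to your requirements, whether to customize serialization by defining private writeObject()/readObject() methods in the class. These methods are not overrides of an interface method; the JDK looks them up by reflection and calls them if they are present.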

The final step is to call writeObject() on java.io.ObjectOutputStream to serialize the object and readObject() on java.io.ObjectInputStream to deserialize it.
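The four steps above can be put together in a short round trip. This is a minimal sketch using only the JDK; the Person class, its greeting field, and the helper method names are hypothetical and chosen for illustration, not taken from Dubbo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationRoundTrip {

    // Hypothetical class for illustration (step 1: implement Serializable).
    static class Person implements Serializable {
        // Step 2: an explicit serialization version number.
        private static final long serialVersionUID = 1L;

        private final String name;
        private final int age;
        // Derived from name, so it is kept out of the byte stream.
        private transient String greeting;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
            this.greeting = "Hello, " + name;
        }

        // Step 3 (optional): private hooks that the JDK finds by reflection.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // writes the non-transient fields
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();           // restores name and age
            this.greeting = "Hello, " + name; // rebuilds the transient field
        }

        String describe() {
            return name + " " + age + " " + greeting;
        }
    }

    // Step 4: ObjectOutputStream.writeObject() turns the object into bytes.
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Step 4: ObjectInputStream.readObject() turns the bytes back into an object.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Person copy = (Person) fromBytes(toBytes(new Person("Tom", 18)));
        System.out.println(copy.describe()); // prints "Tom 18 Hello, Tom"
    }
}
```

Without the readObject() hook, greeting would come back as null after deserialization, because transient fields are not written to the stream.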

Since Java's built-in serialization is so simple to use, why are there still so many serialization frameworks on the market? Because these third-party frameworks typically serialize faster, produce smaller output, and in many cases support cross-language use.

Common Serialization Algorithms #

To help you quickly understand the serialization algorithms supported by Dubbo, we will briefly introduce some common serialization algorithms here.

Apache Avro is a language-independent serialization format. Avro relies on user-defined schemas and can quickly complete serialization without additional overhead, generating smaller serialized data. When deserializing the data, the schema used during writing needs to be obtained. Avro can be used as a serialization solution in Kafka, Hadoop, and Dubbo.

FastJson is an open-source JSON library developed by Alibaba. It serializes Java objects into JSON strings and deserializes JSON strings back into Java objects. FastJson is one of the libraries most commonly used by Java programmers and, as its name suggests, speed is its main selling point: according to its official test results, FastJson is the fastest, about 20% faster than Jackson. However, a number of security vulnerabilities have been disclosed in FastJson in recent years, so you need to be cautious when choosing a version.

Fst (fast-serialization) is a high-performance Java object serialization toolkit that is 100% compatible with JDK native serialization. Its serialization speed is about 4 to 10 times that of native JDK serialization, and the serialized data is about one-third of the size. Currently, Fst has been updated to version 3.x and supports JDK 14.

Kryo is an efficient Java serialization/deserialization library. It is widely used by companies such as Twitter and Yahoo and by Apache projects, especially in the big data field, for example Spark and Hive. Kryo offers a simple API, fast serialization, and relatively small serialized data, and it can be used to serialize Java objects for both database storage and network transmission. Kryo can also perform automatic deep and shallow copies and supports circular references. In addition, Kryo provides KryoNet, an NIO network communication library, which you can look into if you are interested.

Hessian2 is a cross-language serialization protocol that supports dynamic typing: the binary stream produced from a Java object can be consumed by other languages. Data serialized by Hessian2 is self-describing, so unlike Avro it does not depend on external schema files or interface definitions. Hessian2 can represent common basic values in a single byte, which greatly reduces the size of the serialized binary stream. Note that the Hessian2 serialization used in Dubbo is not the original Hessian2 implementation but Hessian Lite, a version modified by Alibaba, and it is the default serialization method used by Dubbo. Compared with Java serialization, the serialized binary stream is about 50% of the size, serialization takes about 30% of the time, and deserialization takes about 20% of the time.

Protobuf (Google Protocol Buffers) is a flexible, efficient, and automated mechanism developed by Google for serializing structured data. Compared to the commonly used JSON format, Protobuf has higher conversion efficiency, with both time and space efficiency being about 5 times that of JSON. Protobuf can be used in areas such as communication protocols and data storage, and it is language-independent, platform-independent, and extensible. Currently, Protobuf provides APIs for multiple languages such as C++, Java, Python, and Go. The underlying serialization of gRPC is implemented using Protobuf.
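To make the one-byte claim about Hessian2 concrete, here is a small sketch of the compact integer forms as described in the Hessian 2.0 serialization specification. It is written from the specification for illustration only: the class and method names are hypothetical, and this is not the code of Hessian Lite or Dubbo.

```java
import java.io.ByteArrayOutputStream;

class CompactIntSketch {

    // Encodes an int with the shortest of the four integer forms.
    static byte[] encodeInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value >= -16 && value <= 47) {
            out.write(0x90 + value);            // 1 byte: codes 0x80..0xbf
        } else if (value >= -2048 && value <= 2047) {
            out.write(0xc8 + (value >> 8));     // 2 bytes: codes 0xc0..0xcf
            out.write(value);
        } else if (value >= -262144 && value <= 262143) {
            out.write(0xd4 + (value >> 16));    // 3 bytes: codes 0xd0..0xd7
            out.write(value >> 8);
            out.write(value);
        } else {
            out.write('I');                     // 5 bytes: tag plus 32-bit value
            out.write(value >> 24);
            out.write(value >> 16);
            out.write(value >> 8);
            out.write(value);
        }
        return out.toByteArray();
    }

    // Reads the leading tag byte to decide how many bytes follow.
    static int decodeInt(byte[] data) {
        int code = data[0] & 0xff;
        if (code >= 0x80 && code <= 0xbf) {
            return code - 0x90;
        }
        if (code >= 0xc0 && code <= 0xcf) {
            return ((code - 0xc8) << 8) + (data[1] & 0xff);
        }
        if (code >= 0xd0 && code <= 0xd7) {
            return ((code - 0xd4) << 16) + ((data[1] & 0xff) << 8) + (data[2] & 0xff);
        }
        if (code == 'I') {
            return ((data[1] & 0xff) << 24) | ((data[2] & 0xff) << 16)
                    | ((data[3] & 0xff) << 8) | (data[4] & 0xff);
        }
        throw new IllegalArgumentException("Not an int tag: 0x" + Integer.toHexString(code));
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 47, 48, 300, 100000, Integer.MAX_VALUE}) {
            System.out.println(v + " -> " + encodeInt(v).length + " byte(s)");
        }
    }
}
```

Because the tag byte carries the type, and for small values the value itself, the stream stays self-describing without a separate schema, while small integers cost one byte instead of the fixed four bytes that java.io.DataOutputStream.writeInt() uses.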

dubbo-serialization #

In order to support multiple serialization algorithms, Dubbo abstracts a Serialize layer, which is at the bottom of the entire Dubbo architecture and corresponds to the dubbo-serialization module. The structure of the dubbo-serialization module is shown in the following diagram:

Drawing 0.png

The dubbo-serialization-api module defines the core interfaces of the Dubbo serialization layer, the most important of which is the Serialization interface. It is an extension interface annotated with @SPI, with Hessian2Serialization as the default implementation. The Serialization interface is defined as follows:

@SPI("hessian2") // Decorated with @SPI annotation, default is to use the Hessian2 serialization algorithm
public interface Serialization {
    // Each serialization algorithm corresponds to a ContentType. This method is used to obtain the ContentType 
    String getContentType();
    // It obtains the ID value of the ContentType, which is a byte value that uniquely determines an algorithm 
    byte getContentTypeId();
    // Creates an ObjectOutput object, which is responsible for implementing serialization, that is, converting Java objects into byte sequences 
    @Adaptive
    ObjectOutput serialize(URL url, OutputStream output) throws IOException;
    // Creates an ObjectInput object, which is responsible for implementing deserialization, that is, converting byte sequences into Java objects 
    @Adaptive
    ObjectInput deserialize(URL url, InputStream input) throws IOException;
}
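To illustrate why each algorithm carries a unique ContentType ID, the following self-contained sketch tags every payload with a one-byte ID so that the receiver can pick the matching algorithm. All names here (MiniSerialization, the two implementations, the one-byte frame layout) are hypothetical simplifications; the real interface creates ObjectOutput/ObjectInput objects as shown above, and the sketch does not reproduce Dubbo's actual wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class ContentTypeIdSketch {

    // Hypothetical, simplified stand-in for Dubbo's Serialization interface.
    interface MiniSerialization {
        byte getContentTypeId();
        byte[] serialize(Object obj) throws IOException;
        Object deserialize(byte[] data) throws IOException, ClassNotFoundException;
    }

    // Algorithm 1: JDK native serialization.
    static class JdkSerialization implements MiniSerialization {
        public byte getContentTypeId() { return 1; }

        public byte[] serialize(Object obj) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj);
            }
            return buffer.toByteArray();
        }

        public Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
                return in.readObject();
            }
        }
    }

    // Algorithm 2: plain UTF-8 text, which only round-trips strings.
    static class Utf8TextSerialization implements MiniSerialization {
        public byte getContentTypeId() { return 2; }

        public byte[] serialize(Object obj) {
            return String.valueOf(obj).getBytes(StandardCharsets.UTF_8);
        }

        public Object deserialize(byte[] data) {
            return new String(data, StandardCharsets.UTF_8);
        }
    }

    // Registry keyed by ContentType ID.
    static final Map<Byte, MiniSerialization> BY_ID = new HashMap<>();
    static {
        for (MiniSerialization s : new MiniSerialization[] {new JdkSerialization(), new Utf8TextSerialization()}) {
            BY_ID.put(s.getContentTypeId(), s);
        }
    }

    static MiniSerialization lookup(byte id) throws IOException {
        MiniSerialization s = BY_ID.get(id);
        if (s == null) {
            throw new IOException("Unknown content type id: " + id);
        }
        return s;
    }

    // Sender: the first byte of the frame records which algorithm was used.
    static byte[] encode(byte id, Object obj) throws IOException {
        byte[] payload = lookup(id).serialize(obj);
        byte[] frame = new byte[payload.length + 1];
        frame[0] = id;
        System.arraycopy(payload, 0, frame, 1, payload.length);
        return frame;
    }

    // Receiver: reads the ID first, then delegates to the matching algorithm.
    static Object decode(byte[] frame) throws IOException, ClassNotFoundException {
        return lookup(frame[0]).deserialize(Arrays.copyOfRange(frame, 1, frame.length));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(decode(encode((byte) 1, "via JDK serialization")));
        System.out.println(decode(encode((byte) 2, "via UTF-8 text")));
    }
}
```

The point of the design is that the receiver never has to guess: as long as both sides agree on the ID table, either side can switch algorithms per message.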

Dubbo provides multiple implementations of the Serialization interface to support various serialization algorithms, as shown in the following diagram:

Drawing 1.png

Here, we will use the default Hessian2 serialization method as an example to introduce the implementation of the Serialization interface and other related implementations. The implementation of Hessian2Serialization is as follows:

public class Hessian2Serialization implements Serialization {

    @Override
    public byte getContentTypeId() {
        return HESSIAN2_SERIALIZATION_ID; // hessian2's ContentType ID
    }

    @Override
    public String getContentType() { // hessian2's ContentType
        return "x-application/hessian2";
    }

    @Override
    public ObjectOutput serialize(URL url, OutputStream out) throws IOException { // create ObjectOutput object
        return new Hessian2ObjectOutput(out);
    }

    @Override
    public ObjectInput deserialize(URL url, InputStream is) throws IOException { // create ObjectInput object
        return new Hessian2ObjectInput(is);
    }

}

The ObjectOutput implementation created by the serialize() method in Hessian2Serialization is Hessian2ObjectOutput, as shown in the following diagram:

Drawing 2.png

The DataOutput interface defines corresponding methods for serializing various data types in Java, as shown in the following diagram. It includes methods for serializing basic types such as boolean, short, int, long, as well as String and byte[].

Drawing 3.png

The ObjectOutput interface extends the DataOutput interface and adds the functionality of serializing objects on top of it. It is defined as shown in the following diagram. The writeThrowable(), writeEvent(), and writeAttachments() methods all call the writeObject() method to implement serialization.

Drawing 4.png
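The relationship between the two interfaces can be sketched with Java default methods. The interfaces below are hypothetical, trimmed-down counterparts of DataOutput and ObjectOutput; the parameter types are assumptions made for illustration, and only the delegation pattern described above is the point.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DefaultMethodSketch {

    // Hypothetical counterpart of DataOutput: basic types only.
    interface MiniDataOutput {
        void writeInt(int v) throws IOException;
        void writeUTF(String v) throws IOException;
    }

    // Hypothetical counterpart of ObjectOutput: adds object serialization.
    interface MiniObjectOutput extends MiniDataOutput {
        void writeObject(Object obj) throws IOException;

        // The specialized methods simply delegate to writeObject() by default.
        default void writeThrowable(Object obj) throws IOException {
            writeObject(obj);
        }

        default void writeEvent(Object data) throws IOException {
            writeObject(data);
        }

        default void writeAttachments(Map<String, Object> attachments) throws IOException {
            writeObject(attachments);
        }
    }

    // A toy implementation that records calls instead of producing bytes.
    static class RecordingOutput implements MiniObjectOutput {
        final List<String> calls = new ArrayList<>();

        public void writeInt(int v) { calls.add("int:" + v); }

        public void writeUTF(String v) { calls.add("utf:" + v); }

        public void writeObject(Object obj) { calls.add("object:" + obj); }
    }

    public static void main(String[] args) throws Exception {
        RecordingOutput out = new RecordingOutput();
        out.writeInt(7);
        out.writeEvent("heartbeat");
        System.out.println(out.calls); // prints "[int:7, object:heartbeat]"
    }
}
```

An implementation therefore only has to supply writeObject() and the basic-type methods, and can still override one of the specialized methods if an algorithm needs different handling.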

Hessian2ObjectOutput wraps a Hessian2Output object. The Hessian2Output is held in a ThreadLocal, so each thread creates it once and reuses it, re-initializing it with the new OutputStream every time a Hessian2ObjectOutput is constructed. The methods for serializing various data types in the DataOutput and ObjectOutput interfaces all delegate to the corresponding methods of the Hessian2Output object, as shown in the following code:

public class Hessian2ObjectOutput implements ObjectOutput {

    private static ThreadLocal<Hessian2Output> OUTPUT_TL = ThreadLocal.withInitial(() -> {
        // initialize Hessian2Output object
        Hessian2Output h2o = new Hessian2Output(null);
        h2o.setSerializerFactory(Hessian2SerializerFactory.SERIALIZER_FACTORY);
        h2o.setCloseStreamOnClose(true);
        return h2o;
    });

    private final Hessian2Output mH2o;

    public Hessian2ObjectOutput(OutputStream os) {
        mH2o = OUTPUT_TL.get(); // trigger initialization of OUTPUT_TL
        mH2o.init(os);
    }

    public void writeObject(Object obj) throws IOException {
        mH2o.writeObject(obj);
    }

    ... // other methods for serializing various types are omitted

}
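The ThreadLocal reuse pattern in the code above can be tried out in isolation. The sketch below uses only the JDK; ReusableWriter is a hypothetical stand-in for Hessian2Output and TextOutput for Hessian2ObjectOutput, so it shows the pattern rather than Dubbo's behavior in detail.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class ThreadLocalReuseSketch {

    // Hypothetical stand-in for Hessian2Output: can be re-pointed at a new stream.
    static class ReusableWriter {
        private OutputStream target;

        void init(OutputStream os) {
            this.target = os;
        }

        void writeText(String text) throws IOException {
            target.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Hypothetical stand-in for Hessian2ObjectOutput.
    static class TextOutput {
        // One writer per thread, created lazily on first access.
        private static final ThreadLocal<ReusableWriter> WRITER_TL =
                ThreadLocal.withInitial(ReusableWriter::new);

        private final ReusableWriter writer;

        TextOutput(OutputStream os) {
            writer = WRITER_TL.get(); // reuse this thread's writer
            writer.init(os);          // re-point it at the new stream
        }

        void writeText(String text) throws IOException {
            writer.writeText(text);
        }

        ReusableWriter writer() {
            return writer;
        }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream first = new ByteArrayOutputStream();
        TextOutput a = new TextOutput(first);
        a.writeText("one");

        ByteArrayOutputStream second = new ByteArrayOutputStream();
        TextOutput b = new TextOutput(second);
        b.writeText("two");

        System.out.println(a.writer() == b.writer()); // same thread, same writer: true
        System.out.println(first + " / " + second);   // prints "one / two"
    }
}
```

The benefit is that the relatively expensive writer is allocated once per thread instead of once per message. The trade-off, at least in this sketch, is that all wrappers created on the same thread share one writer, so each wrapper should finish its writes before the next one is constructed on that thread.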

The ObjectInput implementation created by the deserialize() method in Hessian2Serialization is Hessian2ObjectInput, as shown in the following diagram:

Drawing 5.png

The implementation of Hessian2ObjectInput is similar to Hessian2ObjectOutput: it implements deserialization methods for various types in the DataInput interface and provides deserialization functionality for Java objects in the ObjectInput interface. In Hessian2ObjectInput, all deserialization is delegated to Hessian2Input.

Now that you have learned about the core interfaces of the Dubbo Serialization layer and how Hessian2 serialization is integrated, you can read the code of other serialization algorithms on your own.

Summary #

In this lesson, we first introduced the basics of Java serialization to help you quickly understand the basic concepts of serialization and deserialization. Then, we introduced common serialization algorithms such as Avro, FastJson, Fst, Kryo, Hessian2, and Protobuf. Finally, we analyzed in depth how the dubbo-serialization module integrates each serialization algorithm, focusing on the Hessian2 serialization approach.

If you have any questions or ideas about this lesson, feel free to leave a comment and share them with me.