13 Local Caching to Reduce ZooKeeper Pressure: A Common Technique


Starting from this lesson, we enter the second part of the course: the registry. The registry plays a crucial role in a microservices architecture because it lets service providers and consumers learn about each other. The following Dubbo architecture diagram illustrates this:


Dubbo Architecture Diagram

After a provider starts, it registers itself with the registry during its initialization phase.
During its initialization phase, a consumer subscribes to the providers it needs.
When a provider changes, the registry notifies the consumers that are listening to it.

The registry only gives consumers and providers a convenient way to learn about each other's state changes. The actual calls between a consumer and a provider go directly from one to the other and never pass through the registry. When a provider's state changes, the registry proactively pushes the change to every consumer subscribed to that provider. Consumers therefore see provider changes promptly, service discovery stays decoupled from business logic, and system stability improves.

Dubbo has many concepts, and some of them are easy to misunderstand. The registry discussed in this article is one example. Inside a Dubbo application, the "registry" is actually the application's local registry client. The real registry service runs separately as a process or a cluster of processes, such as a ZooKeeper cluster. The local registry client synchronizes with ZooKeeper in real time to keep the registry data consistent, and this is how the registry feature is implemented. From the registry's point of view, consumers and providers are only user-level concepts, and both are abstracted as URLs.

From this lesson on, we analyze the Dubbo source code. First, let's locate the content of this second part within the overall Dubbo architecture (the red box in the figure below). This part is relatively independent of the rest of Dubbo and does not involve internal concepts such as Protocol and Invoker. After later lessons have introduced those concepts, we will return to the parts of the figure outside the red box.


Complete Dubbo Architecture Diagram

Core Interfaces

As the first lesson in the “registry” section, it is necessary to introduce the core abstract interfaces in the dubbo-registry-api module, as shown in the following figure:

Core interfaces of the dubbo-registry-api module

In Dubbo, the Node interface abstracts the concept of a node. A Node can represent a provider or consumer node, and it can also represent a registry node. The Node interface defines three basic methods (shown in the figure below):

Node interface

The getUrl() method returns the URL representing the current node.
The isAvailable() method checks whether the current node is available.
The destroy() method is responsible for destroying the current node and releasing the underlying resources.
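The following minimal sketch makes this three-method contract concrete. It is not Dubbo's code. The URL is simplified to a String, and SimpleNode, InMemoryNode, and the AtomicBoolean flag are hypothetical names used only for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified stand-in for Dubbo's Node interface; the URL is a plain String here.
interface SimpleNode {
    String getUrl();        // URL that identifies this node
    boolean isAvailable();  // whether the node can currently be used
    void destroy();         // release the underlying resources
}

// Hypothetical node that stops being available once destroy() has run.
class InMemoryNode implements SimpleNode {
    private final String url;
    private final AtomicBoolean destroyed = new AtomicBoolean(false);

    InMemoryNode(String url) {
        this.url = url;
    }

    @Override
    public String getUrl() {
        return url;
    }

    @Override
    public boolean isAvailable() {
        return !destroyed.get();
    }

    @Override
    public void destroy() {
        // compareAndSet ensures the release logic runs at most once
        if (destroyed.compareAndSet(false, true)) {
            System.out.println("released resources of " + url);
        }
    }
}
```

The compareAndSet guard makes destroy() idempotent, so calling it a second time is harmless.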

The RegistryService interface abstracts the basic behavior of the registry service, as shown in the figure below:

RegistryService interface

The register() and unregister() methods register and unregister a URL, respectively.
The subscribe() and unsubscribe() methods subscribe to and unsubscribe from a URL, respectively. After a successful subscription, whenever the subscribed data changes, the registry proactively notifies the NotifyListener passed as the second parameter. The notify() method defined in the NotifyListener interface receives these notifications.
The lookup() method queries the registered data that matches the given conditions. It differs from subscribe(): subscribe() uses a push model, while lookup() uses a pull model in which the caller fetches the data on demand.
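The difference between the push model and the pull model can be seen in a small in-memory example. This is a hypothetical sketch, not Dubbo's RegistryService. URLs are plain Strings keyed by a service name, and listeners are java.util.function.Consumer callbacks standing in for NotifyListener.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical registry illustrating push (subscribe) versus pull (lookup).
class PushPullRegistry {
    private final Map<String, List<String>> providers = new ConcurrentHashMap<>();
    private final Map<String, List<Consumer<List<String>>>> listeners = new ConcurrentHashMap<>();

    void register(String service, String providerUrl) {
        providers.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(providerUrl);
        // Push model: every subscriber receives the full, updated provider list
        List<String> snapshot = lookup(service);
        listeners.getOrDefault(service, List.of()).forEach(l -> l.accept(snapshot));
    }

    void subscribe(String service, Consumer<List<String>> listener) {
        listeners.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(listener);
    }

    // Pull model: the caller asks for the current data whenever it needs it
    List<String> lookup(String service) {
        return new ArrayList<>(providers.getOrDefault(service, List.of()));
    }
}
```

A subscriber receives one callback per change without polling. A caller that only uses lookup() sees changes only when it asks again.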

The Registry interface extends both the RegistryService interface and the Node interface, as shown in the figure below. It represents a node that has registry capabilities. Its reExportRegister() and reExportUnregister() methods simply delegate to the corresponding register() and unregister() methods of RegistryService.

Registry interface

The RegistryFactory interface is the factory interface for Registry and is responsible for creating Registry objects. Its definition is shown below. The @SPI annotation sets the default extension name to "dubbo". The @Adaptive annotation tells Dubbo to generate an adapter class that chooses the implementation according to the URL's protocol (the "protocol" key).

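import org.apache.dubbo.common.URL;
import org.apache.dubbo.common.extension.Adaptive;
import org.apache.dubbo.common.extension.SPI;

@SPI("dubbo")
public interface RegistryFactory {
    // The generated adaptive class picks the extension by URL protocol, defaulting to "dubbo"
    @Adaptive({"protocol"})
    Registry getRegistry(URL url);
}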
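Conceptually, the generated adapter reads the URL's protocol and delegates to the extension registered under that name, falling back to the @SPI default. The sketch below imitates that dispatch under stated assumptions. Dubbo generates the real adapter class at runtime and loads extensions through its ExtensionLoader. AdaptiveFactorySketch and its map of factory functions are hypothetical, and java.net.URI stands in for Dubbo's URL.

```java
import java.net.URI;
import java.util.Map;
import java.util.function.Function;

// Hypothetical imitation of an @Adaptive({"protocol"}) adapter: choose the
// extension by the URL's protocol (scheme), defaulting to the @SPI name.
class AdaptiveFactorySketch {
    private final Map<String, Function<URI, String>> extensions;
    private final String defaultName;

    AdaptiveFactorySketch(Map<String, Function<URI, String>> extensions, String defaultName) {
        this.extensions = extensions;
        this.defaultName = defaultName;
    }

    String getRegistry(URI url) {
        // No protocol on the URL means "use the default extension"
        String name = url.getScheme() != null ? url.getScheme() : defaultName;
        Function<URI, String> factory = extensions.get(name);
        if (factory == null) {
            throw new IllegalStateException("No extension named " + name);
        }
        return factory.apply(url);
    }
}
```

For example, a zookeeper:// address would be routed to the "zookeeper" factory, and an address without a protocol would fall back to "dubbo".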

The two inheritance diagrams below show that each Registry implementation has a corresponding RegistryFactory implementation, and each RegistryFactory implementation is responsible for creating its corresponding Registry object.


RegistryFactory inheritance diagram


Registry inheritance diagram

Among these classes, RegistryFactoryWrapper is the Wrapper class of the RegistryFactory interface. It wraps the Registry object created by the underlying RegistryFactory in a ListenerRegistryWrapper. ListenerRegistryWrapper holds a collection of RegistryServiceListeners and forwards events such as register() and subscribe() to each of them.
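The wrapper is a plain decorator: it delegates each call to the wrapped registry and then informs the listeners. The sketch below illustrates that pattern with hypothetical, simplified types (MiniRegistry, MiniRegistryListener, ListeningRegistry) and String URLs. It uses try/finally so that listeners are notified even if the delegate throws. Treat this as an assumption about the ordering rather than a copy of ListenerRegistryWrapper.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified registry contract used only by this sketch (URLs are Strings).
interface MiniRegistry {
    void register(String url);
    void subscribe(String url);
}

// Hypothetical listener type standing in for RegistryServiceListener.
interface MiniRegistryListener {
    void onRegister(String url);
    void onSubscribe(String url);
}

// Decorator in the spirit of ListenerRegistryWrapper: delegate, then notify listeners.
class ListeningRegistry implements MiniRegistry {
    private final MiniRegistry delegate;
    private final List<MiniRegistryListener> listeners;

    ListeningRegistry(MiniRegistry delegate, List<MiniRegistryListener> listeners) {
        this.delegate = delegate;
        this.listeners = List.copyOf(listeners);
    }

    @Override
    public void register(String url) {
        try {
            delegate.register(url);
        } finally {
            for (MiniRegistryListener l : listeners) {
                l.onRegister(url);
            }
        }
    }

    @Override
    public void subscribe(String url) {
        try {
            delegate.subscribe(url);
        } finally {
            for (MiniRegistryListener l : listeners) {
                l.onSubscribe(url);
            }
        }
    }
}
```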

AbstractRegistryFactory is an abstract class that implements the RegistryFactory interface and provides two common capabilities: normalizing URLs and caching Registry objects. The cache is a HashMap held in the static REGISTRIES field. When it normalizes a URL, AbstractRegistryFactory sets both the URL path and the interface parameter to the class name of RegistryService and removes the export and refer parameters.
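Both capabilities can be sketched together under simplifying assumptions. In this sketch a URL is reduced to an address string plus a parameter map. The cache key format (address plus the rewritten path) is this sketch's own choice. The cache is an instance field instead of the static REGISTRIES field so that the example stays testable. CachingRegistryFactory and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BiFunction;

// Hypothetical sketch of URL normalization plus one cached Registry per address.
class CachingRegistryFactory {
    static final String REGISTRY_SERVICE = "org.apache.dubbo.registry.RegistryService";

    private final Map<String, Object> registries = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final BiFunction<String, Map<String, String>, Object> createRegistry;

    CachingRegistryFactory(BiFunction<String, Map<String, String>, Object> createRegistry) {
        this.createRegistry = createRegistry;
    }

    // The interface parameter becomes RegistryService; export and refer are removed
    static Map<String, String> normalizeParams(Map<String, String> params) {
        Map<String, String> normalized = new HashMap<>(params);
        normalized.put("interface", REGISTRY_SERVICE);
        normalized.remove("export");
        normalized.remove("refer");
        return normalized;
    }

    // The path is also rewritten to RegistryService, so it is part of the key
    static String cacheKey(String address) {
        return address + "/" + REGISTRY_SERVICE;
    }

    Object getRegistry(String address, Map<String, String> params) {
        Map<String, String> normalized = normalizeParams(params);
        String key = cacheKey(address);
        lock.lock(); // the map is a plain HashMap, so every access goes through the lock
        try {
            // Create at most one Registry per key; later calls reuse the cached instance
            return registries.computeIfAbsent(key, k -> createRegistry.apply(k, normalized));
        } finally {
            lock.unlock();
        }
    }
}
```

Because export and refer are dropped before the Registry is built, every service exported or referenced through the same registry address shares one Registry instance.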

AbstractRegistry

AbstractRegistry implements the Registry interface. It implements in-memory reading and writing of registration data itself and has no abstract methods, yet it is still declared abstract. The Registry inheritance diagram shows that all of the Registry implementation classes extend AbstractRegistry.

To reduce pressure on the registry service (for example, ZooKeeper), AbstractRegistry caches the URLs subscribed to by the current node in a local Properties file. Its core fields are as follows:

registryUrl (URL type): this URL carries all the configuration used to create the Registry object, in the form produced after AbstractRegistryFactory has normalized it.
properties (Properties type) and file (File type): the local Properties file cache. properties is the Properties object loaded into memory, and file is the corresponding file on disk; the two are kept in sync. If the file.cache parameter in registryUrl enables the file cache, AbstractRegistry loads the KV cache from the file into properties as soon as it is initialized. Whenever the registration data in properties changes, it is written back to the local file. In properties, each Key is the ServiceKey of a URL for which the current node acts as a Consumer, and each Value is the corresponding Provider list, which includes URLs from all categories (providers, routers, configurators, and so on). "registies" is a special key whose Value is the list of registries; all other entries are Provider lists.
syncSaveFile (boolean type): whether the cache file is written synchronously. It corresponds to the save.file parameter in registryUrl.
registryCacheExecutor (ExecutorService type): a single-threaded thread pool. When a Provider's registration data changes, the full data is written into the properties field. If syncSaveFile is false, this pool then writes the cache file asynchronously.
lastCacheChanged (AtomicLong type): the version number of the registration data. Each write replaces the whole file instead of patching it, so a version number is needed to prevent older data from overwriting newer data.
registered (Set<URL> type): the set of URLs that the current node has registered.
subscribed (ConcurrentMap<URL, Set<NotifyListener>> type): the subscription listeners. The Key is the URL being listened to, and the Value is the set of listeners for that URL.
notified (ConcurrentMap<URL, Map<String, List<URL>>> type): the Key is a URL for which the current node acts as a Consumer, that is, one Consumer role of the node (a node can consume several Providers). The Value is a Map whose Key is the category of the Provider URLs (providers, routers, configurators, and so on) and whose Value is the list of URLs in that category.

After introducing the core fields of AbstractRegistry, let’s take a look at what common capabilities AbstractRegistry provides depending on these fields.

1. Local Cache

As an RPC framework, Dubbo solves the problem of collaboration between services in a microservices architecture. It is a dependency of both the Provider and the Consumer and is packaged and deployed together with the service. dubbo-registry is one of these dependencies; it interacts with service discovery components such as ZooKeeper, etcd, and Consul.

When a URL exposed by a Provider changes, the service discovery component (ZooKeeper, for example) notifies the Registry component on the Consumer side. The Registry then calls its notify() method. This method matches the full Provider list against the Consumer URL and writes the matching URLs into the properties collection.

Let’s take a look at the core implementation of the notify() method.

// Note the parameters: url is the Consumer URL, listener is the NotifyListener for that URL,
// and urls is the full list of URLs currently exposed by the Provider side
protected void notify(URL url, NotifyListener listener, List<URL> urls) {
    ... // Omit a series of boundary condition checks
    Map<String, List<URL>> result = new HashMap<>();
    for (URL u : urls) {
        // Match the Consumer URL against the Provider URL; the matching rules are detailed later
        if (UrlUtils.isMatch(url, u)) {
            // Classify based on the category parameter in the Provider URL
            String category = u.getParameter("category", "providers");
            List<URL> categoryList = result.computeIfAbsent(category,
                    k -> new ArrayList<>());
            categoryList.add(u);
        }
    }
    if (result.size() == 0) {
        return;
    }
    Map<String, List<URL>> categoryNotified =
            notified.computeIfAbsent(url, u -> new ConcurrentHashMap<>());
    for (Map.Entry<String, List<URL>> entry : result.entrySet()) {
        String category = entry.getKey();
        List<URL> categoryList = entry.getValue();
        categoryNotified.put(category, categoryList); // Update notified
        listener.notify(categoryList); // Call NotifyListener
        // Update properties collection and underlying file cache
        saveProperties(url);
    }
}
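The grouping step at the heart of notify() can be tried in isolation. The sketch below works on String URLs instead of Dubbo's URL class. It uses a deliberately naive query-string parser and a caller-supplied match predicate in place of UrlUtils.isMatch(). NotifyGrouping and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of notify()'s grouping step: matching Provider URLs are
// bucketed by their "category" parameter, which defaults to "providers".
class NotifyGrouping {
    // Naive lookup of the category parameter in a URL's query string
    static String category(String url) {
        int q = url.indexOf('?');
        if (q >= 0) {
            for (String pair : url.substring(q + 1).split("&")) {
                if (pair.startsWith("category=")) {
                    return pair.substring("category=".length());
                }
            }
        }
        return "providers";
    }

    static Map<String, List<String>> groupByCategory(List<String> urls, Predicate<String> matches) {
        Map<String, List<String>> result = new HashMap<>();
        for (String u : urls) {
            if (matches.test(u)) {
                result.computeIfAbsent(category(u), k -> new ArrayList<>()).add(u);
            }
        }
        return result;
    }
}
```

Given one provider URL, one routers URL, and one configurators URL for the same service, the result has three buckets. Each bucket is then handed to the listener separately, just as the loop over result.entrySet() does above.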

The saveProperties() method collects the URLs the Consumer has been notified of in every category, joins them with spaces, and writes the result into properties under the Consumer's ServiceKey. It then increments the lastCacheChanged version number. After properties is updated, the value of syncSaveFile decides how the file is synchronized: either in the current thread, or by a task submitted to the registryCacheExecutor thread pool that completes the write asynchronously. The local cache file is located at:

/.dubbo/dubbo-registry-[current application name]-[IP address of the current Registry].cache
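The versioned, full-overwrite save can be sketched as follows. Field names follow the text, but the class is hypothetical. It also assumes an in-memory String in place of the cache file so that it runs without touching disk. The stale-write check (skip when the version is older than lastCacheChanged) is the part that lets asynchronous writes finish in any order without losing the newest data.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of saveProperties(): update properties, bump the version,
// then write the whole snapshot synchronously or on a single-threaded executor.
class VersionedPropertiesCache {
    final Properties properties = new Properties();
    private final AtomicLong lastCacheChanged = new AtomicLong();
    private final boolean syncSaveFile;
    private final ExecutorService registryCacheExecutor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "registry-cache-sketch");
        t.setDaemon(true);
        return t;
    });
    private String fileContent = "";  // stands in for the cache file on disk
    private long persistedVersion = 0;

    VersionedPropertiesCache(boolean syncSaveFile) {
        this.syncSaveFile = syncSaveFile;
    }

    // Join the Consumer's URLs with spaces and store them under its ServiceKey
    void saveProperties(String serviceKey, List<String> urls) {
        properties.setProperty(serviceKey, String.join(" ", urls));
        long version = lastCacheChanged.incrementAndGet();
        if (syncSaveFile) {
            doSaveProperties(version);
        } else {
            registryCacheExecutor.execute(() -> doSaveProperties(version));
        }
    }

    synchronized void doSaveProperties(long version) {
        // A newer change exists: skip, so stale data never overwrites newer data
        if (version < lastCacheChanged.get()) {
            return;
        }
        StringWriter out = new StringWriter();
        try {
            properties.store(out, "Dubbo Registry Cache"); // full overwrite, never a patch
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        fileContent = out.toString();
        persistedVersion = version;
    }

    synchronized String fileContent() { return fileContent; }

    synchronized long persistedVersion() { return persistedVersion; }

    static String loadValue(String content, String key) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty(key);
    }

    void close() {
        registryCacheExecutor.shutdown();
        try {
            registryCacheExecutor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```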

The first detail worth a closer look is the UrlUtils.isMatch() method. It matches a Consumer URL against a Provider URL in the following steps:

Match the interfaces of the Consumer and the Provider (the interface parameter takes priority, then the path). If the interfaces are equal, or either side is "*", matching continues with the next step.
Match the categories of the Consumer and the Provider.
Check whether the enabled parameter in the Consumer and Provider URLs meets the condition.
Check whether the group, version, and classifier of the Consumer and the Provider match.
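The four checks can be approximated over plain parameter maps. This is a simplified sketch, not a copy of UrlUtils.isMatch(). Rules such as group lists and category exclusions (for example "-routers") are reduced or omitted. The parameter names ("interface", "path", "category", "enabled", "group", "version", "classifier") follow the text, and UrlMatchSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified approximation of the four matching steps.
class UrlMatchSketch {
    static final String ANY = "*";

    static boolean isMatch(Map<String, String> consumer, Map<String, String> provider) {
        // 1. Interface (interface parameter first, then path): equal, or either side is "*"
        String ci = consumer.getOrDefault("interface", consumer.get("path"));
        String pi = provider.getOrDefault("interface", provider.get("path"));
        if (!(ANY.equals(ci) || ANY.equals(pi) || Objects.equals(ci, pi))) {
            return false;
        }
        // 2. Category: the Consumer's comma-separated categories must include the Provider's
        String providerCategory = provider.getOrDefault("category", "providers");
        String consumerCategories = consumer.getOrDefault("category", "providers");
        if (!(ANY.equals(consumerCategories)
                || Arrays.asList(consumerCategories.split(",")).contains(providerCategory))) {
            return false;
        }
        // 3. Enabled: a disabled Provider only matches a Consumer asking for enabled=*
        if ("false".equals(provider.get("enabled")) && !ANY.equals(consumer.get("enabled"))) {
            return false;
        }
        // 4. Group, version, classifier: "*" on the Consumer side matches anything
        return matchesOrAny(consumer.get("group"), provider.get("group"))
                && matchesOrAny(consumer.get("version"), provider.get("version"))
                && (consumer.get("classifier") == null
                    || matchesOrAny(consumer.get("classifier"), provider.get("classifier")));
    }

    private static boolean matchesOrAny(String consumerValue, String providerValue) {
        return ANY.equals(consumerValue) || Objects.equals(consumerValue, providerValue);
    }
}
```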

The second detail is the URL.getServiceKey() method. It returns the ServiceKey used as the key in the properties collection and in the corresponding cache file. The ServiceKey has the following format:

[group]/{interface(or path)}[:version]
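In this format, the group and version are optional and the interface (or path) is mandatory. The string-building rule can be sketched as follows. ServiceKeySketch and buildServiceKey are hypothetical names for illustration.

```java
// Hypothetical builder for the [group]/{interface}[:version] format.
class ServiceKeySketch {
    static String buildServiceKey(String interfaceOrPath, String group, String version) {
        StringBuilder key = new StringBuilder();
        if (group != null && !group.isEmpty()) {
            key.append(group).append('/'); // optional group prefix
        }
        key.append(interfaceOrPath);       // mandatory interface or path
        if (version != null && !version.isEmpty()) {
            key.append(':').append(version); // optional version suffix
        }
        return key.toString();
    }
}
```

For example, buildServiceKey("org.demo.DemoService", "demo", "1.0.0") yields demo/org.demo.DemoService:1.0.0. With no group and no version, the key is just org.demo.DemoService.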

The local file cache is the core of AbstractRegistry. Its constructor calls the loadProperties() method to load the cache file written above into the properties object.

When a subscription fails because of network jitter or other reasons, the Consumer's Registry can call the getCacheUrls() method to read the local cache and obtain the most recently registered Provider URLs. The local cache therefore gives AbstractRegistry a fault-tolerance mechanism that improves service reliability.
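The fallback path can be sketched as two steps: load the cache (as loadProperties() does at construction time), then split the space-separated value stored under a ServiceKey back into URL strings. This is a hypothetical sketch. It reads from a java.io.Reader instead of the real cache file, and it returns an empty list rather than null when nothing is cached.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of loading the cache and reading it back as a fallback.
class CacheFallbackSketch {
    private final Properties properties = new Properties();

    CacheFallbackSketch(Reader cacheFile) {
        try {
            properties.load(cacheFile); // analogous to loadProperties() in the constructor
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Last known Provider URLs for a ServiceKey, or an empty list if none were cached
    List<String> getCacheUrls(String serviceKey) {
        String value = properties.getProperty(serviceKey);
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.trim().split("\\s+"));
    }
}
```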

2. Register/Subscribe

AbstractRegistry implements the Registry interface. Its register() method adds the URL that the current node registers to the registered collection. Its unregister() method removes the specified URL from the registered collection, for example when the current node goes offline.

The subscribe() method records the URL of the current node (acting as a Consumer) together with its NotifyListener in the subscribed collection. The unsubscribe() method removes that URL and NotifyListener from the subscribed collection.

These four methods are simple collection operations, and we will not show the specific code here.

As the implementation of AbstractRegistry shows, these four basic registration and subscription methods are purely in-memory operations. Subclasses of AbstractRegistry therefore override them and add behavior on top, such as interacting with the actual registry service.
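Both the in-memory bookkeeping and the override pattern fit in a short example. This is a hypothetical sketch. URLs are Strings and listeners are java.util.function.Consumer callbacks standing in for NotifyListener. The subclass records its "remote" work in a list instead of calling ZooKeeper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch of AbstractRegistry's four in-memory operations.
class InMemoryRegistrySketch {
    final Set<String> registered = ConcurrentHashMap.newKeySet();
    final Map<String, Set<Consumer<String>>> subscribed = new ConcurrentHashMap<>();

    void register(String url) { registered.add(url); }

    void unregister(String url) { registered.remove(url); }

    void subscribe(String consumerUrl, Consumer<String> listener) {
        subscribed.computeIfAbsent(consumerUrl, k -> ConcurrentHashMap.newKeySet()).add(listener);
    }

    void unsubscribe(String consumerUrl, Consumer<String> listener) {
        Set<Consumer<String>> listeners = subscribed.get(consumerUrl);
        if (listeners != null) {
            listeners.remove(listener);
        }
    }
}

// Hypothetical subclass: keep the bookkeeping via super, then do the extra work.
class RemoteBackedRegistrySketch extends InMemoryRegistrySketch {
    final List<String> remoteCalls = new ArrayList<>();

    @Override
    void register(String url) {
        super.register(url);
        remoteCalls.add("create node for " + url); // placeholder for a real registry call
    }
}
```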

3. Recover/Destroy

AbstractRegistry also has two other methods worth attention: the “recover() method” and the “destroy() method”.

When a Provider loses its connection to the registry because of network problems, it reconnects. Once the reconnection succeeds, it calls the recover() method. recover() passes every URL in the registered collection back through register() to restore the registration data, and passes every URL in the subscribed collection back through subscribe() to restore the subscription listeners. The implementation of recover() is simple, so we will not show it here; interested readers can refer to the source code.
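The replay idea can be sketched as follows. This is a hypothetical sketch. Listeners are identified by String ids, and a remoteLog list stands in for the calls that would reach the registry service. The loops iterate over copies because register() and subscribe() write to the same collections that are being replayed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of recover(): replay registrations, then subscriptions.
class RecoverableRegistrySketch {
    final Set<String> registered = ConcurrentHashMap.newKeySet();
    final Map<String, Set<String>> subscribed = new ConcurrentHashMap<>(); // URL -> listener ids
    final List<String> remoteLog = new ArrayList<>(); // stands in for remote registry calls

    void register(String url) {
        registered.add(url);
        remoteLog.add("register " + url);
    }

    void subscribe(String url, String listenerId) {
        subscribed.computeIfAbsent(url, k -> ConcurrentHashMap.newKeySet()).add(listenerId);
        remoteLog.add("subscribe " + url + " " + listenerId);
    }

    void recover() {
        // Copy first: register() and subscribe() modify the collections being replayed
        for (String url : new HashSet<>(registered)) {
            register(url);
        }
        for (Map.Entry<String, Set<String>> e : new HashMap<>(subscribed).entrySet()) {
            for (String listenerId : new HashSet<>(e.getValue())) {
                subscribe(e.getKey(), listenerId);
            }
        }
    }
}
```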

When the current node goes offline, the Node.destroy() method is called to release the underlying resources. The destroy() method implemented by AbstractRegistry calls unregister() and unsubscribe() to clear all the URLs the current node has registered and subscribed to. The exception is URLs that are not dynamically registered (that is, URLs whose dynamic parameter is explicitly set to false); these are not unregistered. The implementation of destroy() is also simple and is not shown here; interested readers can refer to the source code.
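The dynamic filter in destroy() can be sketched as follows. This is a hypothetical sketch. URLs are Strings, and a regular expression over the query string approximates reading the dynamic parameter with a default of true. Only an explicit dynamic=false keeps a URL registered, while every subscription is removed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of destroy(): unregister dynamic URLs, unsubscribe everything.
class DestroyableRegistrySketch {
    final Set<String> registered = ConcurrentHashMap.newKeySet();
    final Map<String, Set<String>> subscribed = new ConcurrentHashMap<>(); // URL -> listener ids

    // dynamic defaults to true; only an explicit dynamic=false makes a URL static
    static boolean isDynamic(String url) {
        return !url.matches(".*[?&]dynamic=false(&.*)?");
    }

    void unregister(String url) { registered.remove(url); }

    void unsubscribe(String url, String listenerId) {
        Set<String> listeners = subscribed.get(url);
        if (listeners != null) {
            listeners.remove(listenerId);
        }
    }

    void destroy() {
        for (String url : new HashSet<>(registered)) {
            if (isDynamic(url)) {
                unregister(url);
            }
        }
        for (Map.Entry<String, Set<String>> e : new HashMap<>(subscribed).entrySet()) {
            for (String listenerId : new HashSet<>(e.getValue())) {
                unsubscribe(e.getKey(), listenerId);
            }
        }
    }
}
```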

Summary

This lesson was the first on Dubbo's registry. We first introduced the registry's position in the overall Dubbo architecture and the responsibilities of its core interfaces, including Registry, RegistryService, and RegistryFactory. We then examined the common capabilities provided by the AbstractRegistry abstract class, focusing on local caching, registration/subscription, and recovery/destruction.