33 Configuration Management with Millions of Configuration Items - How to Manage It #

Hello, I’m Tang Yang.

When performance optimization comes up at work, code changes are probably what spring to mind first. In reality, though, some optimizations can be achieved simply by adjusting configuration parameters. Why do I say that? Let me give you a few examples:

You can shorten request timeouts so that failing calls return quickly, which prevents system snowball effects and improves system availability.

You can also enlarge the HTTP client connection pool to increase the parallelism of calls to third-party HTTP services, thereby improving system performance (see the sketch below).
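
To make the second example concrete, here is a hedged sketch using Apache HttpClient 4.x; the pool size and timeout values passed in are exactly the kind of items you would read from configuration rather than hard-code.

```java
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class ThirdPartyHttpClient {

    /** Builds an HTTP client whose tuning knobs are plain configuration values. */
    public static CloseableHttpClient build(int maxTotal, int maxPerRoute,
                                            int connectTimeoutMs, int socketTimeoutMs) {
        // A larger pool lets more calls to the third-party service run in parallel.
        PoolingHttpClientConnectionManager pool = new PoolingHttpClientConnectionManager();
        pool.setMaxTotal(maxTotal);
        pool.setDefaultMaxPerRoute(maxPerRoute);

        // Shorter timeouts make failing requests return quickly instead of piling up.
        RequestConfig timeouts = RequestConfig.custom()
                .setConnectTimeout(connectTimeoutMs)
                .setSocketTimeout(socketTimeoutMs)
                .build();

        return HttpClients.custom()
                .setConnectionManager(pool)
                .setDefaultRequestConfig(timeouts)
                .build();
    }
}
```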

You can think of configuration as a tool for managing your system. In your vertical e-commerce system, there will certainly be a large number of configuration items, such as the database address, the domain name for requesting HTTP services, and the maximum number of local memory cache items.

So, how do you manage these configuration items? What should you pay attention to during the management process?

How to manage configurations? #

Configuration management has a long history. Linux, for example, exposes a large number of configuration options that let you tune the system's behavior to your actual business needs. You can change how often dirty data in the Page Cache is flushed to disk by modifying the dirty_writeback_centisecs parameter, or change the length of the backlog for half-open (not yet fully established) connections by modifying tcp_max_syn_backlog. You can apply such changes either by editing the configuration file and restarting the server, or by adjusting them with the sysctl command so that they take effect immediately.

So, what options do we have for managing configuration when developing applications? There are two main approaches:

One way is to manage configurations through configuration files.

The other way is to use configuration centers for management.

Take the e-commerce system as an example. When you and your team first start building a vertical e-commerce system, you may not pay much attention to configuration management in order to ship faster, and configuration items naturally end up embedded in the code. As the number of configuration items grows, however, you want to manage them properly and avoid recompiling the code every time one of them changes, so you move them into a separate file (in properties, XML, or YAML format). These files are still packaged and deployed together with the project; the difference is that changing a configuration no longer requires recompiling the code.

Soon you discover a problem: even though the configuration now lives in its own file, that file is still packaged with the code, so modifying a configuration item still requires repackaging, which adds to the build time. You therefore move the configuration into a separate directory, so that changing it no longer requires repackaging (although, because the configuration does not take effect in real time, you still have to restart the service for the change to apply).

The basic components we usually use, such as Tomcat and Nginx, use the above configuration file method to manage configuration items. In Linux systems, the tcp_max_syn_backlog I mentioned earlier can be configured in /etc/sysctl.conf.

Here I need to emphasize that we usually standardize the directory where configuration files are stored, for example /data/confs, and manage the configuration items in a code repository such as Git. When a new machine is added, its initialization script only needs to create this directory and pull the configuration from Git. Standardizing this process avoids forgetting to deploy the configuration files when starting the application.

In addition, if your service is deployed in multiple data centers, some configuration items will be shared across data centers while others differ. In that case, put the shared items in a common directory used by all data centers, and put the differing items in a directory named after each data center. When reading configuration, read the data-center-specific configuration first and then the common configuration, so that data-center values take precedence. This way, shared items do not have to be repeated in every data center's file, which reduces the number of configuration items.

Here is a typical directory layout that you can refer to if your system also manages configuration with files; a small loading sketch follows the listing.

/data/confs/common/commerce // Common configurations for e-commerce business

/data/confs/commerce-zw     // Configurations for the ZW data center of the e-commerce business

/data/confs/commerce-yz     // Configurations for the YZ data center of the e-commerce business

/data/confs/common/community // Common configurations for community business

/data/confs/community-zw     // Configurations for the ZW data center of the community business

/data/confs/community-yz     // Configurations for the YZ data center of the community business
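
If your code reads this layout directly, a minimal loading sketch in Java might look like the following; the file names are hypothetical, and the data-center file is loaded last so that its values override the common ones.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class LayeredConfigLoader {

    /** Loads the shared configuration first, then overlays the data-center-specific file. */
    public static Properties load(Path commonFile, Path dataCenterFile) throws IOException {
        Properties props = new Properties();
        loadInto(props, commonFile);      // shared values for all data centers
        loadInto(props, dataCenterFile);  // data-center values override shared ones
        return props;
    }

    private static void loadInto(Properties props, Path file) throws IOException {
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names under the directory layout shown above.
        Properties conf = load(
                Path.of("/data/confs/common/commerce/app.properties"),
                Path.of("/data/confs/commerce-zw/app.properties"));
        System.out.println(conf.getProperty("db.url"));
    }
}
```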

So, is this the final form of configuration management? Of course not, because file-based configuration still has the limitation I mentioned earlier: we must restart the service for a change to take effect. Is there a way to update configuration without restarting the application? That is where a configuration center comes in.

How is the configuration center implemented? #

The configuration center is considered a standard component in the microservices architecture. There are many open-source solutions available in the industry for you to choose from, such as Apollo developed by Ctrip, Disconf developed by Baidu, QConf developed by Qihoo 360, and Spring Cloud Config, a component of Spring Cloud.

In my opinion, Apollo is recommended as it supports configuration for different environments and clusters, has comprehensive management features, and supports gray release and hot configuration updates. Among all the configuration centers, Apollo has the most complete set of features.

Now, let’s look at the key points of implementing a configuration center component. If you want to have more control over the configuration center and develop your own component that fits your business scenario, how do you start?

How to Store Configuration Information #

In fact, a configuration center is very similar to a registry center: the core function of both is to store and retrieve data, in this case configuration items. So when designing the server side of a configuration center, we need to choose a storage component that can hold a large amount of configuration information, and there are many to choose from.

Different open-source configuration centers use different storage components: Disconf and Apollo use MySQL, while QConf uses ZooKeeper. The configuration centers I have maintained and used also made different choices; Weibo's configuration center stored its data in Redis, while Meitu's used Etcd.

Whichever storage component you use, what matters is standardizing how configuration items are stored. For example, a configuration center I used before took Etcd as its storage component and supported three levels of configuration: global, regional, and node. Node configuration takes precedence over regional configuration, and regional configuration takes precedence over global configuration; in other words, we read the node configuration first, fall back to the regional configuration if it does not exist, and finally fall back to the global configuration. The storage paths look like this (a lookup sketch follows the list):

/confs/global/{env}/{project}/{service}/{version}/{module}/{key} // global configuration

/confs/regions/{env}/{project}/{service}/{version}/{region}/{module}/{key} // regional configuration

/confs/nodes/{env}/{project}/{service}/{version}/{region}/{node}/{module}/{key} // node configuration
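
As a sketch of how a client might resolve a key against these paths, here is a minimal example in Java; the in-memory map stands in for the real Etcd key space, and the class and method names are illustrative rather than the API of any particular client.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ConfigResolver {

    private final Map<String, String> store; // stands in for the Etcd key space

    public ConfigResolver(Map<String, String> store) {
        this.store = store;
    }

    public Optional<String> resolve(String env, String project, String service,
                                    String version, String region, String node,
                                    String module, String key) {
        // Node config wins over regional config, which wins over global config.
        List<String> candidates = List.of(
            String.format("/confs/nodes/%s/%s/%s/%s/%s/%s/%s/%s",
                          env, project, service, version, region, node, module, key),
            String.format("/confs/regions/%s/%s/%s/%s/%s/%s/%s",
                          env, project, service, version, region, module, key),
            String.format("/confs/global/%s/%s/%s/%s/%s/%s",
                          env, project, service, version, module, key));
        for (String path : candidates) {
            String value = store.get(path);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }
}
```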

How to Implement Configuration Change Notification #

Once the configuration is stored, the next question is how to notify the application servers when it changes, so that configuration can be updated dynamically without a restart. There are generally two approaches to change notification: polling and push over long-lived connections.

Polling is simple: the application registers a listener with the configuration center client, and the client periodically (e.g., every 1 minute) queries whether the required configurations have changed. If there are any changes, the listener is triggered to notify the application.

Here's an important point to note: if many application servers poll for configuration and each poll returns the full set of configuration items, the configuration center's outbound bandwidth can become a bottleneck. To solve this, the configuration center stores, alongside each configuration item, an MD5 digest computed from its content.

When a configuration item changes, its MD5 value changes with it. The configuration center client retrieves and stores the MD5 value along with the configuration; on each polling cycle it compares the stored value with the one in the configuration center, and only if they differ does it fetch the latest configuration.

Since the probability of changes to the configuration items stored in the configuration center is relatively low, using this method allows each polling request to only return an MD5 value, greatly reducing the configuration center server’s bandwidth usage.
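
A minimal client-side sketch of this polling logic might look like the following; the ConfigServer interface is a hypothetical stand-in for the config center's API, with one cheap call that returns only the digest and one that returns the full configuration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class ConfigPoller {

    /** Hypothetical config-center API: a cheap digest call and a full fetch. */
    public interface ConfigServer {
        String latestMd5(String key);
        String fetch(String key);
    }

    private final ConfigServer server;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private volatile String lastMd5 = "";

    public ConfigPoller(ConfigServer server) {
        this.server = server;
    }

    /** Polls once a minute; fetches the full configuration only when the digest changed. */
    public void watch(String key, Consumer<String> listener) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                String remoteMd5 = server.latestMd5(key);
                if (!remoteMd5.equals(lastMd5)) {
                    String config = server.fetch(key);
                    lastMd5 = remoteMd5;      // remember the digest we just saw
                    listener.accept(config);  // notify the application
                }
            } catch (RuntimeException e) {
                // Swallow errors so one failed poll does not stop the schedule.
            }
        }, 0, 1, TimeUnit.MINUTES);
    }

    /** How the server side could compute the digest it stores next to each item. */
    public static String md5Hex(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```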

The other approach, long connection push, involves the configuration center server maintaining a list of configurations that each connection is interested in. When the configuration center detects a configuration change, it can push the updated configuration to the client through the corresponding connection. This method requires maintaining long connections and the mapping between connections and configurations, making its implementation more complex than that of polling. However, compared to the polling approach, it can provide more real-time configuration change notifications.
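
A minimal sketch of the server side of this push model is shown below; the callbacks stand in for long-lived client connections, and the class and method names are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class PushingConfigServer {

    private final Map<String, String> configs = new ConcurrentHashMap<>();
    // Mapping from each configuration key to the connections interested in it.
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    /** A client subscribes to a key over its long-lived connection. */
    public void subscribe(String key, Consumer<String> connection) {
        subscribers.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(connection);
    }

    /** Called when an operator changes a configuration item. */
    public void update(String key, String newValue) {
        configs.put(key, newValue);
        // Push the change to every connection interested in this key.
        for (Consumer<String> connection : subscribers.getOrDefault(key, List.of())) {
            connection.accept(newValue);
        }
    }
}
```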

In my opinion, configuration does not change very often, so strict real-time notification is not a priority, whereas a simple implementation is. Therefore, if you decide to build your own configuration center, polling is a reasonable choice.

How to Ensure High Availability of the Configuration Center #

In addition to configuration change notifications, another critical point in the implementation of the configuration center is how to ensure its availability. For the configuration center, its availability is far more important than its performance. This is because we usually retrieve configurations from the configuration center when the server starts up. If the performance of configuration retrieval is low, it only leads to longer startup time, with little impact on the business. However, if we cannot retrieve the configurations, it can cause startup failures.

For example, if we store the address of the database in the configuration center, an application failure will occur if the configuration center is down, as we won’t be able to obtain the database address. Therefore, our requirement is to “bypass” the configuration center. This means that even if the configuration center or the storage it relies on fails, we can still ensure that the application can start up. How is this achieved?

Generally, we add two levels of caching on the configuration center client: in-memory cache as the first-level cache, and file cache as the second-level cache.

After the configuration center client obtains the configuration information, it synchronously writes the information to the in-memory cache and asynchronously writes it to the file cache. The purpose of the in-memory cache is to reduce the frequency of interactions between the client and the configuration center, thereby improving configuration retrieval performance. The purpose of the file cache is to serve as a backup. When the application restarts, if the configuration center fails, the application will prioritize using the configurations from the file cache. Although it won’t receive configuration change notifications (because the configuration center is down), the application can still start up, which can be seen as a kind of fallback solution.
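
Here is a minimal sketch of that two-level cache, assuming a hypothetical ConfigCenterClient interface for the remote call; the in-memory cache is updated synchronously, the file cache asynchronously, and the file copy is only read when the config center cannot be reached at startup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.concurrent.CompletableFuture;

public class CachingConfigClient {

    /** Hypothetical remote API; may fail if the config center is down. */
    public interface ConfigCenterClient {
        Properties fetchAll() throws IOException;
    }

    private final ConfigCenterClient remote;
    private final Path fileCache;
    private volatile Properties memoryCache = new Properties();

    public CachingConfigClient(ConfigCenterClient remote, Path fileCache) {
        this.remote = remote;
        this.fileCache = fileCache;
    }

    /** Loads configuration at startup, falling back to the file cache on failure. */
    public void start() {
        try {
            Properties latest = remote.fetchAll();
            memoryCache = latest;                                      // first-level cache
            CompletableFuture.runAsync(() -> writeFileCache(latest));  // second-level cache
        } catch (IOException e) {
            memoryCache = readFileCache(); // degrade: start with the last known config
        }
    }

    public String get(String key) {
        return memoryCache.getProperty(key);
    }

    private void writeFileCache(Properties props) {
        try (OutputStream out = Files.newOutputStream(fileCache)) {
            props.store(out, "config-center backup");
        } catch (IOException ignored) {
            // Losing the backup only affects the next cold start, not this run.
        }
    }

    private Properties readFileCache() {
        Properties props = new Properties();
        if (Files.exists(fileCache)) {
            try (InputStream in = Files.newInputStream(fileCache)) {
                props.load(in);
            } catch (IOException ignored) {
            }
        }
        return props;
    }
}
```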

Course Summary #

That’s all for this lesson. In this lesson, I have taken you through how we manage a large number of configuration items in the process of system development. The key points you need to understand are:

  • Configuration storage is hierarchical, with public configurations and personalized configurations. Generally, personalized configurations override public configurations, which reduces the number of stored configuration items.
  • The configuration center can provide configuration change notification and achieve hot updating of configurations.
  • Between the two, availability matters more than performance for a configuration center. We generally require the configuration center's availability to reach 99.999%, or even 99.9999%.

You need to be aware that not all configuration items need to be stored in the configuration center. If your project still manages configurations using files, you only need to migrate configurations that need to be dynamically adjusted, such as timeouts, to the configuration center. For configurations that are unlikely to change, such as database addresses and addresses of third-party requests, you can still manage them using files. This greatly reduces the cost of configuration migration.