
35 Codis vs Redis Cluster: Which Clustering Solution Should I Choose? #

Redis’s sharding cluster uses multiple instances to store data, which suits scenarios with large data volumes. In [Lesson 8], we learned about Redis Cluster, the official sharding cluster solution provided by Redis, which laid the foundation for mastering sharding clusters. Today, I will take you a step further and introduce Codis, a solution that was widely used in the industry before Redis Cluster was officially released.

I will explain the key technical implementation principles of Codis and compare it with Redis Cluster to help you choose the best clustering solution.

Without further ado, let’s first learn about the overall architecture and process of Codis.

Overall Architecture and Basic Process of Codis #

A Codis cluster consists of four key components.

  • Codis server: This is a Redis instance that has been modified to support additional data structures and data migration operations. It is responsible for handling data read and write requests.
  • Codis proxy: This component receives requests from clients and forwards them to the Codis server.
  • Zookeeper cluster: It stores cluster metadata, such as data location information and Codis proxy information.
  • Codis dashboard and Codis FE: Together, they form the cluster management tool. The Codis dashboard is responsible for cluster management tasks, including adding or removing Codis servers and proxies, and performing data migration. Codis FE provides a web-based interface for managing the cluster.

Let me show you an architecture diagram of the Codis cluster and its key components.

Codis Architecture

Now, let me explain how Codis handles requests.

First, to enable the cluster to receive and process requests, we need to use the Codis dashboard to set the access addresses for Codis servers and proxies. Once this setup is complete, Codis servers and proxies start accepting connections.

When a client wants to read or write data, it directly establishes a connection with the Codis proxy. You may wonder if any modifications are needed on the client-side to access the proxy. However, you don’t need to worry. Codis proxy supports the Redis RESP protocol, so when a client accesses the Codis proxy, it is no different from accessing a native Redis instance. This way, clients that were originally connecting to a single Redis instance can easily establish connections with the Codis cluster.

Finally, when the Codis proxy receives a request, it queries the mapping between the request data and the Codis server, and forwards the request to the corresponding Codis server for processing. Once the Codis server completes the request, it returns the result to the Codis proxy, which then returns the data to the client.

Let me show you a flowchart of this process:

Request Processing

Now that you have an understanding of the Codis cluster’s architecture and basic process, I will discuss the specific design choices and principles of the four technical factors that affect the effectiveness of sharded clusters: data distribution, cluster scaling and data migration, client compatibility, and reliability assurance. This will help you grasp the specific usage of Codis.

Key Technology Principles of Codis #

Once we use a sharding cluster, the first problem we face is how the data is distributed among multiple instances.

How is data distributed in the cluster? #

In the Codis cluster, the mapping of where a piece of data should be saved is achieved through logical slots. Specifically, the process consists of two steps.

In the first step, the Codis cluster has a total of 1024 slots numbered from 0 to 1023. We can manually assign these slots to Codis servers, with each server containing a portion of the slots. Alternatively, we can let the Codis dashboard automatically assign the 1024 slots evenly across all servers.

In the second step, when a client wants to read or write data, it uses the CRC32 algorithm to calculate the hash value of the data key, and then takes the modulo of this hash value by 1024. The resulting value corresponds to the slot number. Based on the slot-server assignment from the first step, we can determine which server the data is saved on.

Let me give you an example. The following diagram shows the mapping of data, slots, and Codis servers. Slots 0 and 1 are assigned to server1, Slot 2 is assigned to server2, and Slots 1022 and 1023 are assigned to server8. When a client accesses key 1 and key 2, the modulo of their CRC32 values by 1024 is 1 and 1022, respectively. Therefore, they are saved on Slot 1 and Slot 1022, which have been assigned to Codis servers 1 and 8. This way, the storage locations of key 1 and key 2 become clear.

Data-Slot-Server Mapping
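The two-step mapping above can be sketched in a few lines of Python. The routing table below is a hypothetical stand-in for the slot assignment that a real deployment stores in ZooKeeper:

```python
import binascii

NUM_SLOTS = 1024  # Codis uses 1024 slots in total

def key_to_slot(key: str) -> int:
    """Step 2: CRC32 hash of the key, modulo 1024, gives the slot number."""
    return binascii.crc32(key.encode("utf-8")) % NUM_SLOTS

# Step 1: a hypothetical slot-to-server assignment (in real Codis this is
# configured on the dashboard and saved in ZooKeeper)
routing_table = {0: "server1", 1: "server1", 2: "server2",
                 1022: "server8", 1023: "server8"}

def locate(key: str) -> str:
    """Look up which Codis server holds the key's slot."""
    return routing_table.get(key_to_slot(key), "slot not assigned in this sketch")
```

Because the hash and modulo are computed on the client (or proxy) side, no lookup round-trip is needed for the key-to-slot step; only the slot-to-server table has to be kept in sync.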

The mapping relationship between data keys and slots is calculated directly by the client before reading or writing data using CRC32. The mapping relationship between slots and Codis servers, on the other hand, is determined by the assignment process and needs to be stored in a storage system; otherwise, if there is a cluster failure, the mapping relationship will be lost.

We refer to the mapping relationship between slots and Codis servers as the data routing table (or simply routing table). Once we have assigned and modified the routing table on the Codis dashboard, the dashboard will send the routing table to the Codis proxy, and the dashboard will also save the routing table in ZooKeeper. The Codis proxy caches the routing table locally, so when it receives a client request, it can directly query the local routing table and complete the correct request forwarding.

You can take a look at this diagram, which illustrates the allocation and usage process of the routing table.

Routing Table Process

In terms of the implementation of data distribution, Codis and Redis Cluster are quite similar, both adopting the mechanism of mapping keys to slots and further distributing slots across instances.

However, there is one noticeable difference, which I will explain.

In Codis, the routing table is assigned and modified by us on the Codis dashboard and saved in a ZooKeeper cluster. Once the data location changes (such as instances being added or removed), the routing table is modified, and the Codis dashboard will send the modified routing table to the Codis proxy. The proxy can then forward requests based on the latest routing information.

In Redis Cluster, the data routing table is communicated among instances and eventually saved on each instance. When there is a change in the data routing information, it needs to be propagated among all instances through network messages. Therefore, if there are many instances in the cluster, it will consume a considerable amount of network resources.

Data distribution solves the problem of which server to store new data when writing, but when the business data increases, if the existing instances in the cluster are not sufficient to store all the data, we need to scale the cluster. Next, let’s learn about the key technical design of Codis for cluster scaling.

How is cluster scaling and data migration performed? #

Codis cluster scaling involves two aspects: adding Codis servers and adding Codis proxies.

Let’s start by looking at adding Codis servers, which mainly involves two steps:

  1. Start a new Codis server and add it to the cluster.
  2. Migrate a portion of the data to the new server.

It is important to note that data migration is a crucial mechanism here, which I will focus on explaining.

Codis performs data migration at the granularity of slots. Let’s take a look at the basic process of migration.

  1. On the source server, Codis randomly selects a piece of data from the slots to be migrated and sends it to the destination server.
  2. After confirming the receipt of the data, the destination server sends an acknowledgment message to the source server. At this point, the source server deletes the migrated data locally.
  3. Steps one and two constitute the migration process for a single piece of data. Codis continuously repeats this migration process until all the data in the slots to be migrated has been migrated.

I have created the following diagram to illustrate the process of data migration for better understanding.

Data Migration Process
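The per-key loop above can be sketched as follows; `send_to_destination` is a hypothetical stand-in for the network send-and-ACK exchange, not Codis’s actual API:

```python
import random

def migrate_slot(slot_data: dict, send_to_destination) -> None:
    """Per-key migration loop: pick a random key from the slot being migrated,
    send it, and delete it locally only after the destination ACKs."""
    while slot_data:
        key = random.choice(list(slot_data))
        acked = send_to_destination(key, slot_data[key])  # steps 1-2: send, wait for ACK
        if acked:
            del slot_data[key]  # delete the local copy only after the ACK
```

With the destination modeled as a plain dict, `send_to_destination` can simply store the pair and return `True`; the loop then drains the slot key by key.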

Regarding the migration process for a single piece of data I just described, Codis implements two migration modes: synchronous migration and asynchronous migration. Let’s take a closer look.

Synchronous migration refers to the situation where the source server is blocked and unable to process new request operations while it is sending the data to the destination server. This mode is easy to implement but involves multiple operations during the migration process (including data serialization on the source server, network transmission, deserialization on the destination server, and deletion on the source server). If the data being migrated is a big key, the source server will be blocked for a long time and unable to process user requests in a timely manner.

To avoid blocking the source server during data migration, Codis implements a second migration mode: asynchronous migration. Asynchronous migration has two key characteristics. The first is that after the source server sends data to the destination server, it can continue handling other requests instead of blocking until the destination server finishes executing the command. After receiving and deserializing the data, the destination server sends an ACK message to the source server, indicating that the migration is complete. At that point, the source server deletes the migrated data locally.

During this process, the migrated data is set as read-only, so the data on the source server will not be modified, and there will be no issue of “data inconsistency with the destination server”.

The second feature is that for bigkey, asynchronous migration uses a split command approach. Specifically, for each element in the bigkey, a command is used to migrate it instead of serializing and transmitting the entire bigkey. This way, the problem of blocking the source server due to serialization of a large amount of data during bigkey migration is avoided.

In addition, if a failure occurs in Codis while migrating a part of the data, some elements of the bigkey will remain on the source server while others will be on the destination server, which breaks the atomicity of the migration.

Therefore, Codis sets a temporary expiration time for the elements of the bigkey on the destination server. If a failure occurs during the migration, the keys on the destination server will be deleted after expiration, without affecting the atomicity of the migration. When the migration is completed normally, the temporary expiration time for the elements of the bigkey is removed.

Let me give you an example. Suppose we want to migrate a List type data with 10,000 elements. When using asynchronous migration, the source server will transmit 10,000 RPUSH commands to the destination server, with each command corresponding to the insertion of one element in the List. On the destination server, these 10,000 commands are executed in sequence to complete the data migration.
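Sticking with that List example, the split-command approach plus the temporary expiration can be sketched like this. The dict-based "servers" and the `tmp_ttl` value are hypothetical stand-ins, not Codis internals:

```python
def migrate_big_list(src_store: dict, dst_store: dict, dst_expires: dict,
                     key: str, tmp_ttl: int = 90) -> None:
    """Migrate a List-type bigkey element by element (one RPUSH-equivalent each),
    protected by a temporary expiration on the destination side."""
    dst_expires[key] = tmp_ttl                      # temp TTL: a failed migration self-cleans
    for elem in src_store[key]:                     # one command per element,
        dst_store.setdefault(key, []).append(elem)  # never serializing the whole key at once
    dst_expires.pop(key, None)                      # success: remove the temporary TTL
    del src_store[key]                              # source copy deleted once migration completes
```

If the process died mid-loop, the destination key would still carry its temporary TTL and expire on its own, which is what preserves the migration’s atomicity.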

Here, there is something you need to pay attention to. To improve migration efficiency, Codis allows multiple keys to be migrated at once when migrating slots asynchronously. You can set the number of keys migrated each time with the numkeys parameter of the asynchronous migration command SLOTSMGRTTAGSLOT-ASYNC.

So far, we have learned about the expansion and data migration mechanisms of Codis server. In fact, in a Codis cluster, in addition to adding Codis servers, there is sometimes a need to add Codis proxies.

In the Codis cluster, clients connect directly to Codis proxies, so as the number of clients grows, a single proxy cannot support a large volume of request operations. In this case, we need to add proxies.

Adding a proxy is relatively easy. We start the proxy directly and then add the proxy to the cluster through the Codis dashboard.

At this point, the access connection information of the Codis proxy is stored in ZooKeeper. So when a proxy is added, ZooKeeper has the latest access list, and clients can read the proxy list from ZooKeeper and send requests to the newly added proxy. This way, client access pressure can be shared among multiple proxies, as shown in the following figure:
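The proxy-discovery pattern described here can be sketched with a small registry object standing in for ZooKeeper; the class and method names are hypothetical:

```python
import random

class ProxyRegistry:
    """Stand-in for the proxy address list that Codis keeps in ZooKeeper."""
    def __init__(self):
        self._proxies = []

    def register(self, addr: str) -> None:
        """Dashboard side: record a newly added proxy's address."""
        self._proxies.append(addr)

    def list_proxies(self) -> list:
        """Client side: read the latest proxy access list."""
        return list(self._proxies)

def pick_proxy(registry: ProxyRegistry) -> str:
    """Clients spread load by choosing among all registered proxies."""
    return random.choice(registry.list_proxies())
```

Because clients always read the latest list before connecting, a newly registered proxy starts absorbing traffic without any client-side code change.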

Okay, so far, we have learned about the data distribution, cluster expansion, and data migration methods in the Codis cluster. These are all key mechanisms in a sharded cluster.

However, because the functionality provided by a cluster is different from that of a single instance, when using a cluster, we not only need to focus on the key mechanisms in the sharded cluster but also need to consider the usage of the client. This raises a question: Can the client used by the business application directly interact with the cluster? Next, let’s discuss this issue.

Do cluster clients need to be redeveloped? #

When using Redis single instances, as long as the client conforms to the RESP protocol, it can interact with the instance and read/write data. However, a sharded cluster differs from a single instance in some functionality. For example, data migration operations do not exist in single instances, and during migration, data access requests may need to be redirected (such as the MOVED redirection in Redis Cluster). So, the client needs to support cluster-related commands. If an application that originally used a single-instance client wants to scale out to a cluster, it has to adopt a new client, which is not particularly friendly in terms of compatibility with business applications.

When designing the Codis cluster, full consideration was given to the compatibility with existing single-instance clients.

Codis uses Codis Proxy to directly connect with the client. Codis Proxy is compatible with single-instance clients. The management work related to clusters (such as request forwarding and data migration) is handled by components such as Codis Proxy and Codis Dashboard, without the need for client participation.

In this way, when a business application uses the Codis cluster, the client does not need to be modified. It can reuse the client that connects to the single instance, which allows for the utilization of the cluster’s ability to read and write large amounts of data while avoiding the need to modify the client to add complex operation logic, ensuring the stability and compatibility of the business code.

Lastly, let’s take a look at the issue of cluster reliability. Reliability is a core requirement for actual business applications. For a distributed system, its reliability is related to the number of components in the system: the more components, the more potential points of failure. Unlike Redis Cluster, which only includes Redis instances, the Codis cluster includes 4 types of components. You may ask, does having so many components reduce the reliability of the Codis cluster?

How to ensure cluster reliability? #

Let’s separately consider the reliability guarantee methods for different components of Codis.

First, let’s look at the codis server.

The codis server is actually a Redis instance, with additional commands related to cluster operations. Redis’s master-slave replication mechanism and sentinel mechanism can be used on the codis server. Therefore, Codis uses a master-slave cluster to ensure the reliability of codis servers. In simple terms, Codis configures a slave for each server and monitors them using the sentinel mechanism. When a failure occurs, the master and slave can be switched, ensuring the reliability of the server.

In this configuration, each server becomes a server group, where each group has one master and multiple slaves. The distribution of data uses Slots, which are allocated based on the granularity of groups. At the same time, when codis proxy forwards requests, it sends write requests to the master of the corresponding group based on the Slot and group relationship, and sends read requests to the master or slave in the group.

The following diagram shows the Codis cluster architecture with server groups. In the Codis cluster, server groups and sentinel clusters are deployed to achieve master-slave switchovers of codis servers and improve cluster reliability.
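The group-based routing rule (writes to the master, reads to the master or a slave) can be sketched as follows; the group layout below is hypothetical:

```python
import random

# Hypothetical server groups: one master plus its slaves per group
groups = {
    1: {"master": "server1-master", "slaves": ["server1-slave"]},
    8: {"master": "server8-master", "slaves": ["server8-slave1", "server8-slave2"]},
}
# Slots are allocated at group granularity
slot_to_group = {1: 1, 1022: 8}

def pick_instance(slot: int, is_write: bool) -> str:
    """Writes must go to the group's master; reads may hit the master or a slave."""
    group = groups[slot_to_group[slot]]
    if is_write:
        return group["master"]
    return random.choice([group["master"]] + group["slaves"])
```

Allocating slots to groups rather than to individual instances is what lets a sentinel-driven master-slave switchover happen without touching the routing table: the slot still maps to the same group afterwards.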

Because codis proxy and ZooKeeper are used together, let’s now look at the reliability of these two components.

In the design of the Codis cluster, the information on the proxy comes from ZooKeeper (such as the routing table). The ZooKeeper cluster uses multiple instances to store data. As long as more than half of the ZooKeeper instances can function properly, the ZooKeeper cluster can provide services and ensure the reliability of this data.

Therefore, codis proxy uses a ZooKeeper cluster to store the routing table, fully leveraging the high reliability guarantee provided by ZooKeeper to ensure the reliability of codis proxy without the need for additional work. When codis proxy fails, it can be directly restarted. After restarting, the proxy can obtain the routing table from the ZooKeeper cluster through codis dashboard. It can then accept client requests and forward them. This design also reduces the development complexity of the Codis cluster itself.

For codis dashboard and codis fe, they mainly provide configuration management and manual operation by administrators. They have low load pressure, so their reliability does not need to be guaranteed separately.

Sharding Cluster Solution Selection Suggestions #

So far, we have learned about two sharding cluster solutions: Codis and Redis Cluster. I have summarized their differences in a table for you to compare.

Finally, when it comes to practical application, how should we choose between these two solutions? Here are four suggestions.

  1. In terms of stability and maturity, Codis has been applied earlier and has mature production deployments in the industry. Although Codis introduces proxy and ZooKeeper, increasing the complexity of the cluster, the stateless design of the proxy and the stability of ZooKeeper also provide guarantees for Codis’s stable usage. Redis Cluster was released later than Codis, and its maturity is relatively weaker compared to Codis. If you want to choose a mature and stable solution, Codis is more suitable.

  2. In terms of compatibility with client applications, clients designed for a single instance can connect directly to Codis proxy, whereas connecting to Redis Cluster requires new client functionality. Therefore, if your business applications heavily use single-instance clients and you now want to adopt a sharding cluster, I suggest choosing Codis to avoid modifying the clients in your business applications.

  3. In terms of using new Redis commands and features, Codis server is developed based on open-source Redis 3.2.8. Therefore, Codis does not support new commands and data types introduced in subsequent open-source versions of Redis. In addition, Codis does not implement all commands in the open-source Redis version, such as BITOP, BLPOP, BRPOP, as well as transaction-related commands like MULTI and EXEC. The Codis official website lists the unsupported command list, so remember to check it when using Codis. Therefore, if you want to use new features of the open-source Redis version, Redis Cluster is a suitable choice.

  4. In terms of data migration performance, Codis supports asynchronous migration, which has less impact on the performance of handling normal requests in the cluster compared to synchronous migration. Therefore, if data migration is frequent when you apply the cluster, Codis is a more suitable choice.

Summary #

In this lesson, we learned about the Codis solution for Redis sharding clusters. The Codis cluster consists of four key components: codis server, codis proxy, the Zookeeper cluster, and the cluster management tool formed by codis dashboard and codis fe. Let’s review their main functions.

  • Codis proxy and codis server are responsible for handling data read and write requests. Codis proxy connects with the client, receives requests, and forwards them to codis server, which handles the requests.
  • Codis dashboard and codis fe are responsible for cluster management. Codis dashboard executes management operations, while codis fe provides a web management interface.
  • The Zookeeper cluster is responsible for storing all metadata information of the cluster, including routing tables and proxy instance information. It’s worth noting that besides Zookeeper, Codis can also use etcd or the local file system to store metadata information.

Regarding the selection considerations between Codis and Redis Cluster, I provided you with advice from four aspects: stability and maturity, client compatibility, use of Redis new features, and data migration performance. I hope this helps you.

Finally, I have another suggestion for using Codis: when multiple business lines need Codis, you can start multiple codis dashboards, with each dashboard managing a portion of the codis servers and serving one business line’s cluster. This way, you can achieve isolated management of multiple business lines within a single Codis deployment.

One Question per Lesson #

As usual, I will ask you a small question. Suppose that 80% of the key-value pairs stored in the Codis cluster are of type Hash. Each Hash set contains between 100,000 and 200,000 elements, and each element in the set is 2KB in size. Do you think migrating such a Hash set would have an impact on Codis’ performance?

Feel free to share your thoughts and answers in the comments section. Let’s discuss and exchange ideas together. If you find today’s content helpful, please share it with your friends or colleagues. See you in the next lesson.