01 Guide You Through Redis Source Code Structure Quickly

Starting from today’s lesson, we will embark on a “Redis Code Journey” together to grasp the core design principles of Redis.

However, before we officially begin our journey, we need to first create a “guidebook” to understand and master the overall architecture of Redis code.

This is because once we understand the overall architecture of Redis code, it’s like having a panoramic view of Redis code. With this view, we can quickly find and locate the code files corresponding to different functional modules of Redis when we learn about the design and implementation of these modules. Moreover, with the picture of the code in mind, we can also have a comprehensive understanding of various functional features of Redis, making it easier to fully grasp the functionality of Redis without missing any features.

So, how should we learn the code architecture of Redis? My suggestion is to master the following two aspects:

  • Code directory structure and functional categories, with the aim of understanding the overall architecture of Redis code and the categories of code functions it contains;
  • System functional modules and corresponding code files, with the aim of understanding the various functionalities provided by Redis instances and their corresponding implementation files for further in-depth learning.

In fact, once you have mastered the above two aspects, even if you want to understand and learn the code architecture of other software systems, you can use the “surface-to-point” approach. That is, first understand the directory structure and functional categories, and then the corresponding functional modules and implementation files. This can help you quickly grasp the code panorama of a software system.

Therefore, in the subsequent learning process, you should carefully follow my lead and preferably have a computer at hand that allows you to easily view the source code. Whenever I mention source code files, key modules, or code execution, make sure to read or practice them to establish a deeper understanding of the code architecture.

Alright, enough talk. Let’s move on and complete the guidebook for the Redis code journey.

Redis Directory Structure #

First, let’s take a look at the directory structure of Redis.

Why start with the directory structure? This is a small tip I use when reading code: when studying a large piece of system software, looking at the overall directory structure of the source code is an effective way to gain a quick preliminary understanding. This is because system developers usually group code files that implement the same or similar functions into the same directory, so the files in one directory usually share similar functional goals.

Therefore, starting to learn from the directory structure of the code can allow us to directly understand the main components of a system from the directory naming and directory hierarchy.

So for Redis, under its main source code directory, there are four subdirectories: deps, src, tests, and utils. These four subdirectories correspond to different code parts that play different roles in Redis. Let’s take a closer look below.

deps Directory #

This directory mainly contains the third-party code libraries that Redis depends on, including the C language client hiredis, the jemalloc memory allocator, linenoise (a replacement for readline), and the Lua scripting code.

A notable feature of this part of the code is that they can be compiled independently of the functional source code in the Redis src directory, which means that they can exist and develop independently of Redis. The following image shows the contents of the subdirectories under the deps directory.

So why is there a third-party code library directory in the Redis source code structure? In fact, there are two main reasons.

On one hand, Redis is a user-space program written in C, and it relies on standard libraries such as glibc for functions like memory allocation, line reading and editing (readline), file reading and writing, and creating child processes or threads. However, some of the implementations these libraries provide are not very efficient.

Let me give a simple example. The memory allocator in glibc does not perform very well, and it is also prone to memory fragmentation. To avoid the impact on system performance, Redis replaces the glibc allocator with jemalloc. However, jemalloc is not part of the functionality of Redis itself, so it does not belong in the same directory as the Redis functional source code. Redis therefore keeps this code in the dedicated deps directory.

On the other hand, some functions are required for Redis to operate but can be developed and evolved independently of Redis. The most typical example is Redis’ client code.

Redis uses a client-server architecture, so a Redis instance cannot be accessed without a client. In addition, parts of Redis itself, such as the command-line tool redis-cli, the benchmarking program redis-benchmark, and Sentinel, all need a client to access Redis instances.

However, as you may know, for client development, as long as the process of interaction between the client and the instance meets the RESP protocol, the functionality of the client and the instance can be independently developed and evolved. Therefore, in the Redis source code structure, the C language version of the client hiredis is placed in the deps directory for developers to develop and improve the client functionality.
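To make the role of RESP more concrete, here is a minimal sketch in Python (not code from hiredis) of how a client could serialize a command as a RESP array of bulk strings. The function name `encode_command` is an assumption made for this illustration.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings.

    Hypothetical helper for illustration only; hiredis does this in C.
    """
    # "*<n>\r\n" announces an array with n elements.
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # Each element is a bulk string: "$<length>\r\n<bytes>\r\n".
        parts.append(b"$%d\r\n" % len(data))
        parts.append(data + b"\r\n")
    return b"".join(parts)


if __name__ == "__main__":
    print(encode_command("SET", "greeting", "hello"))
```

Any client that produces bytes in this shape can talk to an instance, which is why the client and the server can evolve separately.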

In summary, for the deps directory, you only need to remember that it mainly stores three types of code: first, more efficient libraries that Redis depends on, such as the memory allocator; second, code that is developed and evolved independently of Redis, such as the client; and third, the Lua scripting code. When you learn about the design and implementation of these features later, you can find them in the deps directory.

src Directory #

This directory contains all the code files of Redis’ functional modules and is also an important part of the Redis source code. Similarly, let’s take a look at the subdirectory structure under the src directory.

We can see that there is only one subdirectory, “modules”, under the “src” directory, which contains example code for implementing Redis modules. The remaining source code files are located directly under the “src” directory without further subdirectories.

Redis implements its functional modules in a typical C style: modules are not separated into directories but call each other by including header files. This style is common in system software written in C. Memcached, for example, also keeps its source files at a single directory level.

Therefore, when developing software systems using the C programming language, you can refer to the source code structure of Redis and organize all source code files in a flat directory. This way, referencing between modules will be convenient.

tests Directory #

During the development process of software products, in addition to third-party dependency libraries and source code of functional modules, we usually need to add code for functional module testing and unit testing in the system source code. In the Redis code directory, this part of the code is organized and managed under a “tests” directory.

Redis’ test code can be divided into four parts: unit tests (the “unit” subdirectory), Redis Cluster functional tests (the “cluster” subdirectory), sentinel functional tests (the “sentinel” subdirectory), and master-slave replication functional tests (the “integration” subdirectory). The test code in these subdirectories is written in Tcl, a commonly used scripting language.

In addition, each part of the tests is a collection of tests that cover multiple sub-function tests in the respective functional modules. For example, in the unit testing directory, we can see tests for expired keys (“expire.tcl”), lazy deletion (“lazyfree.tcl”), and operations on different data types (in the “type” subdirectory), among others. In the Redis Cluster functional testing directory, we can see tests for failover (“failover.tcl”) and replica migration (“replica-migration.tcl”), among others.

However, in the “tests” directory, in addition to test code for specific functional modules, there are also some code files that are used to support testing functionality. These code files are located in the “assets”, “helpers”, “modules”, and “support” subdirectories. I have created this diagram to show the code structure and hierarchy under the “tests” directory, which you can refer to.

utils Directory #

In the Redis development process, there are also some auxiliary functionalities, including scripts for creating Redis Cluster, programs for testing the effectiveness of the LRU algorithm, and programs for visualizing the rehash process. In the Redis code structure, these functionalities are classified and managed under the “utils” directory. The main subdirectories under the “utils” directory are shown in the diagram below.

Therefore, when developing a system, you can learn from the Redis code structure and classify the auxiliary functionalities related to the system under the “utils” directory for unified management.

Okay, in addition to the “deps”, “src”, “tests”, and “utils” subdirectories, the Redis source code root directory also contains two important configuration files: the configuration file for Redis instances, “redis.conf”, and the configuration file for sentinels, “sentinel.conf”. When you need to locate or modify the configuration for a Redis instance or sentinel, you can directly locate these files in the source code root directory.

Finally, you can review the overall structure and hierarchy of the Redis source code, as shown in the following diagram.

Now that we have understood the directory structure of the Redis code, the next step is to focus on learning the source code files of the functional modules (i.e., the contents of the files under the “src” directory). This will help us quickly find the corresponding source code files when learning about Redis’s design concepts in the subsequent courses.

Correspondence between Redis Modules and Source Code #

The src directory in the Redis code structure contains 123 code files that implement various modules. For each module, there is typically a C language file (.c file) and a corresponding header file (.h file) to implement the functionality. For example, dict.c and dict.h are the C file and header file for implementing the hash table.
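If you have the source checked out, you can verify the file count and the .c/.h pairing yourself. The following is a small Python sketch; the default path `redis/src` and both function names are assumptions for illustration, and the numbers you get depend on the Redis version you check out.

```python
from collections import Counter
from pathlib import Path


def count_source_files(src_dir):
    """Count the .c and .h files directly inside src_dir (no recursion)."""
    counts = Counter()
    for path in Path(src_dir).iterdir():
        if path.is_file() and path.suffix in (".c", ".h"):
            counts[path.suffix] += 1
    return counts


def paired_modules(src_dir):
    """Return module names that have both a .c and a .h file, such as 'dict'."""
    src = Path(src_dir)
    c_stems = {p.stem for p in src.glob("*.c")}
    h_stems = {p.stem for p in src.glob("*.h")}
    return sorted(c_stems & h_stems)


if __name__ == "__main__":
    import sys

    # Hypothetical default location of a checked-out source tree.
    src = sys.argv[1] if len(sys.argv) > 1 else "redis/src"
    if Path(src).is_dir():
        print(dict(count_source_files(src)))
        print(paired_modules(src)[:10])
```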

(Note: In this course, unless otherwise specified, the source code I refer to is based on Redis version 5.0.8.)

So how do we match these 123 files with the main functionalities of Redis?

The Redis code files are named in a very standardized way, and the filenames reflect the main functionalities implemented in the files. For example, from the filenames rdb.h and rdb.c, you can tell that they are the corresponding code for implementing the memory snapshot RDB.

Therefore, for the purpose of quickly locating the source code, I have categorized the Redis functionality source code into four code paths: server instance, database operations, reliability and scalability guarantees, and auxiliary functions. Depending on the dimension of functionality you are interested in, you can study the corresponding code.

Server Instance #

First of all, we know that Redis is a network server instance at runtime, so it needs to have code to initialize the server instance and control the main flow. This is handled by server.h/server.c, and the main entry function of the Redis code is also in server.c. If you want to understand how Redis starts running, you can start from the main function in server.c.

Of course, as a network server, Redis also needs to provide network communication functionality. Redis uses an event-driven network communication framework. The code files involved include ae.h/ae.c, ae_epoll.c, ae_evport.c, ae_kqueue.c, and ae_select.c. I will provide a detailed introduction to the design and implementation of the event-driven framework in Lesson 10.
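Before we get to the real implementation, the idea of an event-driven loop with interchangeable backends can be sketched with Python's standard `selectors` module. This is only an analogy and not Redis code: `selectors.DefaultSelector` picks epoll, kqueue, or select depending on the platform, much as the ae_*.c files provide one backend per mechanism. The names `run_once` and `demo` are assumptions for this illustration.

```python
import selectors
import socket


def run_once(selector, timeout=1.0):
    """Wait for ready file events and call the handler stored with each one."""
    handled = 0
    for key, mask in selector.select(timeout):
        handler = key.data  # the callback registered for this socket
        handler(key.fileobj, mask)
        handled += 1
    return handled


def demo():
    # DefaultSelector chooses the best backend the platform offers.
    selector = selectors.DefaultSelector()
    reader, writer = socket.socketpair()
    received = []

    def on_readable(sock, mask):
        received.append(sock.recv(1024))

    # Register interest in "readable" events together with a handler.
    selector.register(reader, selectors.EVENT_READ, on_readable)
    writer.sendall(b"PING")
    run_once(selector)

    selector.unregister(reader)
    selector.close()
    reader.close()
    writer.close()
    return received


if __name__ == "__main__":
    print(demo())
```

The loop itself never blocks on a single connection. It only reacts to whichever sockets the backend reports as ready.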

Apart from the event-driven network framework, the functionalities related to network communication also include low-level TCP network communication and client implementation.

Redis encapsulates socket connection, configuration, and other operations related to TCP network communication in anet.h/anet.c. These wrapper functions are called to establish TCP connections, for example when a Redis Cluster is created and during master-slave replication.

In addition, clients are used widely while Redis is running: for returning data that has been read, for transmitting data between master and slave instances during replication, and for communication between shard instances in Redis Cluster. The functions that create clients and reply to them are implemented in networking.c. If you want to understand the design and implementation of clients, you can focus on this code file.

Here is a summary of the functionality modules related to the server instance and their corresponding code files:

Now that we have covered the main code related to the Redis server instance, let’s look at the code files related to Redis as an in-memory database.

Database Data Types and Operations #

Redis provides a rich set of key-value data types, including String, List, Hash, Set, and Sorted Set. In addition, Redis also supports extended data types such as bitmaps, HyperLogLog, and Geo.

To support these data types, Redis uses various underlying data structures. For example, the underlying data structure for the String type is SDS, and the underlying data structures for the Hash type include hash tables and ziplists (compressed lists).

However, because Redis implements a large number of underlying data structures, I have listed these structures, their corresponding key-value types, and the relevant code files in the table below for quick reference.

In addition to implementing various data types, Redis as a database also provides interfaces for adding, querying, modifying, and deleting key-value pairs. This functionality is implemented in the db.c file.

As a memory database, Redis is limited by the size of memory for storing data. Therefore, efficient use of memory is crucial for Redis. So you may be wondering: How does Redis optimize memory usage?

In fact, Redis optimizes memory usage from three aspects: memory allocation, memory reclamation, and data eviction.

First, for memory allocation, Redis supports different memory allocators: the default allocator provided by glibc, and the third-party allocators tcmalloc and jemalloc. The wrapper around these allocators is implemented in zmalloc.h/zmalloc.c.

Second, for memory reclamation, Redis supports setting expiration times on keys and uses different strategies for deleting expired keys. This part of the code is implemented in expire.c. To keep the deletion of a large number of keys, and the reclamation of their memory, from hurting system performance, Redis implements asynchronous deletion in lazyfree.c. This lets a background thread handle the deletion instead of the Redis main thread.
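To illustrate what deleting expired keys can look like, here is a toy Python sketch that combines two common strategies: lazy deletion when a key is accessed, and periodic deletion that checks a random sample of keys with a TTL. The class `Keyspace`, its method names, and the sample size are assumptions for this illustration and do not mirror the C code.

```python
import random
import time


class Keyspace:
    """Toy keyspace: values in one dict, expiry deadlines in another."""

    def __init__(self, clock=time.monotonic):
        self.data = {}
        self.expires = {}  # key -> absolute deadline, only for keys with a TTL
        self.clock = clock

    def set(self, key, value, ttl=None):
        self.data[key] = value
        if ttl is None:
            self.expires.pop(key, None)
        else:
            self.expires[key] = self.clock() + ttl

    def _expire_if_needed(self, key):
        deadline = self.expires.get(key)
        if deadline is not None and deadline <= self.clock():
            del self.data[key]
            del self.expires[key]
            return True
        return False

    def get(self, key):
        # Lazy deletion: an expired key is removed when someone touches it.
        self._expire_if_needed(key)
        return self.data.get(key)

    def active_expire_cycle(self, sample_size=20):
        # Periodic deletion: test a random sample of the keys that carry a TTL.
        candidates = list(self.expires)
        sample = random.sample(candidates, min(sample_size, len(candidates)))
        return sum(self._expire_if_needed(key) for key in sample)
```

Lazy deletion alone would leave expired keys that nobody reads sitting in memory, which is why a periodic pass is paired with it.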

Finally, for data eviction, when memory is full Redis removes data according to an eviction policy, which is what allows Redis to be used as a cache. Redis implements several eviction policies, including the classic LRU and LFU algorithms. This part of the code is implemented in evict.c.
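As a rough picture of LRU-style eviction, the Python sketch below evicts the least recently used key among a small random sample rather than tracking an exact global order, an approach Redis exposes through the `maxmemory-samples` setting. Everything else here is a simplifying assumption: the class name `SampledLRUCache` is hypothetical, and the limit is a number of keys rather than a number of bytes.

```python
import itertools
import random


class SampledLRUCache:
    """Toy cache that evicts the oldest key among a random sample."""

    def __init__(self, max_keys, samples=5):
        self.max_keys = max_keys  # assumption: limit by key count, not bytes
        self.samples = samples
        self.data = {}
        self.last_access = {}
        self._tick = itertools.count()  # logical clock, one tick per access

    def _touch(self, key):
        self.last_access[key] = next(self._tick)

    def get(self, key):
        if key in self.data:
            self._touch(key)
        return self.data.get(key)

    def set(self, key, value):
        if key not in self.data and len(self.data) >= self.max_keys:
            self._evict_one()
        self.data[key] = value
        self._touch(key)

    def _evict_one(self):
        # Look at a few random keys and drop the one accessed longest ago.
        pool = random.sample(list(self.data), min(self.samples, len(self.data)))
        victim = min(pool, key=self.last_access.__getitem__)
        del self.data[victim]
        del self.last_access[victim]
        return victim
```

Sampling trades a little accuracy for not having to maintain a fully ordered list of every key on each access.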

Similarly, I have summarized the functional modules and code files related to Redis database data types and operations into a diagram which you can take a look at.

High Reliability and Scalability #

Firstly, although Redis is typically used as an in-memory database, it also provides reliability guarantees. This is mainly reflected in the fact that Redis can persistently store data and it implements a master-replica replication mechanism to provide fault tolerance.

This part of the code is fairly concentrated and mainly consists of the following two parts.

  • Data persistence implementation

Redis has two ways to persist data: RDB memory snapshots and the AOF log. They are implemented in rdb.h/rdb.c and aof.c respectively.

Note that if the server running the Redis instance crashes, the RDB or AOF file may not have been saved completely, which can affect database recovery. To deal with this, Redis also implements checking functions for both types of files, and the corresponding code files are redis-check-rdb.c and redis-check-aof.c.

  • Master-replica replication implementation

Redis implements master-replica replication in replication.c. You should also know that a master-replica deployment relies on the Sentinel mechanism for failure recovery, and this functionality is implemented in sentinel.c.

Second, Redis achieves high scalability through Redis Cluster. As with the reliability code, this code is concentrated in a small number of files, cluster.h/cluster.c, so you can study the design and implementation of Redis Cluster without jumping back and forth between many files.
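One concrete piece of how Redis Cluster spreads data is the mapping from a key to one of 16384 hash slots, which the Redis Cluster specification defines as the CRC16 (XMODEM variant) of the key modulo 16384. The Python sketch below illustrates that calculation under simplifying assumptions: it ignores hash tags (the `{...}` syntax), and the function names are hypothetical.

```python
HASH_SLOTS = 16384


def crc16_xmodem(data):
    """Bitwise CRC16 with polynomial 0x1021 and initial value 0 (XMODEM)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key):
    """Map a key to a slot number; hash tags are deliberately not handled."""
    data = key.encode("utf-8") if isinstance(key, str) else key
    return crc16_xmodem(data) % HASH_SLOTS


if __name__ == "__main__":
    print(key_hash_slot("user:1000"))
```

Because every node and client can compute the same slot from the key alone, data can be partitioned across instances without a central lookup service.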

Auxiliary Features #

Redis also implements some auxiliary features to support system operation and maintenance. For example, to facilitate operations personnel in viewing and analyzing the sources of delay for different operations, Redis implements latency monitoring in latency.h/latency.c; to facilitate operations personnel in searching for slow-running commands, Redis implements slow command logging in slowlog.h/slowlog.c, and so on.
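The idea behind slow command logging can be sketched in a few lines of Python: record a command only if it took at least a threshold amount of time, and keep a bounded number of entries. The two parameters are modeled on the `slowlog-log-slower-than` (microseconds) and `slowlog-max-len` settings; the class `SlowLog` and its entry layout are assumptions for this illustration.

```python
import time
from collections import deque


class SlowLog:
    """Toy slow log: newest entry first, bounded length."""

    def __init__(self, slower_than_us=10000, max_len=128):
        self.slower_than_us = slower_than_us
        # With maxlen set, appendleft() drops the oldest entry on the right.
        self.entries = deque(maxlen=max_len)
        self.next_id = 0

    def record(self, argv, duration_us):
        """Store the command if it ran at least as long as the threshold."""
        if duration_us < self.slower_than_us:
            return False
        entry = (self.next_id, int(time.time()), duration_us, list(argv))
        self.entries.appendleft(entry)
        self.next_id += 1
        return True
```

Capping the length keeps the log itself from consuming unbounded memory on a busy instance.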

In addition, operations personnel sometimes need to understand the performance of Redis. To support this goal, Redis implements performance benchmarking of the system, and this part of the code is in redis-benchmark.c. If you want to learn how to perform performance testing on Redis, this code file is worth a read.
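The four code paths covered in this lesson can be collected into a small lookup table. The Python sketch below simply restates the file names mentioned above; the dictionary layout, the feature labels, and the helper `find_files` are assumptions for illustration and not part of Redis.

```python
CODE_PATHS = {
    "server instance": {
        "initialization and main flow": ["server.h", "server.c"],
        "event-driven framework": [
            "ae.h", "ae.c", "ae_epoll.c", "ae_evport.c",
            "ae_kqueue.c", "ae_select.c",
        ],
        "TCP networking": ["anet.h", "anet.c"],
        "clients": ["networking.c"],
    },
    "database data types and operations": {
        "key-value operations": ["db.c"],
        "memory allocation": ["zmalloc.h", "zmalloc.c"],
        "key expiration": ["expire.c"],
        "asynchronous deletion": ["lazyfree.c"],
        "data eviction": ["evict.c"],
    },
    "high reliability and scalability": {
        "RDB snapshots": ["rdb.h", "rdb.c"],
        "AOF log": ["aof.c"],
        "file checking": ["redis-check-rdb.c", "redis-check-aof.c"],
        "replication": ["replication.c"],
        "sentinel": ["sentinel.c"],
        "cluster": ["cluster.h", "cluster.c"],
    },
    "auxiliary features": {
        "latency monitoring": ["latency.h", "latency.c"],
        "slow command logging": ["slowlog.h", "slowlog.c"],
        "benchmarking": ["redis-benchmark.c"],
    },
}


def find_files(keyword):
    """Return {feature: files} for every feature whose label contains keyword."""
    keyword = keyword.lower()
    hits = {}
    for features in CODE_PATHS.values():
        for feature, files in features.items():
            if keyword in feature.lower():
                hits[feature] = files
    return hits


if __name__ == "__main__":
    print(find_files("eviction"))
```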

Summary #

Today we had a “warm-up lesson” on understanding the architecture and design principles of Redis source code. First and foremost, it is important to understand the code structure, as it provides us with a panoramic view of Redis functional modules and makes it quick to find the source code that implements a specific module. It also makes reading the code more efficient.

At the beginning, I introduced a tip: understand the code structure of a piece of system software by analyzing its directory naming and hierarchy. By studying the directory structure of Redis, we also learned an important programming guideline: use different directories to organize code when developing system software.

Common directories include the deps directory for third-party libraries, the tests directory for test cases, and the commonly used utils directory for auxiliary functionality and tools. By organizing your code according to this guideline, you can improve code readability and maintainability.

Furthermore, when studying the code structure of Redis functional modules, faced with 123 code files, I shared with you a method that I have always advocated: classification. This means categorizing or summarizing the content you are studying according to certain dimensions.

In this course, I organized four code paths based on server instances, database data types and operations, high reliability and scalability guarantee, and auxiliary functionality. These four code paths essentially cover the main functional code of Redis, making it easier for you to learn and master Redis source code in a logical and systematic way, without missing important code.

Finally, I would like to emphasize that while you are studying the Redis source code structure, I also hope that you can apply this method to studying other code, thereby improving learning efficiency.

One question per class #

Since version 4.0, Redis supports background asynchronous execution of tasks, such as asynchronous deletion of data. Can you find the code file that implements background tasks in the Redis source code?

Please feel free to share your thoughts and processes in the comments section. Let’s discuss and exchange ideas together. If you find this helpful, you are also welcome to share today’s content with more friends.