01 Basic Structure What Does a Key Value Database Include

01 Basic Structure What Does a Key-Value Database Include #

We know that Redis is a typical key-value database, so today, I am going to walk you through building a simple key-value database. Why are we doing this?

Do you remember what I said at the beginning? Redis itself is quite complex, and if we start by studying specific technical aspects such as “single-threading” and “caching,” although we can learn specific content and even solve some small problems immediately, learning this way can easily get lost in the details.

From my own experience, a better way to learn is to first establish a “system view.” This means that if we want to deeply understand and optimize Redis, we must have a global understanding of its overall architecture and key modules before diving into specific technical points. This is also the teaching approach we insist on in this course.

I believe that after going through this process, we will find it much easier to locate and solve problems in practice, and you can also apply this learning approach to other learning activities. I hope you can thoroughly grasp this learning mindset to improve your learning and work efficiency.

Enough digression, let’s get back to our topic for today. Today, when constructing this simple key-value database, we only need to focus on the overall architecture and core modules. This is equivalent to dissecting a white mouse in medicine before formally dissecting a human body. By analyzing this simplest key-value database, we can quickly grasp the key points of learning and optimizing Redis.

I refer to this simple key-value database as SimpleKV. Please note that there is also a project called SimpleKV on GitHub, but it is not what I am talking about. I am simply referring to a key-value database architecture with key components.

Alright, are you ready? Let’s construct SimpleKV together.

When starting to construct SimpleKV, the first thing we need to consider is what kind of data can be stored inside, and what operations can be performed on the data. These may seem simple, but in fact, they are important foundations for understanding why Redis is frequently used in scenarios such as caching, flash sales, and distributed locks.

Once you understand the data model, you will understand why in some scenarios, data that was originally stored in a relational database can also be stored using a key-value database. For example, user information (user ID, name, age, gender, etc.) is usually stored in a relational database. In this scenario, one user ID corresponds to one collection of user information, which is a data model of a key-value database and can also fulfill this storage requirement.

However, if you only understand the data model and do not know about the operation interface, you may not understand why a key-value database is not suitable in some scenarios. For example, in the same aforementioned scenario, if you want to calculate the average age of multiple users, a key-value database will not be able to do so. This is because it only provides simple operation interfaces and cannot support complex aggregation calculations.

So, what can Redis do and what can’t it do? Only by first understanding its data model and operation interface can we truly utilize it effectively.

Next, let’s take a look at what kind of data can be stored in Redis.

What data can be stored? #

For key-value databases, the basic data model is the key-value model. For example, “hello”: “world” is a basic key-value pair, where “hello” is the key and “world” is the value. SimpleKV is no exception. In SimpleKV, the key is of type String, while the value is a basic data type such as String, integer, etc.

However, SimpleKV is ultimately a simple key-value database, and for key-value databases used in actual production environments, the value type can also be a complex type.

The differences in key types supported by different key-value databases are generally small, while there are significant differences in value types. When selecting a key-value database, an important consideration is the supported value types. For example, Memcached only supports the String value type, while Redis supports value types such as String, hash tables, lists, and sets. Redis is widely used in practical business scenarios due to its support for diverse value types.

From a usage perspective, different implementations of value types not only support different data requirements for different businesses, but also imply differences in performance and space efficiency of different data structures, resulting in differences between different value operations.

Only by thoroughly understanding the underlying principles can we choose Redis value types and optimize Redis performance with ease.

What operations can be performed on data? #

Having understood the data model, let’s now look at the basic operations it supports. SimpleKV is a simple key-value database, so the basic operations are nothing more than CRUD operations.

First, let’s understand the 3 basic operations that SimpleKV needs to support, namely PUT, GET, and DELETE.

PUT: Writes or updates a key-value pair
GET: Reads the corresponding value for a key
DELETE: Deletes the entire key-value pair based on a key

It is important to note that some key-value databases refer to the write/update operation as SET. Although new writes and updates are done using the same operation interface, the actual process will vary depending on whether the key exists or not.

In real-world scenarios, we often come across situations where we need to query a user’s access records within a certain period of time. This operation is known as SCAN in a key-value database, which means returning the corresponding value based on a range of keys. Therefore, PUT/GET/DELETE/SCAN form the basic set of operations in a key-value database.

Aside from that, real-world scenarios often have more complex requirements. For example, in a black-white list application, it may be necessary to determine if a certain user exists. If the user’s ID serves as the key, an EXISTS operation can be added to check if a key exists. For a specific key-value database, you can refer to its operation documentation to learn about the specific interfaces it provides.

Of course, when a key-value database supports various value types, corresponding operation interfaces are necessary. For example, Redis has a list type as a value, so its interface includes operations for list values. Later on, I will provide specific explanations on how different operations affect the efficiency of accessing Redis.

With that being said, our work on the data model and operation interfaces is now complete. This forms the foundation. The next step is to consider a very important design question: Should key-value pairs be stored in memory or on disks?

Storing data in memory allows for fast read and write operations, as memory access speeds are generally in the range of hundreds of nanoseconds. However, the potential risk is that all data will be lost in the event of a power failure.

Storing data on disks, while avoiding data loss, is limited by the slow read and write speeds of disks (usually in the range of several milliseconds), which can lower the overall performance of the key-value database.

Therefore, when making design choices, it is usually necessary to consider the main application scenarios for the key-value database. For example, in a caching scenario, data needs to be accessed quickly but can be lost. In this case, a key-value database designed for this scenario typically stores key-value data in memory. Memcached and Redis are examples of in-memory key-value databases. For Redis, caching is a very important application scenario. Later on, I will focus on explaining the key mechanisms, advantages, and common optimization methods of Redis as a cache.

In order to align with Redis, our SimpleKV also stores key-value data in memory. Now, let’s take a look at the basic components of SimpleKV.

In general, a key-value database consists of four main components: access framework, index module, operation module, and storage module (as shown in the diagram below). We will start building our SimpleKV based on these four components.

What access mode is adopted? #

There are usually two types of access modes: one is through the use of function libraries for external applications. For example, the libsimplekv.so shown in the figure is dynamically linked to our own program to provide key-value storage functionality. The other is through a network framework to provide key-value pair operations in the form of socket communication, which can provide a wide range of key-value storage services. In the figure above, we can see that the network framework includes a Socket Server and a protocol parser.

The protocol for interaction between different key-value database servers and clients is not the same. When we develop and add new functions to a key-value database, we must understand and master its communication protocol in order to develop compatible clients.

In practice, key-value databases basically use the two methods mentioned above. For example, RocksDB is used as a dynamically linked library, while Memcached and Redis are accessed through network frameworks. Later, I will introduce Redis’s existing clients and communication protocols to you.

By providing key-value storage services through a network framework, the use of the key-value database is expanded, but at the same time, it also provides different design choices for the performance and operation models of the key-value database, bringing some potential issues.

For example, when a client sends a command like this, the command will be encapsulated in network packets and sent to the key-value database:

PUT hello world

After the key-value database network framework receives the network packet and parses it according to the corresponding protocol, it can know that the client wants to write a key-value pair and start the actual writing process. At this point, we will encounter a system design problem. In simple terms, it is whether to process network connections, parse requests, and handle data access in one thread, multiple threads, or multiple processes. How should we design and make trade-offs? Generally, we refer to this problem as I/O model design. Different I/O models will have different impacts on the performance and scalability of the key-value database.

For example, if one thread is responsible for handling network connections, parsing requests, and completing data access, once a certain operation is blocked, the entire thread will be blocked, which reduces the responsiveness of the system. If we use different threads to handle different operations, when one thread is blocked, other threads can still run normally. However, if different threads need to access shared resources, there will be thread competition, which will also affect system efficiency. So, this is indeed a “dilemma” choice that requires careful design.

You may have often heard that Redis is single-threaded. So, how does Redis achieve “single-threaded, high-performance”? Let’s talk about that in detail later.

How to locate the position of key-value pairs? #

When SimpleKV parses the request sent by the client and determines the key-value operation to be performed, it needs to check whether the key-value pair to be operated on exists. This relies on the index module of the key-value database. The role of the index is to allow the key-value database to find the storage location of the corresponding value based on the key, and then perform the operation.

There are many types of indexes, commonly including hash tables, B+ trees, trie trees, etc. Different index structures have different characteristics in terms of performance, space consumption, and concurrent control. If you have seen other key-value databases, you will find that different key-value databases use different indexes. For example, Memcached and Redis use hash tables as key-value indexes, while RocksDB uses skip lists as the index for key-value pairs in memory.

In general, in-memory key-value databases (such as Redis) use hash tables as indexes, for a large part because their key-value data is mostly stored in memory, and the high-performance random access feature of memory matches well with the O(1) operation complexity of hash tables.

For SimpleKV, it is sufficient to find the storage location of the value based on the key. However, unlike SimpleKV, Redis supports multiple data types for values. After we find the value corresponding to a key through the index, we still need to further find the actual data we need from the complex structure (such as sets and lists) of that value. The efficiency of this operation itself depends on the implementation structure of these data types.

Redis uses some commonly used high-efficiency index structures as the underlying data structures for certain value types, which provides good support for Redis to achieve high-performance access.

What is the specific logic for different operations? #

The indexing module of SimpleKV is responsible for finding the storage location of the corresponding value based on the key. For different operations, the specific logic to be executed after finding the storage location may vary. The operation module of SimpleKV implements the specific logic for different operations:

For GET/SCAN operations, the value is returned based on the storage location.
For PUT operation, which involves writing a new key-value pair, SimpleKV needs to allocate memory space for the pair.
For DELETE operation, SimpleKV needs to delete the key-value pair and release the corresponding memory space, which is done by the allocator.

You may have noticed that, for PUT and DELETE operations, apart from writing new key-value pairs and deleting them, memory allocation and release are also required. This brings us to the storage module of SimpleKV.

How to quickly provide services after restart? #

SimpleKV uses the commonly used memory allocator, malloc and free, from glibc. Therefore, SimpleKV does not need to worry about memory space management. However, in key-value databases, the size of key-value pairs can vary, and glibc’s allocator does not perform well when allocating random-sized memory blocks. Once the size of the saved key-value data becomes too large, it may cause serious memory fragmentation problems.

Therefore, the allocator is a critical factor in a key-value database. This is especially important for Redis, which mainly stores data in memory. Redis provides multiple options for memory allocators, each with different allocation efficiency, and I will discuss this issue in detail later.

Although SimpleKV relies on memory to store data and provide fast access, I also want SimpleKV to quickly provide services after restart. Therefore, I have added persistence functionality to the storage module of SimpleKV.

However, since disk management is more complex than memory management, SimpleKV directly uses files to save key-value data by calling the operation interfaces of the local file system. At this point, SimpleKV only needs to consider when to save the key-value data in memory to the file.

One way is to save each key-value pair to disk every time. This makes SimpleKV’s data more reliable, but the performance of SimpleKV is greatly affected because it has to write to disk every time.

Another way is for SimpleKV to periodically save the key-value data in memory to the file, avoiding the performance impact of frequent disk writing operations. However, a potential cost is that SimpleKV’s data is still at risk of being lost.

Similarly to SimpleKV, Redis also provides persistence functionality. However, to adapt to different business scenarios, Redis provides various execution mechanisms, optimizations, and design considerations in its persistence mechanism, which I will introduce to you one by one later.

Summary #

So far, we have constructed a simple key-value database called SimpleKV. As we can see, the first two steps were designed from the perspective of the application, which is the application’s point of view. The last four steps actually constitute the complete internal structure of SimpleKV, which can be described as “small but complete.”

SimpleKV includes basic components of a key-value database. Once we understand these components, it will become much easier to learn Redis, which is an enhanced version of SimpleKV.

In order to support more diverse business scenarios, Redis has extended or finely optimized these components or functions, thus meeting the requirements of functionality and performance.

From this comparison chart, we can see several important changes from SimpleKV to Redis:

Redis mainly operates through a network framework, rather than being a dynamic library anymore. This also allows Redis to be accessed as a fundamental network service, expanding the scope of Redis applications.
Redis has a rich variety of value types in its data model, which brings more operational interfaces. For example, LPUSH/LPOP for lists, SADD/SREM for sets, and so on. In the next lesson, I will talk to you about the data structures and operational efficiency behind these value models, as well as their impact on Redis performance.
Redis’ persistence module supports two methods: log-based (AOF) and snapshot-based (RDB). These two persistence methods have different advantages and disadvantages, which affect the access performance and reliability of Redis.
SimpleKV is a simple single-node key-value database, but Redis supports highly reliable clusters and highly scalable clusters, so Redis includes corresponding cluster support modules.

Through the construction of SimpleKV in this lesson, I believe you have gained a holistic understanding and deep comprehension of the basic structure and important modules of a key-value database. This is actually also the core foundation of Redis in its single-node version. In the following courses, I will explain these significant evolutions of Redis one by one. At the same time, I will combine practical scenarios to ensure that you not only understand the principles but also apply them in practice, enhancing your practical capabilities.

One question per lesson #

Here’s a small question for you: Compared to Redis, what do you think is missing in terms of functional components or modules in SimpleKV?

Feel free to write down your thoughts and answer in the comments section. Let’s exchange and discuss together. You’re also welcome to share today’s content with your friends.