00 Preface: Redis Learning Tips Can Elevate Your Skills #

Hello, I’m Jiang Dejun. Welcome, and let’s learn Redis together.

Since completing my PhD, I have been working at the Institute of Computing Technology, Chinese Academy of Sciences, where I am currently an Associate Researcher. Over the past 14 years, I have been doing research on the underlying infrastructure of the Internet, focusing on new storage media, key-value databases, storage systems, and operating systems.

In 2015, my team and I took on a highly challenging mission: to design a key-value database whose single-machine performance could reach millions of transactions per second. To achieve this goal, we began studying Redis closely, and I have been connected to this database ever since.

As a key-value database, Redis is used very widely. If you are a backend engineer, you will probably be asked about Redis performance issues in job interviews. For example, to ensure data reliability, Redis needs to write the AOF (Append-Only File) and RDB (Redis Database) files to disk. In high-concurrency scenarios, however, this directly creates two problems: first, writing the AOF and RDB causes Redis performance fluctuations; second, loading the RDB file is slow during cluster data synchronization and instance recovery, which limits synchronization and recovery speed.
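To make this a little more concrete, here is a small sketch in Python (my own illustration, assuming a local Redis instance and the redis-py client). It reads the INFO output to check whether AOF is enabled and whether an RDB snapshot or AOF rewrite is in progress, along with the cost of the most recent fork, which is one common source of the latency fluctuations mentioned above.

```python
# A minimal sketch, assuming a local Redis instance and the redis-py client.
# It checks whether AOF/RDB persistence work is happening and how expensive
# the most recent fork (used for RDB snapshots and AOF rewrites) was.
import redis

r = redis.Redis(host="localhost", port=6379)
info = r.info()  # INFO output, parsed into a flat dict by redis-py

print("AOF enabled:          ", bool(info.get("aof_enabled", 0)))
print("RDB bgsave running:   ", bool(info.get("rdb_bgsave_in_progress", 0)))
print("AOF rewrite running:  ", bool(info.get("aof_rewrite_in_progress", 0)))
# A long fork pause shows up here and is felt by clients as a latency spike.
print("Latest fork time (us):", info.get("latest_fork_usec", "n/a"))
```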

So, is there a good solution to this problem? Haha, here’s a little teaser: one feasible approach is to use non-volatile memory (NVM), because it offers both high-speed reads and writes and fast data persistence. My team and I have conducted in-depth research on NVM key-value databases, filed more than twenty patents, and published papers at top academic conferences.

Of course, all this research was ultimately done to achieve the initial goal: to design a key-value database with single-machine performance reaching millions of transactions per second. In this process, I also studied Redis in depth, including its source code, architectural design, and core control points.

In addition, major internet companies use Redis in more advanced and more diverse ways, so they run into all sorts of tricky problems. In recent years, I have therefore collaborated with companies such as Ant Financial, Baidu, Huawei, and ZTE, working with them to solve difficult issues in their production environments.

Finally, taking Redis as the benchmark, our team also developed HiKV, a high-performance key-value database. If you are interested, you can follow the link to see its overall design.

It is because of this research and project experience that I have seen the different “ways of playing” with Redis at different companies: some use it as a cache, some as a database, and some for distributed locks. However, the “pitfalls” they encounter mostly fall into four areas:

  • “Pitfalls” related to CPU usage, such as data structure complexity and cross-CPU-core access;
  • “Pitfalls” related to memory usage, such as memory contention in master-slave synchronization and AOF;
  • “Pitfalls” related to storage persistence, such as performance fluctuations when taking snapshots on SSDs;
  • “Pitfalls” related to network communication, such as unexpected packet loss in multi-instance scenarios.

Drawing on this in-depth research, hands-on practice, and accumulated case studies, I have distilled a body of Redis knowledge that runs from principles to practice. This time, I want to share my years of experience with you.

Why can’t I use Redis well even though I understand each technical point? #

I know that many students are learning this course with specific questions in mind, such as how to persist Redis data or how to implement a clustering solution. These questions are definitely important, but if you are only focused on solving these specific problems, your Redis skills will have a hard time improving qualitatively.

Over the years, in my collaborations with domestic tech giants, I have found that many technical professionals share a misconception: they focus only on scattered technical points and never build a complete knowledge framework. They lack a systematic view, yet a systematic view is crucial. To some extent, having one means you can locate and solve problems on solid grounds and with a clear method.

Speaking of this, I would like to share a small case with you.

Nowadays, the Redis services at many large companies handle requests at a very large scale, so looking only at average latency is no longer enough when evaluating performance. Let me give you a simple example: suppose Redis processes 100 requests, 99 of which have a response time of 1s while one takes 100s. Looking at the average, the latency across these 100 requests is 1.99s, yet the user behind the 100s request has a very poor experience. With 1 million requests, even if only 1% of them took 100s, there would still be 10,000 poor user experiences. These high-latency requests are what we call the long-tail latency.
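To make the arithmetic concrete, here is a tiny Python sketch that reproduces the numbers above and shows why the mean alone hides the slow request:

```python
# Reproducing the example: 99 requests finish in 1s, one takes 100s.
latencies = [1.0] * 99 + [100.0]

mean = sum(latencies) / len(latencies)  # (99 * 1 + 100) / 100 = 1.99
worst = max(latencies)                  # the long-tail request

print(f"mean = {mean:.2f}s, worst case = {worst:.0f}s")
# The average looks acceptable (1.99s), but the slowest request takes 100s.
# This is why, at scale, tail percentiles such as p99 or p99.9 are monitored
# instead of the mean alone.
```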

When I was working on a project before, I needed to keep Redis’s long-tail latency below a certain threshold. So, what would you do if you were in my shoes?

At first, I didn’t know where to start because I didn’t know what was related to long-tail latency. I had to figure it out step by step.

First of all, I analyzed Redis’s thread model and found that any blocking operation would cause long-tail latency for a single-threaded Redis. Then, I started looking for key factors that could cause blocking. At first, I thought of network congestion, but as I gained a better understanding of Redis’s network framework, I realized that Redis’s network IO uses IO multiplexing and does not block on individual clients.
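To illustrate the idea, here is a toy sketch using Python’s selectors module (Redis itself implements this in C on top of epoll or kqueue; this is only my simplified illustration). A single thread registers many sockets with the operating system and is woken only for the ones that are ready, so no single slow client blocks the others. The port number and the canned reply are arbitrary choices for the demo.

```python
# Toy illustration of IO multiplexing: one thread, many client sockets,
# and the OS tells us which ones are ready. Not Redis code.
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server_sock):
    conn, _ = server_sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(1024)        # the socket is ready, so this will not block
    if data:
        conn.sendall(b"+OK\r\n")  # canned reply, just for the sketch
    else:                         # empty read means the client disconnected
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 7777))  # arbitrary demo port
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:
    # select() blocks only until *some* registered socket is ready,
    # never on an individual slow client.
    for key, _ in sel.select():
        key.data(key.fileobj)
```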

Afterward, I turned my attention to the key-value data structures, fork calls in the persistence mechanism, AOF rewriting during master-slave synchronization, and buffer overflow, among other aspects. After going around in circles, the “chain of evidence” behind long-tail latency finally took shape. In this way, I systematically grasped the key factors that affect Redis performance, making it much easier for me to solve similar problems later.

So, how can you efficiently develop a systematic view? Generally, we hope to achieve “fast, good, and inexpensive” when doing things, which means we want to spend less time mastering more knowledge and experience to solve more problems. It may sound difficult, but in reality, as long as you can grasp the main line and draw a panoramic knowledge map of Redis in your mind, it is entirely achievable. And this is also the approach I followed when designing this course.

So, what does the so-called panoramic knowledge map of Redis include? In simple terms, it consists of “two dimensions and three main lines.”

Panoramic knowledge map of Redis

The “two dimensions” refer to the system dimension and the application dimension, while the “three main lines” refer to high performance, high reliability, and high scalability (also known as the “three highs”).

First, from the system dimension, you need to understand the design principles of the key technologies in Redis. These give you a solid basis for judging and reasoning about problems. You can also pick up some elegant system design patterns from them, such as the run-to-completion model and the epoll network model, and apply them in your own system development later. Here is a problem, though: Redis is a full-featured key-value database that involves a huge amount of knowledge, so how can we quickly figure out what to learn? Don’t worry, the “magic” of the three main lines is about to come into play.

Although the technical points may seem fragmented, you can actually organize them along these three main lines, as shown in the diagram below:

  • The high performance main line, covering the thread model, data structures, persistence, and the network framework;
  • The high reliability main line, covering master-slave replication and the sentinel mechanism;
  • The high scalability main line, covering data sharding and load balancing.

By doing this, you will have a structured knowledge framework. When you encounter these problems, you can quickly find the key factors that affect them by referring to this diagram. Isn’t this very time-saving and efficient?

Second, from the application dimension, I suggest that you learn in two ways: “scenario-driven” and “typical-case-driven”. One gives you the overview, and the other gives you mastery of specific points.

We know that caching and clustering are the two widely-used application scenarios for Redis. Within these scenarios, there is an explicit technological chain. For example, when it comes to caching, you will definitely think of caching mechanisms, cache replacement, cache exceptions, and other related issues.

However, not everything is suited to this approach. For example, Redis has a rich data model, which leads to many diverse and fragmented application scenarios. Moreover, some problems are deeply hidden and only occur under specific business conditions (such as workloads with billions of requests), so they are not common phenomena, and it is hard to organize them into a structured framework.

In this situation, you can learn through the “typical-case-driven” approach: focus on interpreting the usage cases that have a significant impact on Redis’s “three high” properties, for example the in-depth Redis optimizations made by major companies handling trillion-level access and data volumes. Analyzing these optimization practices will help you understand Redis thoroughly. You can also distill the methodology into a checklist, a set of “strategies” of your own, and refer back to it whenever you run into problems in the future.

Lastly, I want to share a very useful technique with you. I have collected the typical Redis problems I have encountered and seen in recent years, combined them with the relevant technical points, and hand-drawn a “Redis problem portrait”. Whatever problem you run into, you can consult this diagram: it lets you quickly locate the main-line module the problem belongs to and then pinpoint the corresponding technical points.

For example, if you encounter a slow-response problem with Redis, the diagram will show you that it sits on the high performance main line, which in turn involves data structures, asynchronous mechanisms, RDB, and AOF rewriting. Once you have identified the influencing factors, solving the problem becomes much easier.

Furthermore, as you learn and use Redis, you can improve this portrait in your own way: organize the new knowledge points you have practiced or mastered in the format of “problem -> main line -> technical point” and place them on the diagram. As your accumulation grows, the portrait becomes richer, and future problems become easier to solve.

How is the course designed? #

What I just mentioned is actually the core design concept of our course. Next, let me talk about how this course is specifically designed.

Foundation Part: Breaking down the barriers between technical concepts and helping you build a networked knowledge structure

I will start by constructing a simple key-value database, guiding you through the process step by step. This is somewhat like building a house. Only when the main structure is established can you start thinking about “how to design it to be more beautiful and practical.” Therefore, in the “Foundation Part,” I will explain in detail the key pillars of data structures, thread models, persistence, and other foundations. This will not only help you grasp the key points but also understand their position, function, and interconnections within the overall framework. Once you understand these, you will have a solid foundation.

Practical Part: Driven by scenarios and cases, leveraging the strengths of others to develop your own “martial arts manual”

As mentioned earlier, from the perspective of application, we need to be driven by “scenarios” and “cases” when learning. Therefore, in the “Practical Part,” I will also explain from these two aspects.

In terms of “cases,” I will introduce the reasonable use of data structures, techniques to avoid request blocking and jitter, and key skills to avoid memory competition and improve memory utilization efficiency. In terms of “scenarios,” I will focus on introducing two major scenarios: caching and clustering.

For caching, I will explain the basic principles of caching, eviction strategies, and abnormal cases such as cache avalanche, cache penetration, and cache pollution. For clustering, I will discuss cluster solution optimization, data consistency, and feasible approaches to high-concurrency access.

Future Part: Forward-looking insights to unlock new features

With the recent release of Redis 6.0 and its highly anticipated new features such as multi-threading, I will introduce these new features to you, as well as the latest explorations of Redis in the industry. This will provide you with a forward-looking perspective, allowing you to understand the development roadmap of Redis and be prepared for future developments. By preparing in advance, you will be able to stay ahead of others.

In addition, I will occasionally provide additional content to share with you some good operation and maintenance tools, methods for customized client development, classic learning resources, etc. I will also plan some Q&A sessions to promptly address your doubts.

Finally, I want to say that Redis is an excellent system, and its design in terms of CPU utilization, memory organization, storage persistence, and network communication is very classic. These aspects basically cover the core knowledge and key technologies that a competent backend system engineer needs to master. I hope that through this course, you will grow into an outstanding system engineer.

However, it is often hard to study alone and keep it up. If you have classmates who are also using Redis, I hope you will share this course with them so that you can study together and encourage each other. Feel free to leave me messages; your encouragement is what drives me to keep producing quality content.