03 Factors to Consider When Designing Cache Architecture #

Hello, I am your cache teacher, Chen Bo. Welcome to the 3rd lesson on “Introduction to Cache and Architecture Design”.

By now, we have covered the main principles of caching. Next, we will discuss how to introduce caching and design the architecture, as well as some key considerations in cache architecture design.

Introduction to Cache and Architecture Design #
Choosing a Cache Component #

When designing a cache architecture, the first step is to select a cache component: a local cache, or an open-source component such as Redis, Memcached, or Pika. If your business has more specialized caching requirements, you also need to decide whether to build a custom cache component or customize an existing open-source one to meet those needs.

Cache Data Structure Design #

Once you have chosen the cache component, you need to design the cache data structure based on your business's access patterns. For simple key-value (KV) reads and writes, you can encapsulate the business data as String, JSON, Protocol Buffers, or another format, serialize it into a byte sequence, and write it directly to the cache; on reads, fetch the byte sequence from the cache component and deserialize it. For businesses that only need to access specific fields, or that require computation on the cache side, you can model the data as a hash, set, list, geo, or other structure and store it in a cache that supports complex collection types, such as Redis or Pika.
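
A minimal sketch of the two approaches (a serialized KV value versus a field-addressable hash), assuming a local Redis instance, the redis-py client, and illustrative key names:

```python
import json

import redis  # assumes the redis-py client is installed

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Approach 1: serialize the whole business object into one KV pair.
user = {"id": 42, "name": "alice", "followers": 1024}
r.set("user:42", json.dumps(user))
cached = json.loads(r.get("user:42"))  # deserialize on read

# Approach 2: store the object as a hash so individual fields can be
# read or updated, and simple computation can run on the cache side.
r.hset("user:hash:42", mapping={"name": "alice", "followers": 1024})
r.hincrby("user:hash:42", "followers", 1)       # server-side increment
followers = r.hget("user:hash:42", "followers")
```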

Cache Distribution Design #

After determining the cache component and designing the cache data structure, the next step is to design the cache distribution. This involves three dimensions.

  1. First, choose a distribution algorithm, such as modulo or consistent hashing. Modulo distribution is simple: each key always maps to a fixed cache node. Consistent hashing is more complex, and the node a key maps to can change as nodes join or leave; however, when some cache nodes fail, their traffic is spread evenly across the surviving nodes, which better preserves the stability of the cache system. (A sketch of both algorithms follows this list.)
  2. Second, decide how to implement distributed read and write access. Should the cache client hash and route reads and writes itself, or should a proxy handle the routing? Direct client access gives the best performance, but the client must be aware of the distribution strategy, every client must be promptly notified of online changes to the cache deployment to avoid read and write errors, and the client implementation is more complex. With proxy routing, the client only talks to the proxy, and the distribution logic and deployment changes are handled by the proxy. This approach is the friendliest to business application development, but it adds an extra network hop.
  3. Lastly, if the amount of cached data grows rapidly, large amounts of data may be evicted, lowering the cache hit rate and degrading access performance. In that case, data needs to be split off from overloaded cache nodes and migrated horizontally to other nodes. You need to consider whether this migration is performed by the proxy or by the cache server itself, or whether it is supported at all. Memcached generally does not support migration. For Redis, the community edition relies on the cache server itself for migration, while Codis performs migration through its Admin and Proxy components working with the backend cache.
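
Below is a minimal sketch contrasting the two algorithms. The node names, virtual-replica count, and hash function are illustrative assumptions, not part of any particular cache component:

```python
import bisect
import hashlib

def _hash(key):
    # Any well-mixed hash works; MD5 is used here only for illustration.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def node_by_modulo(key, nodes):
    """Modulo distribution: simple, but changing the node count remaps
    almost every key."""
    return nodes[_hash(key) % len(nodes)]

class ConsistentHashRing:
    """Consistent hashing: each node owns many virtual points on a ring,
    so a failed node's keys are spread evenly over the survivors."""

    def __init__(self, nodes, replicas=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

nodes = ["cache-1", "cache-2", "cache-3"]
ring = ConsistentHashRing(nodes)
print(node_by_modulo("user:42", nodes))  # fixed node for a fixed node count
print(ring.node_for("user:42"))          # stable even as nodes join or leave
```
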
Cache Architecture Deployment and Operations Management #

After designing the cache distribution strategy, the next consideration is deployment and operations management. Cache architecture deployment mainly concerns how caches are split into resource pools, layered, and distributed across IDCs, and whether heterogeneous cache components are needed.

  1. Core, heavily accessed data of different kinds should be split into separate cache pools to prevent mutual interference; non-core business data with lower access volume can share a pool.

  2. For massive datasets, and for business data with access volumes exceeding the scale of tens of millions to 100 million, you need to consider layered access and spread the access load to avoid overloading the cache.

  3. If the business system must be deployed across multiple IDCs, or even run cross-site active-active, then the cache system also needs to be deployed across multiple IDCs, and you must consider how cache data is kept up to date across IDCs. This can be done by reading and writing directly across IDCs, by synchronizing messages between IDCs via DataBus and queue machines and having message-processing machines update the cache, or by triggering cache updates from DB triggers in each IDC. (A sketch of the queue-based approach follows this list.)

  4. In some extreme scenarios, it is necessary to combine multiple cache components and use cache heterogeneity to achieve optimal read and write performance.

  5. From a system perspective, to manage the cache better, you should also consider making the cache service-oriented, and think about how the cache system can be better managed, monitored, and maintained.
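
As a minimal sketch of the queue-based cross-IDC approach, a Redis list stands in below for DataBus or queue machines; the queue name and message format are illustrative assumptions:

```python
import json

import redis  # assumes the redis-py client

r = redis.Redis(decode_responses=True)

def process_cache_updates():
    """Consume cross-IDC update messages and apply them to the local cache.

    In a real deployment the queue would be a dedicated system (DataBus,
    Kafka, etc.) rather than the cache itself.
    """
    while True:
        _, raw = r.blpop("cache-update-queue")  # blocks until a message arrives
        msg = json.loads(raw)
        if msg["op"] == "set":
            r.setex(msg["key"], msg.get("ttl", 3600), msg["value"])
        elif msg["op"] == "delete":
            r.delete(msg["key"])
```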

Common Considerations in Cache Architecture Design #

During cache architecture design, there are some very important considerations, as shown in the figure below. Only after analyzing them clearly can you design a good cache system.

[Figure: Common considerations in cache architecture design]

Read and Write Operations #

The first consideration is how values are read and written: as a whole, or in parts? Is computation on the cache side required? Take a user's follower list as an example: many ordinary users have several thousand to tens of thousands of followers, while influencers may have tens of millions or even hundreds of millions. Such a list cannot be fetched as a whole, only in parts. Likewise, to determine whether one user follows another, there is no need to pull back the entire follower list; it is far more efficient to check membership directly against the cached list and return True/False or 0/1.
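
A sketch of the membership-check pattern, assuming redis-py and an illustrative "followers:<uid>" set per user:

```python
import redis  # assumes the redis-py client

r = redis.Redis(decode_responses=True)

# Follower relationships stored as one Redis set per followed user;
# the "followers:<uid>" key layout is an illustrative assumption.
r.sadd("followers:1001", "2001", "2002", "2003")

# Wasteful for big accounts: pulls the entire list back to the client.
# all_followers = r.smembers("followers:1001")

# Better: let the cache answer the membership question directly.
print(r.sismember("followers:1001", "2002"))   # True

# Partial reads: page through a huge set instead of fetching it whole.
cursor, page = r.sscan("followers:1001", cursor=0, count=1000)
```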

KV Size #

Next, consider the size of the key-value (KV) pairs for different business data caches. If a single business's values are too large, they need to be split across multiple KV pairs. Conversely, if the KV sizes of different cached datasets differ greatly, they should not share a cache pool, to avoid poor cache efficiency and mutual interference.
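
A minimal chunking sketch, assuming redis-py and an arbitrary 512 KB chunk size:

```python
import redis  # assumes the redis-py client

r = redis.Redis()
CHUNK = 512 * 1024  # assumed chunk size: 512 KB

def set_large(key, value):
    """Split an oversized bytes value across numbered sub-keys."""
    chunks = [value[i:i + CHUNK] for i in range(0, len(value), CHUNK)]
    pipe = r.pipeline()
    pipe.set(f"{key}:n", len(chunks))      # record the chunk count
    for i, chunk in enumerate(chunks):
        pipe.set(f"{key}:{i}", chunk)
    pipe.execute()

def get_large(key):
    n = r.get(f"{key}:n")
    if n is None:
        return None
    parts = r.mget([f"{key}:{i}" for i in range(int(n))])
    return None if any(p is None for p in parts) else b"".join(parts)
```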

Number of Keys #

The number of keys is also an important factor to consider. If the number of keys is small, the entire dataset can be stored in the cache, using the cache as a database. If a cache lookup results in a miss, it means the data does not exist and there is no need to query the database again. However, if the data volume is huge, only frequently accessed hot data should be retained in the cache, and cold data should be directly accessed from the database.
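
A minimal sketch of the two miss semantics, assuming redis-py; load_from_db and the key prefixes are hypothetical stand-ins:

```python
import redis  # assumes the redis-py client

r = redis.Redis(decode_responses=True)

def load_from_db(key):
    """Hypothetical stand-in for the real database query."""
    return None  # pretend the row was not found

def get_config(key):
    # Small, fully cached dataset: the cache is authoritative,
    # so a miss means the record does not exist -- skip the DB.
    return r.get(f"config:{key}")

def get_article(key):
    # Huge dataset: only hot data is cached; a miss falls back to the DB.
    value = r.get(f"article:{key}")
    if value is None:
        value = load_from_db(key)
        if value is not None:
            r.setex(f"article:{key}", 3600, value)  # cache hot data with a TTL
    return value
```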

Read and Write Peaks #

In addition, the read and write peaks of the cache data are important considerations. If the read and write peaks are less than 100,000 QPS, simply splitting them into independent cache pools is sufficient. However, once the read and write peaks exceed 100,000 or even reach 1 million QPS, the cache needs to be layered. It is possible to use local cache in combination with remote cache, or even further layer the remote cache for processing. In Weibo’s business, most of the core services use this approach to access Memcached.
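
A minimal sketch of a two-level read path, assuming redis-py for the remote layer and a plain in-process dict with a short TTL as the local layer (a production local cache would also bound its size, e.g. with an LRU):

```python
import time

import redis  # assumes the redis-py client

r = redis.Redis(decode_responses=True)

_local = {}        # key -> (expiry deadline, value); the in-process L1 cache
LOCAL_TTL = 1.0    # seconds; kept short to bound staleness

def layered_get(key):
    hit = _local.get(key)
    if hit is not None and hit[0] > time.monotonic():
        return hit[1]                  # L1 hit: served from local memory
    value = r.get(key)                 # L1 miss: fall through to remote cache
    if value is not None:
        _local[key] = (time.monotonic() + LOCAL_TTL, value)
    return value
```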

Cache Hit Rate #

The cache hit rate has a significant impact on the overall performance of the service system. For core high-concurrency businesses, enough capacity must be allocated to keep the hit rate of the core business cache high. For example, the Feed Vector Cache in Weibo maintains a hit rate above 99.5% year-round. To sustain the hit rate, the cache system must be continuously monitored, with faults handled or failed over promptly. When some cache nodes become abnormal and the hit rate drops, failover plans need to kick in, such as a consistent-hash-based access drift strategy or a multi-layer data backup strategy.
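
For Redis specifically, an instance-wide hit rate can be derived from the cumulative counters that INFO exposes in its stats section; a minimal redis-py sketch:

```python
import redis  # assumes the redis-py client

r = redis.Redis()

# INFO's stats section exposes cumulative hit/miss counters; their ratio
# is the instance-wide hit rate since the counters were last reset.
stats = r.info("stats")
hits = stats["keyspace_hits"]
misses = stats["keyspace_misses"]
total = hits + misses
print(f"hit rate: {hits / total:.2%}" if total else "no lookups yet")
```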

Expiration Strategy #

The expiration strategy also needs to match how the data is accessed. Two common approaches:

  • You can set a short expiration time to automatically expire cold keys;
  • You can also add a timestamp to the key and set a longer expiration time. For example, many business systems have keys like “key_20190801”.
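
A minimal sketch of both strategies with redis-py; the key names and TTL values are illustrative:

```python
import datetime

import redis  # assumes the redis-py client

r = redis.Redis(decode_responses=True)

# Strategy 1: a short TTL so cold keys age out on their own.
r.setex("counter:user:42", 300, "1024")          # expires in 5 minutes

# Strategy 2: a date-stamped key with a longer TTL, in the style of the
# "key_20190801" keys mentioned above; readers always compute today's
# key, so yesterday's data simply stops being read and later expires.
today = datetime.date.today().strftime("%Y%m%d")
r.setex(f"report_{today}", 3 * 86400, "...")     # keep a few days
```
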
Average Cache Penetration Loading Time #

The average cache penetration loading time also matters in some business scenarios. For data that takes a long time to load or is expensive to compute after a cache miss, and that is accessed relatively heavily, you need to provision more capacity and maintain a higher hit rate, reducing the probability of requests penetrating to the database, in order to protect the overall system's access performance.
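
As a sketch of how the loading time after a miss might be measured, with a hypothetical rebuild_from_db standing in for the real loader:

```python
import time

import redis  # assumes the redis-py client

r = redis.Redis(decode_responses=True)

def rebuild_from_db(key):
    """Hypothetical stand-in for a slow load or expensive computation."""
    time.sleep(0.05)
    return "value"

def get_with_load_timing(key):
    value = r.get(key)
    if value is not None:
        return value
    start = time.monotonic()            # cache miss: time the rebuild
    value = rebuild_from_db(key)
    elapsed_ms = (time.monotonic() - start) * 1000
    print(f"penetration load for {key}: {elapsed_ms:.1f} ms")
    if value is not None:
        r.setex(key, 3600, value)
    return value
```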

Cache Manageability #

For cache manageability considerations, you need to consider cluster management of the cache system, such as how to perform one-click scaling, how to upgrade and modify cache components, how to quickly discover and locate problems, and how to continuously monitor and alert. It is best to have a comprehensive operation and maintenance platform that integrates various operation and maintenance tools.

Cache Security #

For cache security, on the one hand you can restrict the source IP so that only internal-network access is allowed; on the other hand, for critical commands you need to add access controls, to prevent major damage from attacks or accidental operations.

Alright, that’s the end of Lesson 3. Let’s review what we have learned. First, we walked through the four steps for introducing caching into system development: choosing a cache component, designing the cache data structure, designing the cache distribution, and planning deployment and operations management. Then we became familiar with the common considerations in cache architecture design. Now you can design a cache architecture with confidence.