17 How to Understand and Choose Core Data Types When Using Redis

17 How to Understand and Choose Core Data Types When Using Redis #

Hello, I am your cache course teacher, Chen Bo. Welcome to Lesson 17: “Redis Data Types”.

Redis Data Types #

Firstly, let’s take a look at the core data types in Redis. Redis has 8 core data types, which are:

string: the string type;
list: the list type;
set: the set type;
sorted set: the sorted set type;
hash: the hash type;
bitmap: the bitmap type;
geo: the geolocation type;
HyperLogLog: the probabilistic data structure type.

The string type #

The string type is the most basic data type in Redis. It can be understood as the value type corresponding to the key in Memcached. The string type in Redis is binary-safe, meaning it can contain any data.

Ordinary strings in Redis use raw encoding, which dynamically expands and reduces memory allocation overhead by pre-allocating redundant space.

When the string length is less than 1MB, double the required length is allocated, and when it exceeds 1MB, an additional capacity increase of 1MB is allocated each time.

Numbers stored in Redis are also considered as string types, but they use a different encoding scheme than ordinary strings. Numbers are encoded as integers, and the string content is directly set as the binary byte sequence of the integer value.

The string type in Redis can be used for storing ordinary strings, serializing objects, and counters, among other scenarios. The corresponding commands for working with the string data type include set, get, mset, incr, decr, and so on.

The list type #

The list type in Redis is a fast doubly-linked list that stores a series of string values. Elements in the list are arranged in the order of insertion. Elements can be inserted into the list by using lpush to insert one or more elements at the head, rpush to insert one or more elements at the tail, or lset and linsert to insert elements at a specified position or before/after a specified element.

To retrieve elements from the list, you can use lpop and rpop to remove elements from the head or tail, respectively. If the list is empty, nil is returned. In addition, you can use Blpop and Brpop to perform blocking pop operations from the head or tail. If the list is empty and there are no elements to pop, it will block until a new element is inserted by another client. The blocking pop operation can be configured with an expiration time to avoid indefinite waiting. Lastly, you can use Lrange to retrieve all elements within a specified range in the list. In Redis, the offset positions in the list are 0-based, meaning the first element has an index of 0, the second has an index of 1, and so on. Negative offsets can also be used, with -1 referring to the last element, -2 referring to the second-to-last element, and so on.

For normal operations such as pushing or popping elements to and from the list, the performance is high and the time complexity is O(1) because the list is directly appended or popped. However, for operations such as random insertion, deletion, or range retrieval, which require polling the list to determine the position, the performance is relatively poor.

A list data structure can be used for storing a feed timeline, where feed IDs usually increase sequentially and can be directly stored in a list. Additionally, list data structures can be used for scenarios such as message queues and popular feeds.

When working with lists, you can use lpush, lpop, rpush, rpop, and lrange for regular queueing, dequeueing, and range retrieval operations. In certain special scenarios, lset and linsert can be used for random insertion, and lrem can be used for removing specified elements. Finally, for consuming message lists, Blpop and Brpop can be used to blockingly retrieve elements, allowing clients to wait quietly when the list temporarily doesn’t have any elements without continuously querying.

The set type #

The set type in Redis is an unordered collection of string elements where each element is unique. Sets in Redis are implemented using a hash table, so inserting, deleting, and querying elements can be done in constant time O(1) based on the hash value of the element.

In addition to regular operations such as adding, deleting, and finding elements, the following commands can be used to manipulate sets:

The Sismember command checks whether a certain element exists in the set associated with a given key. It returns 1 if the element exists and 0 otherwise.
The Sdiff command performs set difference on multiple sets.
The Sinter command performs set intersection on multiple sets.
The Sunion command performs set union on multiple sets.
The Spop command removes a random element from the set.
The Srandmember command returns one or more random elements from the set.

The set type has the advantage of being highly efficient for lookup, insertion, and deletion operations, all with a time complexity of O(1). Therefore, in social systems, sets can be used to store lists of friends that a user is following to determine whether a user is following someone, as well as for friend recommendation. Additionally, the uniqueness of sets can be leveraged for accurate statistics on the source business and source IP of services.

The sorted set type #

The sorted set type in Redis, also known as zset, is a collection of string elements similar to the set type. However, each element in the sorted set is associated with a double precision floating point score value. The sorted set is sorted by the score value from low to high. While duplicate elements are not allowed in the sorted set, duplicate score values are allowed.

In addition to regular operations such as adding, deleting, and finding elements, the following commands can be used to manipulate sorted sets:

The zscan command: retrieves elements in a sorted set in order.
The zscore command: retrieves the score value of an element.
The zrange command: retrieves elements within a specified score range.
When the score value of an element changes, it can be incremented or decremented using the zincrby command.
The zinterstore and zunionstore commands can be used to compute the intersection and union of multiple sorted sets. The result is stored in a new key, and if there are duplicate elements, their scores are added together.

Sorted sets have the following characteristics:

All elements are sorted by score in non-repeating order.
Efficient lookup, insertion, and deletion with a time complexity of O(1).

Therefore, sorted sets can be used to create leaderboards, update them in real-time, record student grades to easily obtain a list of students within a specific score range, and add weight values to system statistics for real-time display on a dashboard.

Hash #

In Redis, a hash is actually a mapping table between fields and values.

The characteristics of a hash data structure are that a single key can contain multiple key-value pairs (fields and values), where the value can be any string. The operations for querying and modifying these key-value pairs are efficient.

Therefore, hashes can be used to store complex objects with multiple elements, and then perform specific operations on each element. Some important commands for hash structures include: hmset, hmget, hexists, hgetall, hincrby, etc.

The hmset command is used to insert multiple field-value mappings.
The hmget command retrieves values corresponding to multiple fields.
The hexists command checks if a field exists.
If the value corresponding to a field is an integer, the hincrby command can be used to increment or decrement the value.

Bitmap #

In Redis, a bitmap is a sequence of continuous binary digits, which is actually stored and operated upon as a string. Bitwise operations are performed on the individual bits of the bitmap. The position of each bit corresponds to an offset. The setbit and bitfield commands can be used to set a bit to 0 or 1 in a bitmap. The bitcount command can be used to count the number of bits that are set to 1 in a bitmap. The bitop command can be used to perform logical operations such as AND, OR, and XOR on multiple bitmaps.

The characteristic of bitmap is that bitwise operations such as setting bits, performing AND and OR operations are highly efficient, and the storage cost is very low. When used to store object label attributes, one bit is enough to store one label. Bitmap can be used to store the login status of users in the past N days, with 1 bit for each day and setting it to 1 if the user logs in. Personalized recommendations are very important in social applications. A series of labels can be set for news and feeds, such as military, entertainment, video, pictures, text, etc. These labels can be stored using bitmap, with 1 set on the corresponding label bit. For users, a similar method can be used to record multiple attributes of users, and it is easy to perform multidimensional statistics based on labels. The important instructions for bitmap include: setbit, getbit, bitcount, bitfield, bitop, bitpos, etc.

In the era of mobile social networking, there are more and more LBS (Location-Based Services) applications, such as “People Nearby” in WeChat and Momo, “Nearby Food” and “Movie Theater” in Meituan and Dianping, “Nearby Cars” in Didi and Uber, etc. To implement these functions, geographical location information needs to be used for searching. The geographical location of the earth is represented using two-dimensional latitude and longitude. By determining the latitude and longitude of a point, its position on the earth can be determined.

Redis has added support for GEO geographical locations in version 3.2. Redis GEO locations are essentially implemented based on sorted sets. When storing geographical location information under a category key, a sorted set needs to be built as the internal storage structure for that category key to store a series of location points.

When storing a certain location point, the two-dimensional latitude and longitude of the location are first encoded into a one-dimensional 52-bit integer value using the Geohash algorithm. The location name and encoded score (latitude and longitude) are then stored as key-value pairs in the sorted set corresponding to the category key.

To calculate the people near a certain location point A, the GEO hash ranges of 8 directions are determined with location A as the center point and distance as the radius. Then, all location points within these hash ranges are iterated, and if the distance from these location points to the center location A is within the required distance range, they are considered as target location points. After iterating through all the location points within the ranges, they are reordered to obtain all the targets near location point A.

Use geoadd to add the name of a location (such as person, vehicle, store name) and its corresponding geographical location information to a specified location category key.
Use geopos to easily query the location information of a certain name.
Use georadius to obtain all elements within a specified distance near a specified location.
Use geodist to get the distance between two specified locations.

With these functionalities, can’t we implement finding nearby restaurants and calculating the distance from the current location to the corresponding restaurant?

Redis GEO geographical location, using Geohash to convert a large number of two-dimensional latitude and longitude coordinates into one-dimensional integer values, makes it easy to query geographical locations, measure distances, and search within a range. However, due to the large number of geographical location points, there may be a large number of elements under a geographic category key. When designing GEO, it is necessary to plan ahead to avoid excessive expansion of a single key.

Redis GEO geographical location data structure has many applications, such as querying the specific location of a certain place, calculating the distance from the current location to the destination, and finding nearby people, restaurants, movie theaters, etc. The important instructions in the GEO geographical location data structure include geoadd, geopos, geodist, georadius, georadiusbymember, etc.

hyperLogLog cardinality estimation #

Redis hyperLogLog is a data type used for cardinality estimation. When counting a huge number of elements, it only requires a small amount of memory. HyperLogLog does not store the actual elements but only estimates the approximate quantity of the elements to be counted. This estimated quantity has a standard deviation of 0.81% and is acceptable for most business scenarios when dealing with massive data.

When using Redis HyperLogLog for estimation, if the count is not large, a sparse matrix is used for storage. As the count increases, the space occupied by the sparse matrix will gradually increase. When it exceeds a threshold, it switches to a dense matrix with fixed space, which is about 12KB in size.

With the hyperLogLog data type in Redis, you can use pfadd to add new elements to the cardinality estimation, pfcount to obtain the approximate cardinality count stored in the hyperLogLog structure, and hypermerge to merge multiple hyperLogLogs into one, making it easy to get the merged cardinality count.

The characteristic of hyperLogLog is that it does not store individual elements during the estimation process and occupies very little memory, making it very suitable for estimating massive data. In medium and large-scale systems, it can be used to calculate the daily or monthly unique visitors (UV), or to count the number of unique search terms searched by a large number of users using hyperLogLog data type.