13 How to Thoroughly Learn the MC Protocol and Optimize Client Access

Hello, I am Chen Bo, your caching course instructor. Welcome to Lesson 13, “Memcached Protocol Analysis”.

Protocol Analysis #

Exceptional Error Response #

Next, let’s work through the MC protocol in full. Before getting into the individual commands, let’s first look at how Mc reports errors. While handling any client command, Mc returns one of three types of error message if something goes wrong.

The first type of error is a protocol error, which is the string “ERROR\r\n”. This indicates that the client has sent a command that Mc does not recognize.
The second type of error is a client error, in the format “CLIENT_ERROR <error>\r\n”, where <error> is a human-readable description. This error message indicates that the format of the protocol command sent by the client is incorrect, such as missing fields or having illegal fields.
The third type of error is a server error, in the format “SERVER_ERROR <error>\r\n”. This error message indicates an error that occurred on the Mc server side while processing the command. For example, if the server fails to allocate Item space for the key/value, it will return the error message “SERVER_ERROR out of memory storing object”.
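The three error forms above can be told apart by their prefix. The following is a minimal Python sketch of such a check; the function name `classify_error` and the returned labels are illustrative assumptions, not part of Mc or of any client library.

```python
def classify_error(line: bytes):
    """Return (kind, message) for an Mc error line, or None for any other line.

    The labels "protocol", "client" and "server" are names chosen for this
    sketch only.
    """
    text = line.rstrip(b"\r\n").decode("ascii", errors="replace")
    if text == "ERROR":
        return ("protocol", "")
    for prefix, kind in (("CLIENT_ERROR", "client"), ("SERVER_ERROR", "server")):
        if text == prefix or text.startswith(prefix + " "):
            # Everything after the prefix is the human-readable description.
            return (kind, text[len(prefix):].strip())
    return None
```

For example, `classify_error(b"SERVER_ERROR out of memory storing object\r\n")` returns `("server", "out of memory storing object")`, while a normal reply such as `b"STORED\r\n"` returns `None`.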

Storage Protocol Commands #

Now let’s take a look at the storage protocol of Mc. Mc has only 6 storage commands.

A storage command consists of two parts. The first is the header line, and the second is the data block holding the value. Each part is terminated by \r\n.

The header line of the storage commands comes in two formats. The first format is the command name (cmd), followed by the key, flags, expiretime, value byte count, and an optional noreply.

Among them, flags is an application-defined number. Mc only stores flags and returns it with the value, without any additional parsing or processing. Expiretime is the expiration time of the key. Value byte count is the length of the value data block. If noreply is included, Mc processes the command silently and does not return any response to the client.

These cmd commands include the most commonly used set command, as well as add, replace, append, and prepend. There are a total of 5 commands:

The Set command is used to store a key/value pair.
The Add command stores the key/value pair only when the key does not exist.
The Replace command stores the key/value pair only when the key exists.
The Append command appends data to the end of the value when the key exists.
The Prepend command adds data to the beginning of the value when the key exists.
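To make the first header format concrete, here is a small Python sketch that serializes one of these five commands into the bytes sent on the wire. The function name `build_storage_request` and its parameter names are assumptions made for this example.

```python
STORAGE_COMMANDS = {"set", "add", "replace", "append", "prepend"}


def build_storage_request(cmd, key, value, flags=0, exptime=0, noreply=False):
    """Serialize a storage command: header line, then the data block.

    Both parts are terminated by \\r\\n, and the byte count in the header
    is the length of the value without its trailing \\r\\n.
    """
    if cmd not in STORAGE_COMMANDS:
        raise ValueError(f"not a storage command of this format: {cmd}")
    header = f"{cmd} {key} {flags} {exptime} {len(value)}"
    if noreply:
        header += " noreply"  # ask Mc not to send any response
    return header.encode("ascii") + b"\r\n" + value + b"\r\n"
```

Calling `build_storage_request("set", "user:1", b"hello", exptime=60)` produces `b"set user:1 0 60 5\r\nhello\r\n"`.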

The second header format has the same fields as the first, plus a cas unique id placed after the value byte count. This format is used only by the cas command. The cas command modifies the value only when the key exists and has not been modified by anyone else since this client fetched it. “cas” means check and set (often described as compare and set): the value is set only after the comparison succeeds.
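A cas request can be sketched in Python as follows. The function name `build_cas_request` and its parameters are assumptions for illustration; the cas unique id is expected to come from an earlier gets response.

```python
def build_cas_request(key, value, cas_unique, flags=0, exptime=0, noreply=False):
    """Serialize a cas command; cas_unique is the id the client fetched earlier."""
    # Same fields as the other storage commands, with the cas unique id
    # placed after the value byte count.
    header = f"cas {key} {flags} {exptime} {len(value)} {cas_unique}"
    if noreply:
        header += " noreply"
    return header.encode("ascii") + b"\r\n" + value + b"\r\n"
```

For example, `build_cas_request("k", b"v2", 42)` yields `b"cas k 0 0 2 42\r\nv2\r\n"`.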

Storage Command Response #

When responding to storage protocol commands, if Mc encounters an error, it will return one of the three types of error messages mentioned earlier. Otherwise, it will return one of the following four normal responses: “STORED\r\n”, “EXISTS\r\n”, “NOT_STORED\r\n”, “NOT_FOUND\r\n”.

Among them, “STORED” indicates that the storage modification is successful. “NOT_STORED” indicates that the data was not stored successfully, but there was no error or exception encountered. This response generally indicates that the add or replace command does not meet the preconditions. For example, for the add command, if the key already exists in Mc, the add operation will fail. When using the replace command, if the key does not exist, the replace operation will also fail. “EXISTS” indicates that the key to be cas’d has been modified, while “NOT_FOUND” means the key to be cas’d does not exist in Mc.
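The response rules above can be modeled with a small in-memory store. This is only a teaching sketch in Python: the class `TinyStore` is hypothetical, it ignores flags and expiration, and the way it generates cas unique ids (a simple counter) is an assumption, not how Mc does it internally.

```python
class TinyStore:
    """Toy model of the set/add/replace/cas response rules."""

    def __init__(self):
        self._items = {}    # key -> (value, cas_unique)
        self._next_cas = 1  # assumed id scheme: a plain counter

    def _put(self, key, value):
        # Every successful write gives the item a fresh cas unique id.
        self._items[key] = (value, self._next_cas)
        self._next_cas += 1
        return "STORED"

    def store(self, cmd, key, value):
        exists = key in self._items
        if cmd == "set":
            return self._put(key, value)
        if cmd == "add":      # only when the key does not exist
            return "NOT_STORED" if exists else self._put(key, value)
        if cmd == "replace":  # only when the key exists
            return self._put(key, value) if exists else "NOT_STORED"
        raise ValueError(f"unsupported command in this sketch: {cmd}")

    def gets(self, key):
        """Return (value, cas_unique), or None if the key is absent."""
        return self._items.get(key)

    def cas(self, key, value, cas_unique):
        if key not in self._items:
            return "NOT_FOUND"
        if self._items[key][1] != cas_unique:
            return "EXISTS"   # someone modified the key after it was fetched
        return self._put(key, value)
```

A typical sequence: after `add` stores a key, a second `add` on the same key returns `NOT_STORED`. If another writer calls `set` between this client's `gets` and its `cas`, the `cas` returns `EXISTS`.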

You can refer to the following mind map to have a complete impression of the request and response protocols for storage commands in Mc.

Get Command #

The retrieval protocol of Mc includes only two instructions: get and gets, as shown in the figure below. The request is “get” or “gets” followed by one or more keys, terminated by \r\n. The get command only retrieves the flags and value of each key, while gets additionally retrieves the cas unique id. Gets is mainly used together with the cas instruction.

For each key that is found, the response contains a line starting with the literal string “VALUE”, followed by the key, flags, and value byte count (plus the cas unique id for gets), and then the data block of the value. The response ends with “END\r\n”, which indicates that all existing key/values have been returned. If a requested key is missing from the response, that key does not exist in Mc.
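The response layout can be illustrated with a short Python parser. It is a sketch that assumes the complete response is already in memory; the name `parse_get_response` and the shape of the returned dictionary are choices made for this example.

```python
def parse_get_response(data: bytes):
    """Parse a complete get/gets response into {key: (flags, value, cas_or_None)}."""
    items = {}
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        line = data[pos:eol]
        pos = eol + 2
        if line == b"END":
            return items
        parts = line.split()
        if not parts or parts[0] != b"VALUE":
            raise ValueError(f"unexpected line: {line!r}")
        key, flags, size = parts[1].decode("ascii"), int(parts[2]), int(parts[3])
        # gets responses carry one extra field: the cas unique id.
        cas_unique = int(parts[4]) if len(parts) > 4 else None
        # Read exactly `size` bytes, because the value may itself contain \r\n.
        value = data[pos:pos + size]
        if data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("data block is not terminated by \\r\\n")
        pos += size + 2
        items[key] = (flags, value, cas_unique)
```

Keys that were requested but do not appear in the returned dictionary were misses. A response of just `b"END\r\n"` parses to an empty dictionary.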

Other instructions #

The other protocol instructions for Mc include delete, incr, decr, touch, gat, gats, slabs, lru, and stats.

The delete instruction is used to delete a key.

The incr/decr instructions are used to increment or decrement an unsigned long integer.
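As a rough illustration, the Python sketch below builds an incr/decr request and models the arithmetic on a 64-bit unsigned value. The wrap-around on incr overflow and the clamp to 0 on decr underflow follow the memcached protocol document rather than this lesson, and all function names are assumptions for the example.

```python
U64_MODULUS = 1 << 64  # values are treated as 64-bit unsigned integers


def build_counter_request(cmd, key, delta):
    """Serialize an incr or decr request for the given key and amount."""
    if cmd not in ("incr", "decr"):
        raise ValueError(f"not a counter command: {cmd}")
    return f"{cmd} {key} {delta}\r\n".encode("ascii")


def apply_incr(current, delta):
    # Overflow wraps around the 64-bit mark.
    return (current + delta) % U64_MODULUS


def apply_decr(current, delta):
    # Underflow is caught: the result never goes below 0.
    return max(current - delta, 0)
```

For example, `apply_decr(3, 10)` gives `0`, and `apply_incr(2**64 - 1, 1)` wraps to `0`.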

The touch, gat, and gats instructions were added to Mc later and can be used to modify the expiration time of a key. The difference is that touch only modifies the expiration time of the key without retrieving the corresponding value.

The gat and gats instructions not only modify the expiration time of the key, but also retrieve the flags and value data. Gats is the same as gat, but, like gets, it additionally retrieves the cas unique id.
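The request lines for these three instructions can be sketched in Python as below. Note the field order: touch takes the key first and then the new expiration time, while gat and gats take the expiration time first and then one or more keys. The function names are assumptions for this example.

```python
def build_touch(key, exptime):
    """touch: update the expiration time only, without fetching the value."""
    return f"touch {key} {exptime}\r\n".encode("ascii")


def build_gat(exptime, *keys, with_cas=False):
    """gat/gats: update the expiration time and fetch the items in one round trip."""
    if not keys:
        raise ValueError("at least one key is required")
    cmd = "gats" if with_cas else "gat"  # gats also returns the cas unique id
    return f"{cmd} {exptime} {' '.join(keys)}\r\n".encode("ascii")
```

For example, `build_gat(60, "a", "b")` produces `b"gat 60 a b\r\n"`. The response to gat and gats has the same layout as the response to get and gets.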

Slabs reassign is used to reassign slabs among different slab classes in Mc after it reaches the set memory limit. This can avoid the randomness generated by the automatic allocation at Mc startup, ensuring better hit rate for data of special sizes. Slabs automove is a switch instruction that, when enabled, allows the Mc background threads to decide when to reassign slabs among slab classes.

The lru instruction is used for setting and tuning the Mc LRU. For example, lru tune sets the memory ratio for the HOT and WARM LRU. lru mode sets whether Mc uses only the COLD LRU or the newer strategy with 4 LRUs (HOT, WARM, COLD, and TEMP). lru temp_ttl sets the TTL value for Mc’s TEMP LRU, which is 61s by default. Keys with a TTL less than TEMP_TTL will be inserted into the TEMP LRU.

Stats is used to obtain various statistics of Mc. An additional argument such as settings, items, slabs, or sizes can follow stats to obtain more detailed statistics.
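The lesson does not show the stats response, so the following relies on the memcached protocol document, where it is a series of “STAT <name> <value>\r\n” lines ended by “END\r\n”. Here is a minimal Python sketch of a parser for that layout; the name `parse_stats` is an assumption, and all values are kept as strings.

```python
def parse_stats(data: bytes):
    """Parse a complete stats response into a {name: value} dictionary of strings."""
    stats = {}
    for line in data.split(b"\r\n"):
        if line == b"END":
            break
        parts = line.split(b" ", 2)
        # Keep only well-formed "STAT <name> <value>" lines.
        if len(parts) == 3 and parts[0] == b"STAT":
            stats[parts[1].decode("ascii")] = parts[2].decode("ascii")
    return stats
```

Sending `b"stats\r\n"` or `b"stats slabs\r\n"` to a test instance and passing the reply to this function gives a dictionary that is easy to inspect or feed into monitoring.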

Client Usage #

Mc is widely used in internet companies, and there are implementations of Mc clients in popular languages. Taking Java as an example, widely used Mc Java clients include Memcached-Java-Client, SpyMemcached, and Xmemcached.

Memcached-Java-Client was introduced early and was widely used 10 years ago. This client has average performance but is stable enough, and many internet companies still use it today. However, this client stopped updating several years ago.

SpyMemcached appeared relatively late and has better performance but lacks stability in high-concurrency access scenarios. There have been very few changes recently, and updates have basically stopped.

Xmemcached performs well and is the best of the three overall. Moreover, it has a highly active community and has been continuously updated in recent years. For new Java projects, it is recommended to use Xmemcached.

When using Mc clients, there are some general optimization and improvement strategies. For example, if the key/value being read or written is large, a larger buffer should be set to improve performance. In some business scenarios, TCP_NODELAY should be enabled to disable Nagle’s algorithm and avoid the roughly 40ms latency that arises when it interacts with delayed ACKs. Additionally, if the key/value being stored or accessed is large, a compression threshold can be set. If the size exceeds the threshold, the value is compressed before it is sent, which reduces both the data transferred on reads and writes and the space used for storage.
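These three tunings can be sketched in Python with the standard library. This is not how any particular Mc client implements them. The flag bit `COMPRESSED_FLAG`, the 1024-byte threshold, and the 256 KB buffer size are arbitrary values assumed for illustration, and treating the “larger buffer” as the kernel socket buffer is also an assumption.

```python
import socket
import zlib

COMPRESSED_FLAG = 0x2         # hypothetical bit in flags marking a compressed value
COMPRESSION_THRESHOLD = 1024  # assumed threshold in bytes


def encode_value(raw, flags=0, threshold=COMPRESSION_THRESHOLD):
    """Compress values larger than the threshold and mark them in flags."""
    if len(raw) > threshold:
        packed = zlib.compress(raw)
        if len(packed) < len(raw):  # keep the original if compression does not help
            return packed, flags | COMPRESSED_FLAG
    return raw, flags


def decode_value(stored, flags):
    """Reverse encode_value using the flags returned with the value."""
    if flags & COMPRESSED_FLAG:
        return zlib.decompress(stored)
    return stored


def configure_socket(sock, bufsize=256 * 1024):
    """Apply the socket-level tunings to a TCP socket."""
    # Disable Nagle's algorithm so small requests are sent immediately.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Request larger kernel buffers for big key/values; the OS may adjust the size.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
```

Because Mc stores flags without interpreting them, the client can use a bit of flags to remember whether a value was compressed, and every reader must apply the same convention.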

To avoid cache avalanche and better handle hot keys and flood traffic, the Mc client can be encapsulated and multiple replicas and hierarchical strategies can be added. This allows the Mc caching system to achieve high availability and high performance in any scenario.
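One possible reading of “multiple replicas and hierarchical strategies” is a small layer of replicated hot-key caches in front of the main pool. The Python sketch below uses plain dictionaries in place of real Mc connections. The class `LayeredCache`, its read and write policy, and the backfill step are all assumptions for illustration, not the design described in this course.

```python
import random


class LayeredCache:
    """Toy two-level cache: several L1 replicas in front of one main pool."""

    def __init__(self, l1_replicas, main, rng=None):
        self.l1_replicas = l1_replicas  # dict-like stores that absorb hot-key reads
        self.main = main                # dict-like store holding the full data set
        self.rng = rng or random.Random()

    def get(self, key):
        # Spread reads across replicas so one hot key does not hit a single node.
        replica = self.rng.choice(self.l1_replicas)
        value = replica.get(key)
        if value is not None:
            return value
        # On an L1 miss, fall back to the main pool and backfill the replica.
        value = self.main.get(key)
        if value is not None:
            replica[key] = value
        return value

    def set(self, key, value):
        # Write the main pool first, then refresh every replica.
        self.main[key] = value
        for replica in self.l1_replicas:
            replica[key] = value
```

With this layout, a read that misses one replica still finds the key in the main pool, so the loss of a single node does not send all of its traffic straight to the database.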

With this, the core knowledge points of Mc have been covered. The structure of these knowledge points is shown in the following figure.

Let’s review the content of the recent lessons. First, we learned about the system architecture of Mc, the network model based on libevent, and how the main thread and worker threads coordinate and handle network IO. We also learned about the state machine of Mc. Then, we continued to learn about the hash table used for key location in Mc, the LRU for managing data lifecycles, the slab allocation mechanism, and the storage mechanism of Mc data. Finally, we completed the study of Mc’s protocol, learned about three Mc clients in Java, and how to optimize and improve the Mc client in the production environment.

Using the mind map of the Mc protocol below, check whether you understand all the instructions. You can also start an Mc instance and practice each command with the Mc protocol document at hand.