
08 The Most Important Cluster Parameter Configuration Part 2 #

Today, let’s continue discussing those important Kafka cluster configurations. The second half mainly focuses on topic-level parameters, JVM parameters, and operating system parameters.

In the previous article, we talked about some rules for setting parameters on the broker side. However, Kafka also supports setting different parameter values for different topics. The latest version, 2.2, provides a total of about 25 topic-level parameters. Of course, we don’t need to understand all of their functions. Here, I have selected some of the most crucial parameters that you must grasp clearly. In addition to topic-level parameters, I will also provide some important JVM parameters and operating system parameters. Correctly setting these parameters is a key factor in building a high-performance Kafka cluster.

Topic-level Parameters #

Speaking of Topic-level parameters, you may wonder: if both Topic-level parameters and global Broker parameters are set, which one takes effect? Which is the final authority? The answer is that Topic-level parameters override the global Broker values: each Topic can have its own parameter settings, and that is exactly what "Topic-level parameters" means.

Let me explain with an example. In the previous article, I mentioned the parameter for how long message data is retained. In a real production environment, it is neither efficient nor necessary to keep the data of every Topic for a very long time. A more appropriate approach is to let each department set its own retention time for its Topics according to business needs. If only the global Broker parameter could be set, we would inevitably have to take the longest retention time across all businesses as the global value. In that case, being able to set a Topic-level parameter that overrides it is clearly the better choice.
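As a sketch of that relationship, with hypothetical values: the Broker-side default applies cluster-wide, while a Topic-level override applies to one Topic only.

```
# server.properties (Broker-side default, hypothetical value):
# every Topic keeps 7 days of data unless it says otherwise
log.retention.hours=168

# Topic-level override (set with the kafka-topics or kafka-configs
# scripts, not in server.properties): this one Topic keeps 30 days
retention.ms=2592000000
```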

Now let’s introduce important Topic-level parameters grouped by purpose. In terms of message retention, the following parameters are very important:

  • retention.ms: Specifies the duration for which messages in the Topic are retained. The default is 7 days, which means the Topic only retains the most recent 7 days of messages. Once this value is set, it will override the global parameter value on the Broker side.
  • retention.bytes: Specifies the amount of disk space to reserve for the Topic. Similar to global parameters, this value is particularly useful in multi-tenant Kafka clusters. The default value is -1, which means unlimited disk space usage.

The above parameters are from the perspective of message retention. If we consider the size of messages that can be processed, there is one parameter that must be set, which is max.message.bytes. It determines the maximum message size that the Kafka Broker can normally receive for the Topic. I know that many companies are currently using Kafka as a fundamental infrastructure component, running various business data on it. If we cannot provide a suitable maximum message value at the global level, it becomes necessary for different business departments to set this Topic-level parameter themselves. In actual scenarios, this usage is indeed very common.

So, these are the few Topic-level parameters you need to master. Now let me explain how to set Topic-level parameters. To be honest, I have a personal opinion about this: I personally do not favor the design approach that provides many choices for doing one thing. It may seem like giving users multiple choices, but in reality, it only increases the learning cost for users. Especially for system configurations, if you tell me that there is only one way to do it, I will work hard to learn it; but if you tell me that there are two or even multiple methods to achieve it, then I may lose interest in learning any of them. Setting Topic-level parameters is such a case. We have two ways to set them:

  • Setting them during Topic creation
  • Setting them when modifying the Topic

Let’s first see how to set these parameters when creating a Topic. I will use the retention.ms and max.message.bytes that I mentioned earlier as examples. Imagine that your department needs to send transaction data to Kafka for processing and needs to retain the latest half-year transaction data. At the same time, these data are usually large, typically several MB, but generally not exceeding 5MB. Now let’s create the Topic using the following command:

bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic transaction --partitions 1 --replication-factor 1 --config retention.ms=15552000000 --config max.message.bytes=5242880

You only need to know that Kafka provides the kafka-topics command for creating Topics. In the command above, pay attention to the --config settings at the end: the Topic-level parameters we want to set are specified after each --config.
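As a quick sanity check on the two values in that command, these are plain unit conversions, nothing Kafka-specific:

```shell
# Half a year of retention (about 180 days), in milliseconds:
echo $(( 180 * 24 * 60 * 60 * 1000 ))   # prints 15552000000

# 5 MB in bytes:
echo $(( 5 * 1024 * 1024 ))             # prints 5242880
```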

Next, let’s see how to modify Topic-level parameters using another built-in command, kafka-configs. What if we now want to send messages with a maximum size of 10MB? The command is as follows:

bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name transaction --alter --add-config max.message.bytes=10485760

In general, you can only use these two methods to set Topic-level parameters. My personal suggestion is that you should always insist on using the second method to set them, and in the future, the Kafka community is likely to unify the usage of the kafka-configs script to adjust Topic-level parameters.
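To confirm that an override actually took effect, the same kafka-configs script can describe a Topic's dynamic configs. This sketch assumes the ZooKeeper address from the command above; note that newer Kafka versions accept --bootstrap-server here instead of --zookeeper.

```shell
# List the overrides currently set on the transaction Topic:
bin/kafka-configs.sh --zookeeper localhost:2181 \
  --entity-type topics --entity-name transaction --describe

# For reference, the 10 MB limit used above, in bytes:
echo $(( 10 * 1024 * 1024 ))   # prints 10485760
```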

JVM Parameters #

As mentioned in the previous section, the Kafka server-side code is written in Scala but ultimately compiled into Class files and run on the JVM. Therefore, the importance of JVM parameter settings for a Kafka cluster is self-evident.

First, let’s talk about the Java version. Personally, I strongly recommend not running Kafka in a Java 6 or 7 environment. Java 6 is outdated, and there is no reason not to upgrade to a newer version. Additionally, starting from Kafka 2.0.0, support for Java 7 has been officially discontinued. So if possible, at least use Java 8.

Speaking of JVM settings, the heap size parameter is crucial. Although we will discuss how to tune Kafka performance later on, for now, I would like to give a general recommendation: set your JVM heap size to 6GB. This is currently considered a reasonable value in the industry. I have seen many people running Kafka with the default heap size, which is a bit small. After all, the Kafka broker creates a large number of ByteBuffer instances on the JVM heap when interacting with clients, so the heap size should not be too small.

Another important JVM configuration parameter is the garbage collector (GC) settings. If you are still using Java 7, you can choose an appropriate GC based on the following guidelines:

  • If the CPU resources on the machine where the broker is located are abundant, it is recommended to use the CMS collector. You can enable it by specifying -XX:+UseConcMarkSweepGC.
  • Otherwise, use the throughput collector. You can enable it by specifying -XX:+UseParallelGC.

Of course, if you are using Java 8, you can manually set the G1 collector. Without any tuning, G1 performs better than CMS, mainly in terms of fewer full GCs and fewer parameters that need adjustment. So using G1 is good enough.

Now that we have determined the JVM parameters to set, how do we set them for Kafka? Strangely, this question is not mentioned on the Kafka official website. In fact, the method of setting these parameters is quite simple. You just need to set the following two environment variables:

  • KAFKA_HEAP_OPTS: Specifies the heap size.
  • KAFKA_JVM_PERFORMANCE_OPTS: Specifies the GC parameters.

For example, you can set these two environment variables before starting the Kafka broker:

$> export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
$> export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true"
$> bin/kafka-server-start.sh config/server.properties

Operating System Parameters #

Finally, let’s discuss the operating system parameters that are usually required to be set for a Kafka cluster. In general, Kafka does not require many OS parameters to be set, but it is still advisable to pay attention to certain factors, such as the following:

  • File descriptor limit
  • File system type
  • Swappiness
  • Commit time

Firstly, let’s talk about ulimit -n. I believe this value is worth raising for just about any Java project. In reality, file descriptors are not as expensive as we imagine, so you don’t have to worry about increasing this value; it is perfectly reasonable to set it to something large, such as ulimit -n 1000000. Do you remember the line from the movie “Let the Bullets Fly”: “Which is more important to me, you or the money? Neither. Being without you is what’s important to me!”? This parameter is a bit like that. Setting it is not important in itself, but failing to set it has serious consequences, such as frequently running into the “Too many open files” error.
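A minimal sketch of checking and raising the limit (the user name kafka and the value below are assumptions, not anything mandated by Kafka):

```shell
# Show the current open-file limit for this shell session:
ulimit -n

# Raise it for the session (root may be required above the hard limit):
#   ulimit -n 1000000

# To make it permanent, the usual route is /etc/security/limits.conf:
#   kafka  soft  nofile  1000000
#   kafka  hard  nofile  1000000
```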

Secondly, the choice of file system type. Here, we are referring to log-based file systems such as ext3, ext4, or XFS. According to the official test report, XFS performs better than ext4, so it is preferable to use XFS in a production environment. By the way, there is a recent report on Kafka using ZFS which seems to provide even better performance. If you have the opportunity, you can give it a try.
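To check which file system currently backs your Kafka log directories, df can report the type (the path below is a placeholder for your own log directory):

```shell
# Replace . with your Kafka log directory, e.g. /data/kafka-logs:
df -T .
```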

Thirdly, swap tuning. Many articles online suggest setting swappiness to 0, completely disabling swap so that Kafka processes never use swap space. Personally, I think it is better not to set it to 0; a small value works better. Why? If it is set to 0, then once physical memory is exhausted, the operating system triggers the OOM killer, which picks a process and kills it without giving you any warning. With a small value, when swap does start being used, you can at least observe a sharp decline in the broker’s performance, which buys you time for further tuning and troubleshooting. For this reason, my personal recommendation is to configure swappiness to a value close to 0 but not 0, such as 1.
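A sketch of inspecting and adjusting swappiness on Linux (the persistence step via /etc/sysctl.conf is the common Linux convention, not anything Kafka-specific):

```shell
# Inspect the current value:
cat /proc/sys/vm/swappiness

# Set it to 1 on the running system (root required):
#   sysctl vm.swappiness=1

# Persist it across reboots by adding this line to /etc/sysctl.conf:
#   vm.swappiness=1
```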

Finally, there is the commit time, or flush interval. Sending data to Kafka does not require waiting for the data to be written to disk before it is considered successful; it is enough for the data to reach the operating system’s page cache. The operating system then periodically flushes the “dirty” pages in the page cache to the physical disk, and the frequency of this flushing is determined by the commit time, which defaults to 5 seconds. Many people consider this interval too frequent and increase the commit time to reduce disk write operations. Of course, you may ask: if the data in the page cache has not yet been written to disk when the machine crashes, won’t it be lost? Indeed, in that scenario the data is lost. But since Kafka already provides a replication mechanism at the software level, trading a slightly longer commit interval for better performance is still a reasonable approach.
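On ext3/ext4 this interval corresponds to the commit mount option, expressed in seconds, whose default of 5 matches the value above. A sketch of checking and raising it (the /data mount point is hypothetical):

```shell
# See whether any ext3/ext4 volumes are mounted, and with what options:
grep -E 'ext[34]' /proc/mounts || true

# Lengthen the flush interval to 60 seconds on remount (root required):
#   mount -o remount,commit=60 /data
```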

Summary #

Today I shared with you various configurations for Kafka cluster settings, including Topic-level parameters, JVM parameters, and operating system parameters. Together with the previous article, they form a complete list of Kafka parameter configurations. I hope these best practices will help you when setting up a Kafka cluster, but remember that configurations may vary depending on the environment. Be sure to validate their effectiveness by considering your own business needs and conducting specific tests.

Open Discussion #

Many people argue that Kafka does not need a large heap memory set for the broker and that memory should be allocated to page cache as much as possible. What are your thoughts on this? Are there any good rules to evaluate Kafka’s memory usage in your practical experience?

Feel free to write down your thoughts and answers, and let’s discuss together. If you find it helpful, please share this article with your friends.