37 Mainstream Kafka Monitoring Frameworks

Hello, I’m Hu Xi. Today I want to talk about the mainstream Kafka monitoring frameworks.

In the previous lecture, we discussed how to monitor Kafka clusters, focusing on monitoring principles and methods. Today, let’s turn to specific monitoring tools and frameworks.

Regrettably, the Kafka community has not invested much effort in monitoring frameworks. There are currently more than 500 proposed new features for Kafka, yet none of them concerns a monitoring framework. Kafka does expose a large number of JMX metrics, but viewing them one by one is inconvenient, so we still rely on frameworks to present performance monitoring in a unified way.

Perhaps because of this “inaction” on the community’s part, many companies and individuals have built their own Kafka monitoring frameworks, and some of them are excellent. Today, let’s survey the popular ones.

JMXTool #

First, I would like to recommend JMXTool. Strictly speaking, it is not a framework but a tool that ships with the community edition of Kafka. JMXTool lets you monitor Kafka JMX metrics in real time. If you cannot find a suitable monitoring framework right away, JMXTool can tide you over in an emergency.

The Kafka official website does not document JMXTool. Run the following command to print a complete description of its usage:

bin/kafka-run-class.sh kafka.tools.JmxTool

JMXTool accepts many parameters, but you don’t need to learn all of them. I have listed the main ones in the following table; at a minimum, you should understand what these parameters mean.

Now, let me give you a practical example to explain how to run this command.

Assume that you want to query the broker’s inbound traffic per second, which is exposed as the JMX metric BytesInPerSec. This metric helps you gauge the inbound load on a broker. If its value is close to your network bandwidth, the inbound load on this broker is too high, and you need to reduce it or move part of it to other brokers.

The following command queries the one-minute average rate of BytesInPerSec once every second (the --reporting-interval value is in milliseconds).

bin/kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec --jmx-url service:jmx:rmi:///jndi/rmi://:9997/jmxrmi --date-format "yyyy-MM-dd HH:mm:ss" --attributes OneMinuteRate --reporting-interval 1000

In this command, there are several points you need to pay attention to:

The value of the --jmx-url parameter must include the JMX port. In this example the port is 9997; in practice, use the port configured in your environment.
Because I ran the command directly on the broker, I omitted the hostname. If you run it from another machine, remember to specify the hostname of the broker you want to connect to.
The complete syntax of the --object-name value can be looked up on the Kafka official website. As mentioned earlier, Kafka provides many JMX metrics, so you need to learn how to find them there. Take the ActiveController metric as an example: search the website for the keyword “ActiveController” and you will find its --object-name, which is kafka.controller:type=KafkaController,name=ActiveControllerCount. With that, you can run the following command to check the current number of active controllers.

bin/kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.controller:type=KafkaController,name=ActiveControllerCount --jmx-url service:jmx:rmi:///jndi/rmi://:9997/jmxrmi --date-format "yyyy-MM-dd HH:mm:ss" --reporting-interval 1000
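Returning to the BytesInPerSec example, the points above (explicit hostname, JMX port, comparing the rate with the network bandwidth) can be sketched in a few lines of Python. This is a minimal sketch, not part of JMXTool: the hostname kafka-broker-1, the 1 Gbps link speed, the 80% threshold, and the sample output lines are all assumptions made for illustration, and the output format (a banner line, a quoted header row, then "timestamp,value" rows) should be checked against what your Kafka version prints.

```python
import shlex

JMX_TOOL = ["bin/kafka-run-class.sh", "kafka.tools.JmxTool"]


def jmx_tool_command(object_name, attributes, host="", port=9997, interval_ms=1000):
    """Build the JmxTool argument list; an empty host means the local broker."""
    url = f"service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi"
    return JMX_TOOL + [
        "--object-name", object_name,
        "--jmx-url", url,
        "--date-format", "yyyy-MM-dd HH:mm:ss",
        "--attributes", attributes,
        "--reporting-interval", str(interval_ms),
    ]


def parse_samples(lines):
    """Yield (timestamp, value) pairs, skipping banner and header lines."""
    for line in lines:
        parts = line.strip().rsplit(",", 1)
        if len(parts) != 2:
            continue  # e.g. the "Trying to connect ..." banner
        try:
            yield parts[0], float(parts[1])
        except ValueError:
            continue  # e.g. the quoted header row


def link_utilization(bytes_per_sec, link_bits_per_sec):
    """Fraction of the link used by inbound traffic (bytes converted to bits)."""
    return bytes_per_sec * 8 / link_bits_per_sec


# Illustrative output only; format assumed and numbers made up for this sketch.
sample_output = [
    "Trying to connect to JMX url: service:jmx:rmi:///jndi/rmi://:9997/jmxrmi.",
    '"time","kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec:OneMinuteRate"',
    "2019-08-10 14:52:15,100000000.0",
    "2019-08-10 14:52:16,112500000.0",
]

cmd = jmx_tool_command(
    "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
    "OneMinuteRate",
    host="kafka-broker-1",  # hypothetical hostname of a remote broker
)
print(shlex.join(cmd))

for ts, rate in parse_samples(sample_output):
    usage = link_utilization(rate, link_bits_per_sec=1_000_000_000)  # assumed 1 Gbps link
    flag = "HIGH" if usage >= 0.8 else "ok"  # 80% is an arbitrary example threshold
    print(f"{ts}  {rate:.0f} B/s  {usage:.0%} of link  {flag}")
```

In real use you would feed parse_samples the lines read from the running command (for example via subprocess) instead of the hard-coded sample.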

Overall, JMXTool is a small tool that ships with the community edition of Kafka. It can handle simple, general monitoring scenarios, but its functionality is limited. For more complex monitoring needs, you still have to rely on a monitoring framework.

Kafka Manager #

When it comes to Kafka monitoring frameworks, the most famous one is Kafka Manager. Kafka Manager is an open-source Kafka monitoring framework developed by Yahoo in 2015. This framework is developed in Scala and is mainly used for managing and monitoring Kafka clusters.

It is fair to say that Kafka Manager is currently the best of the many Kafka monitoring tools. In both the richness of its interface and the completeness of its monitoring functions, it is second to none. However, at the time of writing the framework has not been updated for 4 months, and it has only three or four active code maintainers. As a result, many bugs and issues are not fixed in a timely manner, and the framework cannot keep up with the pace of Apache Kafka releases.

Currently, the latest version of Kafka Manager is 2.0.0.2. After downloading the tar.gz package from its GitHub repository, extract it to obtain the kafka-manager-2.0.0.2 directory.

Next, we need to run sbt to compile Kafka Manager. sbt is a build tool designed for Scala projects, similar to the well-known Maven and Gradle. Kafka Manager ships with an sbt script, so we can simply run it to build the project:

./sbt clean dist

After a long wait, you should see that the project has been successfully built. You can find the generated zip file in the target/universal directory of Kafka Manager. Unzip it and then modify the kafka-manager.zkhosts item in the conf/application.conf file to point to the ZooKeeper address in your environment, for example:

kafka-manager.zkhosts="localhost:2181"

Afterwards, run the following command to start Kafka Manager:

bin/kafka-manager -Dconfig.file=conf/application.conf -Dhttp.port=8080

This command specifies the configuration file to read and the port to listen on. Now open a browser and go to the corresponding IP:8080 to access Kafka Manager. The following image shows the Kafka Manager screen for adding a cluster.

Note that you should check “Enable JMX Polling” so that you can monitor various JMX metrics of Kafka. The following image shows the main interface of Kafka Manager.

From this image, we can see that Kafka Manager clearly lists the number of topics, brokers, and other information of the currently monitored Kafka cluster. You can click on various items in the top menu to explore other functionalities.

In addition to its rich monitoring capabilities, Kafka Manager also provides many operational management functions, such as creating topics and triggering Preferred Leader elections. In a production environment this is a double-edged sword, because it means that anyone who can access Kafka Manager can perform these operations. Obviously, that is unacceptable. Therefore, many Kafka Manager users want to turn it into a pure monitoring framework and disable non-essential management functions.

Fortunately, Kafka Manager supports this. You can edit the application.conf file under the conf directory and remove the corresponding entries from application.features. For example, to disable the Preferred Leader election feature, remove the KMPreferredReplicaElectionFeature entry. After restarting Kafka Manager and returning to the main interface, you will see that the Preferred Leader Election menu item is gone.
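As an illustration, the edited setting might look like the fragment below. The feature names other than KMPreferredReplicaElectionFeature are assumed from the Kafka Manager README rather than taken from this article, so check them against the application.conf shipped with your version.

```
# conf/application.conf
# Before: Preferred Leader election is available in the UI.
# application.features=["KMClusterManagerFeature","KMTopicManagerFeature","KMPreferredReplicaElectionFeature","KMReassignPartitionsFeature"]

# After: the KMPreferredReplicaElectionFeature entry has been removed.
application.features=["KMClusterManagerFeature","KMTopicManagerFeature","KMReassignPartitionsFeature"]
```

Removing further entries in the same way would hide the corresponding management functions as well.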

In conclusion, as a powerful open-source Kafka monitoring framework, Kafka Manager provides rich real-time monitoring metrics and appropriate management functions. It is very suitable for general Kafka cluster monitoring and is definitely worth a try.

Burrow #

The second Kafka open-source monitoring framework I want to introduce is Burrow. Burrow is a framework developed by LinkedIn specifically for monitoring consumer progress. In fact, when it was first released as open-source, I had high expectations for it. After all, it’s a framework open-sourced by LinkedIn, which is also the place where Kafka was created and developed. Burrow should have the potential to become a great Kafka monitoring framework.

Regrettably, Burrow has lacked momentum and its development has been very slow; it has not been updated for several months. In addition, Burrow is written in Go and requires a Go environment to install, so its adoption rate is lower than that of other frameworks. Furthermore, Burrow has no UI. It only exposes a set of HTTP endpoints, which is a drawback for operations staff who want to “take the easy way out”.

If you want to install Burrow, you must first install the Go programming language environment, and then run the following commands in order to install Burrow:

$ go get github.com/linkedin/Burrow
$ cd $GOPATH/src/github.com/linkedin/Burrow
$ dep ensure
$ go install

Once everything is ready, execute the Burrow startup command:

$GOPATH/bin/Burrow --config-dir /path/containing/config
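Because Burrow has no UI, consumer progress has to be read from its HTTP endpoints. The following is a minimal sketch of doing that from Python. The address localhost:8000, the v3 path layout, and the response field names (error, message, status, totallag) are assumptions based on Burrow's published HTTP API, not something stated in this article; the cluster name local and the group name orders-app are hypothetical. Verify all of them against the Burrow documentation for your version.

```python
import json
from urllib.request import urlopen

BURROW = "http://localhost:8000"  # assumed address; defined in Burrow's own config


def status_url(cluster, group, base=BURROW):
    """URL of the consumer-group status endpoint (assumed v3 API layout)."""
    return f"{base}/v3/kafka/{cluster}/consumer/{group}/status"


def summarize(payload):
    """Reduce a status response to (group, status, total lag)."""
    if payload.get("error"):
        raise RuntimeError(payload.get("message", "Burrow returned an error"))
    status = payload["status"]
    return status["group"], status["status"], status["totallag"]


def fetch_status(cluster, group):
    """Call Burrow over HTTP; this needs a running Burrow instance."""
    with urlopen(status_url(cluster, group), timeout=5) as resp:
        return summarize(json.load(resp))


# Hand-written response using the assumed field names, so the sketch runs offline.
sample = json.loads("""
{"error": false, "message": "consumer status returned",
 "status": {"cluster": "local", "group": "orders-app",
            "status": "OK", "totallag": 42}}
""")
print(status_url("local", "orders-app"))
print(summarize(sample))
```

Against a live instance you would call fetch_status("local", "orders-app") instead of summarizing the hand-written sample.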

Overall, Burrow currently provides very limited functionality, with a low adoption rate and low visibility. However, the advantage of Burrow is that the main contributors to this project are the core team responsible for maintaining the Kafka cluster at LinkedIn, so the quality is guaranteed. If you happen to be very familiar with the Go language ecosystem, you may want to give Burrow a try.

JMXTrans + InfluxDB + Grafana #

Besides the purpose-built open-source Kafka monitoring frameworks described above, a more popular approach nowadays is to monitor Kafka with a general-purpose monitoring stack, such as the combination of JMXTrans + InfluxDB + Grafana. In this setup, JMXTrans collects the JMX metrics, InfluxDB stores them, and Grafana visualizes them, so it is easy to bring all kinds of Kafka JMX metrics into one place.

Let’s take a look at a screenshot of a monitoring dashboard from a production environment. The dashboard includes many metrics, such as CPU usage, GC data, and memory usage. It also includes many key Kafka JMX metrics, such as BytesIn, BytesOut, and messages per second. The ability to bring all of this data into one panel and present it visually is the distinctive feature of this stack.

Compared to Kafka Manager, the advantage of this monitoring framework is that you can monitor multiple key technology components of the enterprise in one monitoring framework. Especially for those enterprises that have already set up this monitoring combination, reusing this framework can greatly save operational costs and is a good choice.
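To make the data flow more concrete, here is a minimal sketch that generates a JMXTrans configuration which polls two of the Kafka JMX metrics discussed earlier and writes them to InfluxDB, where Grafana can chart them. The overall JSON layout (servers, queries, obj, attr, resultAlias, outputWriters) and the InfluxDbWriterFactory class name are assumed from JMXTrans's documentation; the InfluxDB URL, database name, credentials, and aliases are placeholders invented for this example.

```python
import json

# Writer class and field names are assumed from JMXTrans's InfluxDB output writer.
INFLUX_WRITER = {
    "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
    "url": "http://localhost:8086/",  # hypothetical InfluxDB address
    "database": "kafka",              # hypothetical database name
    "username": "admin",              # placeholder credentials
    "password": "admin",
}

# (object name, attribute, alias) triples; the aliases are hypothetical.
METRICS = [
    ("kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
     "OneMinuteRate", "bytes_in"),
    ("kafka.controller:type=KafkaController,name=ActiveControllerCount",
     "Value", "active_controller"),
]


def build_query(object_name, attribute, alias):
    """One JMXTrans query: which MBean attribute to read and where to send it."""
    return {
        "obj": object_name,
        "attr": [attribute],
        "resultAlias": alias,
        "outputWriters": [INFLUX_WRITER],
    }


def build_config(host, port, metrics=METRICS):
    """Configuration for a single broker's JMX endpoint."""
    queries = [build_query(*metric) for metric in metrics]
    return {"servers": [{"host": host, "port": str(port), "queries": queries}]}


config = build_config("localhost", 9997)
print(json.dumps(config, indent=2))
```

The printed JSON would be saved as a JMXTrans configuration file, with one entry under servers for each broker you want to poll.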

Confluent Control Center #

Finally, let’s talk about Control Center released by Confluent. This is currently the most powerful Kafka monitoring framework known.

Control Center not only allows real-time monitoring of Kafka clusters, but also helps you operate and build real-time stream processing applications based on Kafka. What’s even better is that Control Center provides a unified topic management feature. You can enjoy a one-stop management service for Kafka topics and schemas here.

The following image shows the Control Center interface for topic management. From it we can see at a glance the number of topics in the entire Kafka cluster, the number of in-sync replicas (ISR) for each topic, and the TPS (transactions per second) data for each topic. Of course, Control Center offers far more than this. It provides almost every Kafka operation, management, and monitoring function you can think of.

However, to use Control Center you must use the Confluent Kafka Platform Enterprise Edition. In other words, Control Center is not free; you have to pay for it. If you need a powerful monitoring framework, you can visit Confluent’s official website and subscribe to this truly enterprise-grade Kafka monitoring framework.

Summary #

In addition to Kafka Manager, Burrow, Grafana, and Control Center that I introduced today, there are many other open-source Kafka monitoring frameworks available on the market, such as Kafka Monitor and Kafka Offset Monitor. However, most of these frameworks have stopped being updated, and some of them have not been maintained for years, so I won’t go into detail on them. If you are an open-source enthusiast, you can try contributing code to these frameworks in the open-source community and help revive them.

It is worth mentioning Kafka Eagle, a framework that has recently become popular in China. It is maintained by Chinese developers and is evolving actively. According to the Kafka Eagle official website, it supports the latest Kafka 2.x versions and provides not only regular monitoring functions but also alerting, so it is definitely worth a try.

In conclusion, each framework has its own characteristics and value. Kafka Manager is suitable for basic Kafka monitoring, while the combination of Grafana + InfluxDB + JMXTrans suits enterprises that already have a fairly mature monitoring stack. You can treat the other monitoring frameworks as supplements to these two options and fold them into your own monitoring solution.

Open Discussion #

If we want to know if there is a request backlog on a certain Broker, which JMX metric should we monitor?

Feel free to write down your thoughts and answers, and let’s discuss them together. If you found this helpful, you are also welcome to share this article with your friends.