05 Let's Discuss Kafka Versions

Hello, I’m Hu Xi. Today I would like to talk with you about choosing a Kafka version. This topic matters a great deal, and I believe it is crucial to how effectively you will be able to use Kafka in the future.

In the previous article, I introduced several popular Kafka distributions. Whichever distribution you choose, it embeds the core Apache Kafka, which is the community edition of Kafka. So today, let’s talk about Apache Kafka version numbers. Before we begin, I want to emphasize that whenever I say “version,” I mean a specific Kafka version number, not one of the Kafka distributions discussed in the previous article. Please keep this distinction in mind!

You might be wondering why you need to care about version numbers. Isn’t it enough to just use the latest version? Using the latest version is indeed an effective strategy, but it is not suitable in every scenario. If you don’t understand the differences and functional changes between versions, how can you accurately evaluate whether a given Kafka version meets your business requirements? It is therefore well worth spending some time on how the versions evolved before you study Kafka in depth.

Kafka Version Naming

Apache Kafka is currently on version 2.2, and the community is voting on the release of version 2.3.0, which is expected soon. Surprisingly, Kafka’s version naming still causes some confusion. For example, when we download Kafka from the official website, we see package names such as kafka_2.11-2.1.1.tgz and kafka_2.12-2.1.1.tgz.

Some readers may be wondering: isn’t the Kafka version number 2.11 or 2.12? Actually, that’s not the case. The number in front is the version of the Scala compiler used to compile the Kafka source code. The server-side code of Kafka is written entirely in Scala, a language that supports both object-oriented and functional programming. Scala source code compiles to regular .class files, which is why we say that Scala is a language in the JVM family, and many of its design principles are commendable.

In fact, many of the new features introduced in Java are moving closer to Scala, such as lambda expressions, functional interfaces, and the var keyword for local variable type inference. Interestingly, the new version of the Kafka client code is written entirely in Java, so some people have started a “Java vs. Scala” debate and tried to explain, from the perspective of language features, why the Kafka community abandoned Scala and rewrote the client code in Java. It’s really not that complicated: a new group of Java programmers joined the community, and the old Scala programmers retired. I may have gone off on a tangent here, but I still recommend that you learn Scala when you have the time.

Back to version numbers. You should now understand that a name like “kafka_2.11-2.1.1” actually refers to Kafka version 2.1.1, compiled with Scala 2.11. So, what does 2.1.1 stand for? The first 2 is the major version number, the middle 1 is the minor (or sub-) version number, and the last 1 is the patch number. After releasing version 1.0.0, the Kafka community announced in an article that the version naming rule had officially changed from 4 digits to 3 digits. Version 0.11.0.0, for example, is a 4-digit version number.
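To make the naming rule concrete, here is a minimal Python sketch that splits a package name into its Scala part and its Kafka part. It assumes the name follows the pattern kafka_<Scala version>-<Kafka version>, optionally ending in .tgz. The function name `split_artifact` is hypothetical and is not part of any Kafka tooling.

```python
import re

# Assumed pattern: kafka_<scala version>-<kafka version>[.tgz]
# A "-" after "kafka" is also accepted, to match the spelling used in prose.
ARTIFACT = re.compile(r"^kafka[_-](\d+\.\d+)-(\d+(?:\.\d+)+)(?:\.tgz)?$")


def split_artifact(name):
    """Return (scala_version, kafka_version) for a Kafka package name."""
    match = ARTIFACT.match(name)
    if match is None:
        raise ValueError(f"unrecognized Kafka package name: {name!r}")
    return match.group(1), match.group(2)


scala_version, kafka_version = split_artifact("kafka_2.11-2.1.1.tgz")
print(f"Scala {scala_version}, Kafka {kafka_version}")
```

The first group is only the Scala compiler version. The second group is the number you should use when talking about “the Kafka version.”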

To be honest, my opinion is slightly different from the community’s here. In my view, a version like 0.11.0.0 has 4 digits, but its major version is really 0.11, not 0. Seen this way, the Kafka version number has always consisted of 3 parts, namely “major version - minor version - patch number”. This perspective unifies all Kafka version names and makes our later discussions easier. Let’s review: if you come across a Kafka version like 0.10.2.2, you now know that its major version is 0.10, its minor version is 2, and its patch number is 2, meaning that two patch releases have been made on top of 0.10.2.
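The three-part reading described above can be sketched in a few lines of Python. This is only an illustration of the convention used in this article, under the assumption that every 4-digit version starts with 0. The names `KafkaVersion` and `parse_kafka_version` are hypothetical.

```python
from typing import NamedTuple


class KafkaVersion(NamedTuple):
    major: str  # "0.10" for 4-digit versions, "2" for 3-digit versions
    minor: int
    patch: int


def parse_kafka_version(text):
    """Split a Kafka version into major, minor, and patch parts."""
    parts = text.split(".")
    if len(parts) == 4 and parts[0] == "0":
        # Old 4-digit scheme: treat "0.x" as the major version.
        major, minor, patch = f"{parts[0]}.{parts[1]}", parts[2], parts[3]
    elif len(parts) == 3:
        # 3-digit scheme used since 1.0.0.
        major, minor, patch = parts
    else:
        raise ValueError(f"unexpected Kafka version: {text!r}")
    return KafkaVersion(major, int(minor), int(patch))


print(parse_kafka_version("0.10.2.2"))
print(parse_kafka_version("2.1.1"))
```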

Evolution of Kafka Versions

Kafka has gone through a total of 7 major versions, namely 0.7, 0.8, 0.9, 0.10, 0.11, 1.0, and 2.0, with many minor and patch versions along the way. Which of them introduced significant feature improvements? I recommend that you be able to recall the answer accurately. This not only makes you look knowledgeable when you talk to others about Kafka, but also informs technology selection and architecture evaluation if you are moving into an architect role or already hold one.

Let’s start with version 0.7. There is not much to say about it, as it is the “ancient” version from the early days of open source Kafka. It provided only the most basic message queue functionality and did not even have a replication mechanism. I can’t think of any reason to use this version, so if someone recommends it to you, just walk away.

With the move from 0.7 to 0.8, Kafka formally introduced the replication mechanism, which made it a truly complete distributed and highly reliable message queue solution. With replication, Kafka can protect messages against loss. At that time, the old version client APIs were still used for producing and consuming messages. By “old version,” I mean that when you develop producer and consumer applications with these APIs, you need to specify the ZooKeeper address instead of the broker address.

If you currently don’t understand the difference between the two, that’s okay. I will explain them in detail in future articles in this column. The old version clients had many issues, especially the producer API, which used synchronous message sending by default, resulting in limited throughput. Although it also supported asynchronous mode, it could lead to message loss in actual scenarios. Therefore, the community introduced the new version Producer API in version 0.8.2.0, which requires specifying the broker address.
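The difference between the two client generations shows up directly in configuration. The sketch below uses plain Python dictionaries to hold consumer properties; it does not call any Kafka client library. `zookeeper.connect` and `bootstrap.servers` are the property names used by the old consumer and the new clients respectively, while the host names, the group name, and the helper `client_generation` are placeholders made up for illustration.

```python
# Old version consumer: the client is pointed at ZooKeeper.
old_consumer_config = {
    "zookeeper.connect": "zk1:2181,zk2:2181",  # placeholder hosts
    "group.id": "demo-group",
}

# New version clients: the client is pointed at the brokers.
new_consumer_config = {
    "bootstrap.servers": "broker1:9092,broker2:9092",  # placeholder hosts
    "group.id": "demo-group",
}


def client_generation(config):
    """Guess which client generation a configuration was written for."""
    if "bootstrap.servers" in config:
        return "new"
    if "zookeeper.connect" in config:
        return "old"
    raise ValueError("no ZooKeeper or broker address configured")


print(client_generation(old_consumer_config))
print(client_generation(new_consumer_config))
```

A quick way to audit an existing application is to look for which of the two addresses its configuration contains.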

As far as I know, there are still a few users in China who are using versions 0.8.1.1 and 0.8.2. My suggestion is to use the most recent version possible. If you cannot upgrade to a major version, I recommend at least upgrading to version 0.8.2.2 because the old version consumer API in this version is quite stable. Additionally, even if you upgrade to 0.8.2.2, do not use the new version Producer API as it still has many bugs at this time.

In November 2015, the community officially released version 0.9.0.0. In my opinion, this was a significant major version update. Version 0.9 added basic security authentication and authorization features, rewrote the consumer API in Java (the new version Consumer API), and introduced the Kafka Connect component for high-performance data extraction. If all these dazzling features feel overwhelming, I hope you remember one more benefit of this release: the new version Producer API is relatively stable in 0.9. If you are running 0.9 in production, consider switching to the new version Producer; this is a little-known advantage of this release. However, just as with the new Producer API when it was introduced in 0.8.2, do not use the new version Consumer API in 0.9: it has many bugs and will cause crashes. Even if you report the issues to the community, you will most likely just be told to upgrade to a newer version. Some startups in China that run older CDH releases, which embed version 0.9, should pay extra attention to these issues.

Version 0.10.0.0 is a milestone release because it introduced Kafka Streams. Starting from this version, Kafka officially became a distributed stream processing platform, although Kafka Streams was not yet ready for production use at this stage. The 0.10 major release has two later minor versions, 0.10.1 and 0.10.2, both of which mainly focus on changes to the Kafka Streams component. If you use Kafka only as a messaging engine, this release does not bring many functional enhancements. In my opinion, though, the new version Consumer API is relatively stable starting from version 0.10.2.2. If you are still on the 0.10 major release, I strongly recommend upgrading to at least 0.10.2.2 and using the new version Consumer API. It is also worth mentioning that version 0.10.2.2 fixes a bug that could degrade Producer performance, so you should consider upgrading to 0.10.2.2 for performance reasons as well.

In June 2017, the community released version 0.11.0.0, which brought two major changes: an idempotent Producer API together with a Transaction API, and a restructured Kafka message format.

The former is the more eye-catching of the two, because an idempotent Producer and transaction support are essential for Kafka to ensure the correctness of stream processing results. Without them, Kafka Streams cannot guarantee correct results the way batch processing can. However, because these features had only just been introduced, the Transaction API had some bugs and was not very stable. Moreover, the Transaction API is designed primarily for Kafka Streams applications, and there are not many successful cases of users writing their own programs against it.
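To show what “idempotent” means here, the following toy model imitates the general idea of de-duplicating producer retries with a producer id and a per-producer sequence number. It is a simplified sketch of the concept, not Kafka’s actual broker code, and the class `PartitionLog` is hypothetical.

```python
class PartitionLog:
    """Toy model of one partition that drops duplicated producer retries."""

    def __init__(self):
        self.records = []
        self.last_sequence = {}  # producer id -> last accepted sequence number

    def append(self, producer_id, sequence, value):
        """Append value once; return False if it was already written."""
        last = self.last_sequence.get(producer_id, -1)
        if sequence <= last:
            return False  # a retry of something already stored: ignore it
        if sequence != last + 1:
            raise ValueError("sequence gap: an earlier message is missing")
        self.records.append(value)
        self.last_sequence[producer_id] = sequence
        return True


log = PartitionLog()
log.append(producer_id=1, sequence=0, value="order-created")
log.append(producer_id=1, sequence=1, value="order-paid")
# The producer did not see the acknowledgement and sends the message again.
log.append(producer_id=1, sequence=1, value="order-paid")
print(log.records)
```

Even though the second message is sent twice, it is stored only once, which is the property that stream processing relies on for correct results.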

The second major improvement is the change in message formats. Although this change is transparent to users, it has far-reaching implications that will persist. Performance issues caused by message format conversions are common in production environments, so you must be cautious about this change in version 0.11. It must be said that all major functional components in this version have become very stable, with many users in China using this version. It should be considered one of the most mainstream versions currently available. This popularity is evident from the release of three patch versions specifically for version 0.11. My advice is that if you are still confused about whether version 1.0 is suitable for a production environment, you should at least upgrade to version 0.11.0.3, as the messaging engine functionality in this version is already very mature.
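The conversion cost mentioned above arises when a consumer is too old to understand the format in which messages are stored, so the broker has to convert them before sending. The sketch below assumes three format generations: v0 up to 0.9, v1 in 0.10.x, and v2 from 0.11 onward. Both function names are hypothetical, and the check is a rough planning aid rather than a description of the broker’s exact logic.

```python
def message_format_generation(kafka_version):
    """Return the newest message format generation a Kafka version understands."""
    major_minor = tuple(int(p) for p in kafka_version.split(".")[:2])
    if major_minor >= (0, 11):
        return 2
    if major_minor >= (0, 10):
        return 1
    return 0


def needs_down_conversion(stored_format_version, consumer_version):
    """True if the consumer is too old for the format the messages are stored in."""
    stored = message_format_generation(stored_format_version)
    understood = message_format_generation(consumer_version)
    return understood < stored


# Messages stored in the 0.11 format, read by a 0.10.2.2 consumer.
print(needs_down_conversion("0.11.0.3", "0.10.2.2"))
# The same messages, read by a 1.0.0 consumer.
print(needs_down_conversion("0.11.0.3", "1.0.0"))
```

When the first case applies to many consumers, it is worth upgrading the clients or reviewing the broker’s `log.message.format.version` setting before upgrading the brokers.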

Finally, let me briefly mention versions 1.0 and 2.0 because, in my opinion, these two major versions mainly focus on various improvements in Kafka Streams and do not introduce many significant new features in the messaging engine aspect. Kafka Streams has indeed undergone significant changes in these two versions, and it must be acknowledged that Kafka Streams is still actively evolving. If you are a Kafka Streams user, I recommend choosing version 2.0.0 at the minimum.

Last year, a book called “Kafka Streams in Action” was published overseas, which was based on Kafka Streams version 1.0. Recently, when I tried to run the examples in the book using version 2.0, many of them could no longer compile, showing the magnitude of the changes between these two versions. However, if you still care about the messaging engine aspect, both of these major versions are suitable for production environments.

Lastly, I have a suggestion: regardless of which version you are using, try to keep the server-side version consistent with the client-side version. Otherwise, you will lose many of the performance optimizations that Kafka provides.
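The minimum versions recommended in this section can be collected into a small lookup. The table below only restates the advice given above for the 0.8, 0.10, and 0.11 lines; the function `upgrade_advice` is a hypothetical helper, not an official tool.

```python
# Minimum releases recommended in this article, keyed by major version.
RECOMMENDED_MINIMUM = {
    "0.8": "0.8.2.2",
    "0.10": "0.10.2.2",
    "0.11": "0.11.0.3",
}


def as_tuple(version):
    """Turn "0.10.2.2" into (0, 10, 2, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def upgrade_advice(version):
    """Return the recommended minimum release, or None if no advice applies."""
    parts = version.split(".")
    major = ".".join(parts[:2]) if parts[0] == "0" else parts[0]
    target = RECOMMENDED_MINIMUM.get(major)
    if target is not None and as_tuple(version) < as_tuple(target):
        return target
    return None


for current in ["0.8.1.1", "0.10.0.0", "0.11.0.3", "2.1.1"]:
    print(current, "->", upgrade_advice(current))
```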

Summary

I hope you now have a clear understanding of how to choose the right version of Kafka. Each Kafka version has its own suitable use cases, advantages, and disadvantages. Remember not to blindly pursue the latest version. In fact, many engineers around me follow this principle: don’t be a “guinea pig” for the latest version. Once you understand the differences between the versions, I believe you will be able to make the right choice for your actual situation.

Open Discussion

How do you evaluate the matter of upgrading Kafka versions? What unique insights do you and your team have?

Feel free to write down your thoughts or questions, and let’s discuss together. If you find it beneficial, you are also welcome to share the article with your friends.