00 Preface Why Study Kafka

Hello, I am Hu Xi, Apache Kafka Committer, head of the User Growth Team at Tiger Brokers, and the author of the book “Apache Kafka in Action”.

Over the past 5 years, I have followed Kafka’s evolution from version 0.8 to the current 2.3. I ran into many pitfalls and learned many lessons the hard way, and along the way I built up a fairly systematic and comprehensive body of practice for applying Kafka. I am now presenting it to you as the “Kafka Core Technologies and Practices” column. I hope that sharing my understanding of Apache Kafka and my hands-on experience will help you understand Kafka thoroughly and apply it well.

You may wonder, why should I learn Kafka? To answer this question, let’s take a broader perspective and discuss my understanding of the development of Internet technology in recent years.

The Internet has grown rapidly in recent years, and many dazzling new technologies have emerged along the way. In my opinion, as of 2019, the hottest technologies in the Internet industry are the so-called ABC: AI (Artificial Intelligence), Big Data, and Cloud computing. I personally have doubts about the prospects of blockchain technology, since I have not yet seen a particularly good practical application. Perhaps it will be more impressive in a few years.

Among these ABCs, frankly speaking, A and C are a bit niche and not accessible to all players. On the other hand, B seems more accessible to the general public, and almost all companies can participate. I once visited a hair salon where they claimed to use a big data system to help customers design their hairstyles. This shows that Big Data is more “down-to-earth”.

As an engineer or architect, you have probably been involved in building many big data business systems in your work. Since these systems exist to serve the company’s business, they usually execute only routine business logic. They can therefore be regarded as data-intensive rather than compute-intensive applications.

For data-intensive applications, the skill of a big data engineer or architect shows most clearly in how they cope with surging data volume, growing data complexity, and ever faster rates of data change. Fortunately, Kafka is very effective at addressing these problems. Take the surge in data volume as an example: Kafka decouples upstream and downstream business processes by buffering sudden bursts of upstream traffic and passing them on to downstream subsystems at a smooth rate, which shields those subsystems from irregular traffic spikes. Therefore, if you are a big data practitioner, mastering Kafka is a necessary skill.
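The buffering idea can be illustrated with a small simulation. This is only a sketch: an in-memory queue stands in for a Kafka topic, and the class name `PeakShavingSketch`, the tick-based model, and the sample numbers are all assumptions made for illustration, not Kafka APIs.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Illustrative only: an in-memory queue plays the role of a Kafka topic.
class PeakShavingSketch {

    /**
     * Pushes a bursty upstream load through a buffer and returns how many
     * messages the downstream handles in each tick.
     *
     * @param arrivalsPerTick   messages produced upstream in each tick
     * @param downstreamPerTick maximum messages the downstream handles per tick
     */
    static int[] drain(int[] arrivalsPerTick, int downstreamPerTick) {
        Deque<Integer> buffer = new ArrayDeque<>();
        int[] handled = new int[arrivalsPerTick.length];
        int nextId = 0;
        for (int tick = 0; tick < arrivalsPerTick.length; tick++) {
            // The upstream writes at whatever rate it likes.
            for (int i = 0; i < arrivalsPerTick[tick]; i++) {
                buffer.addLast(nextId++);
            }
            // The downstream reads at its own capped pace.
            int n = Math.min(downstreamPerTick, buffer.size());
            for (int i = 0; i < n; i++) {
                buffer.removeFirst();
            }
            handled[tick] = n;
        }
        return handled;
    }

    public static void main(String[] args) {
        int[] arrivals = {2, 10, 0, 0, 1, 0}; // a burst of 10 in the second tick
        System.out.println(Arrays.toString(drain(arrivals, 3)));
        // prints [2, 3, 3, 3, 2, 0]
    }
}
```

The burst of 10 messages never reaches the downstream all at once; it is spread over several ticks at no more than 3 messages per tick, while the buffer absorbs the difference.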

The example above is only one scenario in which Kafka helps the business. In fact, Kafka has a very broad range of applications. It is no exaggeration to say that Apache Kafka is regarded as the leader of the entire messaging engine field, and that influence alone makes it worth studying. Kafka is also remarkable from a learning perspective: by learning a single framework, we can build messaging engine applications, integrate applications, construct distributed storage, and even develop and deploy stream processing applications in real business systems. That sounds very valuable, doesn’t it?

Let me also show you some data. According to Dice’s 2019 Technology Salary Report for the United States, Kafka ranks second among the top 10 highest-paying technical skills, with an average annual salary of $128,000! First place goes to the Go programming language, at $132,000. I hope that after seeing this you won’t immediately close my column and head straight for the nearest Go column. Although this data comes from the U.S. talent market, we have reason to believe that demand for Kafka is rising in China as well. The 2019 Two Sessions once again called for deeper research, development, and application of big data and artificial intelligence, and Kafka, whether as a messaging engine or as a real-time stream processing platform, can play an important role in big data engineering.

In conclusion, Kafka is a powerful tool and worth a try! Now that you know why you should learn Kafka, let’s take action and thoroughly learn it. But what is the path to mastering Kafka?

If you are a software development engineer, the first step in mastering Kafka is to pick a Kafka client for the programming language you know best. Currently, the two most important Kafka clients are the Java client and the librdkafka client (C/C++). Both are updated and maintained very actively, which makes them well worth a sustained investment of your time.

Once you have chosen a client, go to its official website and study some code examples. If you can compile and run these examples correctly, you will have a working grasp of the client. Next, try modifying the sample code to explore other APIs, and observe the results of your changes. If none of this poses a challenge, write a small project to validate what you have learned, and then focus on improving the reliability and performance of your client code. At this stage, read the official Kafka documentation thoroughly to make sure you understand the parameters that can affect reliability and performance.
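To give a feel for the kind of parameters meant here, the sketch below collects a few producer settings in a plain `java.util.Properties` object. The keys are producer configuration names from the Kafka documentation; the values, the broker address, and the class name `ProducerConfigSketch` are assumptions chosen for illustration and should be tuned against your own workload and Kafka version.

```java
import java.util.Properties;

// Illustrative only: values are placeholders, not recommendations.
class ProducerConfigSketch {

    static Properties reliabilityFirst() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Reliability-oriented settings.
        props.setProperty("acks", "all");                // wait for all in-sync replicas
        props.setProperty("enable.idempotence", "true"); // avoid duplicates on retry
        props.setProperty("retries", "10");              // retry transient send failures

        // Throughput-oriented settings.
        props.setProperty("linger.ms", "10");            // wait briefly to fill batches
        props.setProperty("batch.size", "32768");        // batch size in bytes
        props.setProperty("compression.type", "lz4");    // compress each batch
        return props;
    }

    public static void main(String[] args) {
        reliabilityFirst().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With the Java client on the classpath, a `Properties` object like this would typically be passed to the `KafkaProducer` constructor. The sketch deliberately stops short of that so it runs with the JDK alone.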

Finally, you can explore Kafka’s advanced features, such as developing stream processing applications. The stream processing API lets you not only produce and consume messages but also perform advanced stream processing operations, such as time-windowed aggregations and stream joins.
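As a rough picture of what a time window aggregation computes, the following sketch counts events in fixed-size, non-overlapping (tumbling) windows. It is plain Java over an array of timestamps, not the Kafka Streams API; the class name `TumblingWindowSketch` and the sample timestamps are hypothetical, and timestamps are assumed to be non-negative milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: shows the concept, not the Kafka Streams API.
class TumblingWindowSketch {

    /**
     * Counts events per fixed-size, non-overlapping window.
     * The map key is the window start time in milliseconds.
     */
    static Map<Long, Integer> countPerWindow(long[] eventTimesMs, long windowSizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long t : eventTimesMs) {
            // Align the timestamp down to the start of its window.
            long windowStart = t - (t % windowSizeMs);
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] times = {1000, 4000, 5000, 9999, 10000};
        System.out.println(countPerWindow(times, 5000));
        // prints {0=2, 5000=2, 10000=1}
    }
}
```

A real stream processing library also has to deal with unbounded input, late or out-of-order events, and fault-tolerant state, which is why it is worth using one rather than hand-rolling this logic.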

If you are a system administrator or operations engineer, your main learning goal is to evaluate, build, and manage a Kafka production environment based on actual business needs. Monitoring that environment is also of the utmost importance. Kafka exposes a wealth of JMX metrics, and you can use any monitoring framework you are familiar with to collect them. With monitoring data in hand, you need to observe how the Kafka cluster performs under real business load, then use the available metrics to identify system bottlenecks and improve the throughput of the whole system. This is where the value of your work will be most apparent.
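For readers who have not used JMX before, here is a minimal sketch of reading one attribute from an MBean with the JDK’s own `javax.management` API. To stay runnable without a broker, it reads a built-in JVM MBean (`java.lang:type=Runtime`) from the local platform MBean server; the helper name `readAttribute` and the class name `JmxReadSketch` are hypothetical. Reading Kafka’s metrics works on the same principle, with an object name taken from the Kafka monitoring documentation (for example `kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec`) and, for a broker in another process, a remote JMX connection instead of the local server.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Illustrative only: reads a JVM built-in MBean, not a Kafka metric.
class JmxReadSketch {

    /** Reads a single attribute from an MBean registered in this JVM. */
    static Object readAttribute(String objectName, String attribute) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            return server.getAttribute(new ObjectName(objectName), attribute);
        } catch (JMException e) {
            // Covers malformed names, unknown MBeans, and unknown attributes.
            throw new IllegalStateException(
                    "Cannot read " + attribute + " of " + objectName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("JVM: " + readAttribute("java.lang:type=Runtime", "VmName"));
        System.out.println("Uptime (ms): " + readAttribute("java.lang:type=Runtime", "Uptime"));
    }
}
```

In practice most teams let a monitoring framework scrape these MBeans on a schedule rather than reading them by hand, but knowing the underlying mechanism helps when a metric appears to be missing.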

Once you have clarified what you need to learn and how to learn it, you may find yourself exclaiming: Wow, I have to learn so many things! Don’t worry, this column covers every topic I have mentioned.

Below is a mind map I specially created for this column, which can help you quickly understand the knowledge structure of this column. The column is roughly divided into six sections, including Kafka basics, Kafka basic usage, client in-depth, Kafka principle introduction, Kafka operations and monitoring, and advanced Kafka application.

Mind Map

In the first part of this column, I will introduce the principles and applications of message broker systems in general, and discuss how Kafka, as an excellent representative of such systems, performs in this regard.
The second part focuses on how to use Kafka in a production environment, especially how to plan a deployment for live systems.
In the third part, I will walk you through all aspects of Kafka clients, including practical guidance on producers and an in-depth analysis of consumers. Don’t miss out!
The fourth part introduces the core design principles of Kafka, including the design of the controller and an end-to-end analysis of request handling.
The fifth part covers Kafka operations and monitoring. Want to operate a Kafka cluster efficiently and monitor it effectively? I will share my practical experience to help you do so!
The final part briefly introduces the practical application of Kafka Streams, Kafka’s stream processing component. I hope it will give you a different perspective on Kafka.

Readers who are familiar with me may know that I have published the book “Apache Kafka in Action”, and may ask: since there is already a book, how does this column differ from it? “Apache Kafka in Action” was written against Kafka 1.0, but Kafka has since evolved to version 2.3. I must admit that some of the book’s content is outdated or even inaccurate, whereas this column is based on the latest version of Kafka. Moreover, since this column is a completely new work, I hope to use a more relaxed, easy-to-understand style and format to help you pick up the latest practical experience with Kafka.

I hope that through studying this column, you will not only be able to proficiently apply Kafka to your actual work, but also develop a strong interest in learning Kafka or other technology frameworks.

Finally, I would like to conclude with a motto: Stay focused and work hard!