00 Preface: Breaking Through Four Misconceptions to Advance in Performance Engineering

Hi, I’m Gao Lou.

For more than a decade, I have worked on performance testing, analysis, and optimization. In the early years, I shared my experience on the major testing forums and gradually built up a complete body of knowledge about performance testing. Later, I began leading my own team and completed more than 40 projects. The team has grown from four or five people to more than 300 today. Those who have worked with me know my principle for performance projects: the system must not crash after going live, and if it does, I don't charge.

In 2019, I launched my first course on Geek Time, “Performance Testing in Action: 30 Lectures” (《性能测试实战30讲》). In that course, I covered what I consider important in the testing process, such as organizing the overall concepts and the approach to performance analysis. Through it, I hoped to convey one value proposition, namely making performance valuable, in order to refresh people's understanding of performance testing and show how much this field can accomplish. This is something I have been working toward for a long time.

Four Cognitive Limitations of Performance Engineers

You might wonder why I am writing a second course.

It is because I want to use a practical project to demonstrate, at the engineering level, both the thinking behind performance projects and how they are actually carried out. From a performance engineer's perspective, even if you have mastered the content of the first course, you will still run into other challenges at various stages of a project. For example, as shown in the following picture:

To an excellent performance engineer, the importance of performance planning, requirement analysis, bottleneck analysis, and the other issues in the picture above is self-evident. Yet these issues are rarely given the attention they deserve, and solutions for them are scarce in the market.

Therefore, I hope to use a practical project to walk you through the entire process, from performance requirements to the final performance report, so that you can thoroughly understand these pain points and overcome them one by one.

Looking at the current performance market, I often feel disheartened, because many people hold four major misconceptions about performance.

1. Overemphasis on particular tools.

In consulting and training sessions, I often meet performance engineers with many years of experience who know only a few load-testing tools, such as JMeter and LoadRunner. The same is true in monitoring: some engineers believe that knowing a handful of operating system or language-level monitoring and analysis tools is enough to make them excellent.

2. Staying only on the surface.

I have seen many performance projects, training sessions, and talks in which people, confident in their background and experience, boast freely. But when you ask them how something is actually implemented and what the specific steps are, they can only dodge the question with tricks picked up in training.

3. Confined to the performance team, unable to step outside it.

To me, performance has always been engineering-level work. But many performance engineers cannot step outside their own teams. For example, when a system has performance bottlenecks, can we step outside our team and point the finger at our development and operations colleagues with well-founded evidence? (Of course, I do not recommend such impolite behavior; this is deliberate exaggeration.)

One thing I often say to my team is this: when there is a bottleneck, go out and fight it out with the development team, but make sure you come back victorious! Otherwise, don't go!

The reason is that, without evidence that proves the root cause of a bottleneck, performance engineers get kicked around like a ball, with remarks such as “this may be caused by such and such a problem, go ask someone else” or “this could be the reason, try it again.” In a situation like that, do you think performance work still has any meaning?

4. Unable to relate results to business scenarios.

In fact, what bosses want is a clear answer to a question like this: When there are 10 million people online, will this system die? As a performance engineer or the leader of a performance team, would you dare to confidently say, “If it dies, I take responsibility, and I will pack up and leave”?

If you dare to say that, you will certainly be paid differently, because what you are offering works like insurance. But in the performance market, who dares to provide that kind of business assurance?

Given this market situation, I hope to use this course to demonstrate the true value of performance analysis, change some of your existing misconceptions, and help you become an excellent performance engineer. This requires us to elevate performance from the “testing” level to the “engineering” level, because only then can the true value of a performance project be demonstrated.

How will I teach this course to you?

In order to help you better understand the content I’m going to teach, I have specially built a complete system, and all the content of the course will be based on this system.

In this project, I built the entire service with Kubernetes + Docker + nginx_ingress, Java 1.8, Spring Cloud microservices (built-in Tomcat), Grafana + Prometheus + Exporters + SkyWalking, Redis, MySQL, and RabbitMQ. Building such a service takes a considerable amount of time, starting from hardware installation and operating system setup. It took me nearly a month in total, using 62C140G of hardware resources (62 CPU cores and 140 GB of memory).

During the setup, I initially considered using OpenStack for the infrastructure layer, hoping to cover all the mainstream technologies in the current tech stack. In the end I decided against it, because running OpenStack on this amount of hardware would have been wasteful. I also ran into many miscellaneous issues along the way, such as NIC multi-queue configuration and hardware oversubscription, which caused problems in the overall architecture. Fortunately, I managed to solve them one by one.

The performance issues I encountered in this system, together with my analysis of them, will be presented in this course. I will explain in detail how a performance project is carried out from start to finish. I will also show you how to locate a bottleneck using a complete performance analysis decision tree and a chain of evidence for the bottleneck. The analysis data and performance results of this project will be presented as they are, so you can see that the analysis methods and approaches I discuss really can be put into practice.

I recommend that you practice hands-on while learning this course, so that you can have a deep understanding and experience of the analysis ideas and methods discussed in the course. If you want to build such an environment yourself, you don’t need as many hardware resources, and you can also build the technology components in batches, as not all scenarios require the entire environment. For some more complex and error-prone parts, I will also provide you with corresponding guidance documents to help you successfully complete the setup.

What should I do if the project I am working on is different from the projects in the course?

You might wonder what to do if the performance project you are working on is different from the one covered in the course. There's no need to worry, because I will describe a general performance project and focus on the underlying logic. In this course, we will concentrate on the following three key points, which will also be the main focus of your learning.

Firstly, we will focus more on analysis.

When it comes to performance, there are very few teams that don’t write scripts. In this course, instead of focusing on scripting, I will emphasize analysis. If you are a beginner, I hope you can learn the basics of scripting and other tools on your own.

So how do we understand “analysis”? Simply put, it is a complete analysis process starting from scripts and leading to specific performance bottlenecks. This is also the core of performance technology that I have always emphasized.

Secondly, we will pay more attention to the completeness of the analysis chain.

In this course, I won’t go into detail about every technical point, such as what the “us” in CPU information after running the “top” command in the Linux operating system means. You can find these things on search engines, so you don’t need to spend money on this course to learn them. What I will talk about in this course is what to do when the “us” (user CPU time) is high.

Some people might think this question is simple, since the usual routine is to check the process, then the thread, then the stack, then the code. Anyone who can explain it this way is clearly an experienced performance engineer, but not necessarily a highly experienced one. As a procedure, that is indeed how it works. When you face a specific problem, however, knowing the routine does not mean you know which actions to take.

Take printing the stack as an example: how many times should we print it for different problems and symptoms? When “cswch/s” is high, how should we print the stack? When “nvcswch/s” is high, how should we print it then? This is the kind of thinking I will explain in this course.
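As one concrete instance of the process → thread → stack step on a HotSpot JVM: `top -H -p <pid>` shows thread IDs in decimal, while a `jstack` dump labels each thread with a hexadecimal `nid=0x...`. The minimal sketch below converts the ID and pulls the matching stack out of a dump. It is only an illustration, not the course's own tooling; the function name `find_thread_stack`, the class `com.example.OrderService`, and the sample dump are all hypothetical.

```python
import re

def find_thread_stack(dump_text, tid):
    """Return the jstack block for the thread whose native ID equals tid.

    tid is the decimal thread ID as shown by `top -H`; jstack prints the
    same ID in hexadecimal as `nid=0x...`. Returns None if nothing matches.
    """
    nid = "nid=" + hex(tid)
    # Thread entries in a jstack dump are separated by blank lines.
    for block in re.split(r"\n\s*\n", dump_text):
        lines = block.strip().splitlines()
        # Match the whole token so that 0x1a2 does not match 0x1a2b.
        if lines and nid in lines[0].split():
            return block.strip()
    return None

# Hypothetical dump fragment in the Java 8 HotSpot layout.
SAMPLE_DUMP = """\
"http-nio-8080-exec-1" #25 daemon prio=5 os_prio=0 tid=0x00007f3c1c0a1000 nid=0x1a2b runnable [0x00007f3be5cf9000]
   java.lang.Thread.State: RUNNABLE
\tat com.example.OrderService.calculate(OrderService.java:42)

"http-nio-8080-exec-2" #26 daemon prio=5 os_prio=0 tid=0x00007f3c1c0a3000 nid=0x1a2c waiting on condition [0x00007f3be5bf8000]
   java.lang.Thread.State: WAITING (parking)
\tat sun.misc.Unsafe.park(Native Method)
"""

print(find_thread_stack(SAMPLE_DUMP, 6699))  # 6699 is 0x1a2b
```

On a real system the dump text would come from `jstack <pid>` and the thread ID from `top -H -p <pid>`. Knowing this mechanical step is the easy part; deciding when and how often to take the dump is the judgment the course is about.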
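For context, `cswch/s` and `nvcswch/s` are the voluntary and involuntary context-switch rates reported by `pidstat -w`. On Linux, the underlying counters are exposed per task in `/proc/<pid>/status` as `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches`. The sketch below shows how such rates can be derived from two samples. The helper names and sample numbers are invented for illustration, and the sketch says nothing about how the stack should be printed in either case.

```python
def parse_ctxt_switches(status_text):
    """Extract (voluntary, nonvoluntary) counters from /proc/<pid>/status text."""
    counters = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            counters[key] = int(value.strip())
    return (counters["voluntary_ctxt_switches"],
            counters["nonvoluntary_ctxt_switches"])

def switch_rates(before, after, interval_s):
    """Per-second rates between two samples taken interval_s seconds apart."""
    return ((after[0] - before[0]) / interval_s,
            (after[1] - before[1]) / interval_s)

# Made-up samples, assumed to be taken 5 seconds apart.
first = parse_ctxt_switches(
    "Name:\tjava\nvoluntary_ctxt_switches:\t1200\nnonvoluntary_ctxt_switches:\t300\n")
second = parse_ctxt_switches(
    "Name:\tjava\nvoluntary_ctxt_switches:\t1700\nnonvoluntary_ctxt_switches:\t2800\n")

cswch, nvcswch = switch_rates(first, second, 5.0)
print(f"cswch/s={cswch:.1f} nvcswch/s={nvcswch:.1f}")
```

On a live machine, the text passed to `parse_ctxt_switches` would be read from `/proc/<pid>/status`, or from `/proc/<pid>/task/<tid>/status` for an individual thread.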

Finally, we emphasize project-level completeness.

This course will cover the key performance actions that should be included in a project. For example:

  • Which key points in the script will affect the final results, and how exactly do they affect them?
  • Can the business model fully match the data obtained from the production environment?
  • How should performance reports reach conclusions?
  • What specific recommendations should the performance team give to operations and maintenance?
  • How should the performance project be evaluated after go-live?
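To make the business-model question above more concrete: one possible check is to compare each transaction type's share of total traffic in production with its share in the test scenario. The sketch below is only an assumption about what such a check could look like, not the method taught in the course; the transaction names, counts, and the 2% tolerance are all invented.

```python
def mix_ratios(counts):
    """Convert per-transaction counts into each transaction's share of the total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

def mix_deviations(production, scenario, tolerance=0.02):
    """Return transactions whose scenario share differs from the production
    share by more than tolerance (scenario share minus production share)."""
    prod, scen = mix_ratios(production), mix_ratios(scenario)
    return {
        name: round(scen.get(name, 0.0) - prod.get(name, 0.0), 4)
        for name in sorted(set(prod) | set(scen))
        if abs(scen.get(name, 0.0) - prod.get(name, 0.0)) > tolerance
    }

# Hypothetical request counts: production logs vs. a test scenario run.
production_counts = {"login": 5000, "browse": 30000, "order": 10000, "pay": 5000}
scenario_counts = {"login": 100, "browse": 500, "order": 300, "pay": 100}

print(mix_deviations(production_counts, scenario_counts))
```

With these made-up numbers, the scenario under-represents `browse` and over-represents `order` by ten percentage points each, which is the kind of mismatch the question is asking about.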

In summary, I hope you can see the real process of a performance project, know what to do at each stage of a performance project, and understand what it should look like from a more macroscopic and global perspective. This is the essential ability that an excellent performance engineer must possess.

Lastly, if becoming an excellent performance engineer is your aspiration and you are not satisfied with mediocrity, and if you hope to go further in the field of performance, then join me on this journey. I will share my experience of more than ten years in the industry with you without any reservation.

If possible, you can discuss your learning difficulties in performance testing in the comments section, so that I can provide you with more targeted explanations in future courses. Also, feel free to share this lecture with like-minded friends around you to support and learn from each other, making it easier to reach the finish line.
