User Story | Even After Switching Careers Midway, You Can Still Master Performance Optimization

Performance optimization has never been an easy task, and learning it is naturally not easy either. Since the column launched, we have covered more than fifty cases and knowledge points. Along the way there has been sweat and joy, difficulty and reward. How did you get through it?

In this issue, we invited several students who were especially active while following the column, whether by leaving comments or by checking in, to tell their learning stories and share their experience.

Here I Come #

I am a programmer who did not major in a computer-related field. I joined the profession because I was very interested in programming. Since graduating in 2009, I have worked at a casual game company in Wuhan, specializing in casual game server development, and have never changed jobs.

As a backend developer, I inevitably run into performance issues with production programs in my daily work. Like many others, my earlier analysis methods were quite “primitive”.

Sometimes I relied entirely on guessing, for example starting from the most recent changes and speculating about which program might be causing the issue. Once I had identified the program, I occasionally bisected the code to find out which part was the “culprit”.

Sometimes I also checked Linux performance indicators, but back then I only knew how to use top and vmstat to look at simple indicators such as CPU load, memory, disk, and soft interrupts. I would only “Google” for analysis methods when a specific indicator looked abnormal.

Obviously, solving “new” problems this way is neither timely nor accurate, and the cost of learning by “troubleshooting” is rather high. However, I could not find a cost-effective and systematic way to learn, so I kept putting it off.

I consider myself fortunate to have come across Geek Time. After studying columns like “Starting from Zero: Learning Architecture,” “Core 36 Lectures on Go Language,” and “The Beauty of Data Structures and Algorithms,” I came to trust the quality of its columns.

So when the column “Practical Linux Performance Optimization” came out, I saw the subtitle “Helps You Find Bottlenecks in 10 Minutes” and immediately bought it without hesitation. Of course, the results did not disappoint me either. I can say that I gained a lot.

I still remember that while I was studying the “CPU Performance” section of this column, the company’s servers were moved from a certain Aliyun platform to a certain Xunyun. Unexpectedly, the knowledge I had just learned immediately came in handy.

Previously, the server’s 1-minute load average had always been below 1, but after switching clouds it intermittently jumped to 14; even the 5-minute load average rose to 8. Although the business was not significantly affected for the time being, I felt responsible for the system and, since I was studying this column anyway, wanted to use what I had just learned to analyze the cause.

I first collected indicators such as soft and hard interrupts, disk I/O, CPU load, and running processes and threads during periods of high load, and confirmed that all of them were stable, with no significant change compared to periods of low load. Because I had checked every indicator that might cause the load average to rise, and had also looked at the queue of running programs, I was able to rule out our own program as the cause first.
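As an illustration of the kind of check described above, here is a minimal Python sketch; it is not the tooling actually used in this story. It assumes the standard Linux `/proc/loadavg` layout. The names `parse_loadavg`, `read_loadavg`, and `is_overloaded`, and the per-CPU threshold of 1.0, are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one: float      # 1-minute load average
    five: float     # 5-minute load average
    fifteen: float  # 15-minute load average
    running: int    # currently runnable scheduling entities
    total: int      # total scheduling entities on the system


def parse_loadavg(text: str) -> LoadAvg:
    """Parse one line in /proc/loadavg format, e.g. '0.42 0.30 0.25 1/512 12345'."""
    fields = text.split()
    running, total = fields[3].split("/")
    return LoadAvg(float(fields[0]), float(fields[1]), float(fields[2]),
                   int(running), int(total))


def read_loadavg(path: str = "/proc/loadavg") -> LoadAvg:
    """Read the current load averages (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())


def is_overloaded(load: LoadAvg, cpus: int, per_cpu_threshold: float = 1.0) -> bool:
    """Hypothetical rule of thumb: 1-minute load per CPU above the threshold."""
    return load.one / cpus > per_cpu_threshold
```

On a Linux host, `is_overloaded(read_loadavg(), os.cpu_count() or 1)` (after `import os`) gives a quick first signal. A high load average only says that something is queuing; as in the story above, the interrupt, I/O, and process indicators still have to be checked to find out what.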

Later, based on the information I collected while observing, I found that the load increase was periodic: it was triggered at an almost fixed time and lasted for a fixed duration. After this had gone on for nearly half a month, the server’s load average returned to normal without any change to the deployed program, and no abnormal increase appeared again.

Although I did not find the specific cause in the end, the systematic investigation let me clearly rule out my own program and showed me which direction to look in. For me, this was significant progress.
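A fixed-interval pattern like this can be confirmed from periodically recorded load samples. The following is a minimal sketch, assuming the samples are `(timestamp_in_seconds, load)` pairs gathered by some monitoring job. `spike_starts`, `gaps_between`, and the threshold value are hypothetical, and this is not how the original investigation was carried out.

```python
from typing import List, Sequence, Tuple


def spike_starts(samples: Sequence[Tuple[float, float]],
                 threshold: float) -> List[float]:
    """Return the timestamps where the load first exceeds the threshold
    after being at or below it (or at the very start of the series)."""
    starts: List[float] = []
    above = False
    for timestamp, load in samples:
        if load > threshold and not above:
            starts.append(timestamp)
        above = load > threshold
    return starts


def gaps_between(starts: Sequence[float]) -> List[float]:
    """Return the time between consecutive spike starts."""
    return [later - earlier for earlier, later in zip(starts, starts[1:])]
```

If the gaps come out nearly identical, the spikes are probably driven by something scheduled, such as a timed job on the host or the platform, rather than by request traffic.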

In fact, I have subscribed to many columns on Geek Time, some of which I don’t use much in everyday work, so I study them superficially. However, this column is closely related to my work and is explained very clearly and concisely, so I have been studying it in detail, going through each case with the teacher.

I try out the performance tools the teacher introduces in our production environment and check for abnormal indicators, so that problems can be caught before they occur. In this way, my approach to performance problems has changed from “passive response” to “taking the initiative”. Even when something unexpected happens, I can quickly narrow down where the problem is.

In addition, the teacher is very active in answering students’ questions in the comments and study groups, and I have learned a lot from those answers. Maybe my exposure is limited, but I think mastering the knowledge in the teacher’s column is enough for my needs. I will keep reviewing it and turn this knowledge into my own ability, and I hope everyone gains as much as I have.

hurt #

I am currently in Beijing and have been working as a Python developer for over three years. My first job was in web development, and my second job was in IoT backend development.

At this stage, I am involved in many aspects of development and do not yet face demanding performance requirements. However, I want to become a better programmer, so I hope to improve in this area rather than only knowing how to use things. Also, since I do not have a computer science background and have only read a few books on Unix and Linux, I want to learn in a more systematic and comprehensive way.

I usually study the columns during my subway commute. I am interested in containers, so whenever there is related content, I work through it hands-on until I understand it. After all, technology requires practical experience, although for now I am still at the level of “ad-hoc” solutions.

While studying the column, I was impressed by the teacher’s responsible and rigorous attitude. Whenever I had a question, the teacher quickly provided a solution, and one time even provided a packaged Docker image. The teacher was very serious and patient, and deserves a thumbs up.

Although I have not checked in for every article so far, I will not give up and will keep studying. As other students have said, the effort will pay off.

As a long-time Geek Time user who has purchased more than thirty columns, I can feel that Geek Time is sincerely committed to what it does, and I have grown through learning here. The excellent communication between readers and authors is especially commendable. Let’s make progress together, truly starting from geek.

Dodo Bird #

I majored in chemistry in college and joined a company that specializes in government websites in 2016 as an implementation engineer, responsible for teaching clients how to use the website backend, which requires almost zero technical knowledge. In 2017, I decided to learn Linux and enrolled in an online training course. I finally completed the course in January 2018 and switched jobs to become an operations engineer in April.

For me, server performance tuning is an essential and difficult part of the operations knowledge system. In the column, this part of the knowledge mainly consists of four modules: CPU, memory, disk, and network. Each module’s learning is divided into three steps: principles, metrics, and optimization.

Principles are the most fundamental part, especially for someone like me who has no computer science background and never learned C. This is where I spend the most time in this column. Because there are many gaps in my basic knowledge, I still rely on Baidu and Google to search for answers. Even so, there are parts of the basic principles that I do not understand and have to skip selectively. After all, dwelling too long on principles would seriously slow my progress in learning performance optimization.

The second step is metrics, which are mainly obtained through tools and serve as the basis for locating problems and optimizing. In fact, the key points of some metrics can be learned from the tool manuals, but what matters is relating them to the tools. I found the column very clear and concise on this point.

In the final step, optimization, we need to identify the problems based on performance metrics, optimize them according to the principles, and then measure the optimization effects using tools. This requires a lot of experience, and the cases in the column can be good material for practice.

The performance optimization column is now nearing completion, and I have gained a lot from most of its content. Since I started, I have used the methods in the column to find and solve several performance issues on my own servers. Before, I would not have been able to locate the problems so quickly and accurately.

Here, I would like to thank the author of the column, Mr. Ni Pengfei. Had I not come across this column, mastering this knowledge and experience would have taken me much more time. For me, time is money, and this column really saved me a lot of money.

Most learning materials currently available on the internet are unsystematic, stay at an introductory level, or are not authoritative and accurate enough. By inviting so many industry experts to host columns, Geek Time is truly a blessing for learners. I hope Geek Time will continue to improve and bring more knowledge to more learners.