20 Continuous Improvement: The PDCA Cycle and the Significance of Continuous Improvement #

Hello, I’m Shi Xuefeng.

Today is the last lesson of the “Engineering Practices” series. If you asked me which capability, across all the engineering practices, matters most for a team implementing DevOps, I would answer without hesitation: continuous improvement.

Many students have asked me in the comments, “Teacher Xuefeng, our company has already set up GitLab and integrated it with Jenkins to automate compilation, packaging, and deployment. What should we do next? I feel quite lost.”

This raises another question: “How far does a team have to go before it can be considered to have achieved DevOps?”

Whenever I encounter such a question, I think back to my experience of visiting the headquarters of a well-known company in Hangzhou, China, a few years ago.

At the time, our host was the main driving force behind DevOps at that company; he had witnessed this giant’s entire DevOps transformation. During our discussion, I asked him a question, and his answer left a deep impression on me.

I asked him, “When do you think your company achieved its DevOps transformation?” After thinking for a moment, he said, “Our company no longer has dedicated testers or dedicated operations staff, and our infrastructure was containerized long ago. These things happened naturally as the business reached a certain stage. It was only after DevOps became popular that we realized what we had been doing all along was DevOps. So it’s hard to point to the exact moment the transformation was completed. For us, what matters most is that the team has one capability: continually finding new breakthroughs and striving to do better.”

I believe this statement describes the state a team should aim for in a DevOps transformation.

In fact, if you have the opportunity to communicate with engineers from companies like Google and Netflix, you will find that these companies, which are known for their excellent DevOps practices, do not emphasize the concept of DevOps internally. That’s because they have long been accustomed to these DevOps practices. Many well-known tool platforms were developed by their employees voluntarily to solve specific problems.

For example, the popular code review tool Gerrit was originally developed at Google to fill the need for a Git-based code review tool with access control. If you are interested, you can look up this history.

You see, there is a big difference between finding a nail and then making a hammer for it, and holding a hammer while searching the world in vain for a nail. Yet we often take the latter approach: we hold a pile of hammers and cannot find any nails.

So, if I must answer how far DevOps has to go for a transformation to count as successful, my answer is that the core is the team’s ability to improve continuously, not simply the introduction of a few tools and a few metrics.

At this point, you might say that continuous improvement seems to be everywhere: it appears as the final step of many engineering practices. So, what is the significance of continuous improvement? Why is it the ultimate goal of all these activities?

That’s because every company faces different problems. Going from 0 to 1 is relatively simple: by following established engineering practices, we can quickly introduce tools, establish processes, and fill capability gaps. Going from 1 to N, however, requires the team to identify improvement goals based on business needs. Take the original question: once automated building and deployment are in place with GitLab and Jenkins, what further improvements are feasible? For example, is testing integrated into the pipeline? Is there a quality gate? Have database changes been automated? Are builds and releases fast enough, and are build resources a bottleneck?

There are many directions that can be considered, but the most important and value-maximizing one at this stage ultimately depends on the business requirements, so it is difficult to generalize.

When it comes to continuous improvement, there is a very well-known methodology called PDCA, also known as the Deming Cycle. As the name suggests, this methodology is also associated with the quality management master Dr. Deming, who popularized it (it grew out of Walter Shewhart’s earlier work). PDCA is an acronym for four English words: Plan, Do, Check, and Act.

PDCA provides a structured framework into which any improvement work can be divided. By iterating the cycle continuously, an organization enters a virtuous circle and keeps finding new problems to improve. For each problem, first analyze the root cause and formulate a concrete plan (Plan), then carry the plan out (Do). Next, regularly compare the results against the expected goals (Check). Finally, review the outcome: standardize what worked, and feed what did not into the next cycle for further improvement (Act).

PDCA Cycle
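To make the four phases more concrete, here is a minimal Python sketch that tracks improvement items through one cycle. Everything in it is a hypothetical illustration rather than a prescribed tool: the `Improvement` fields, the `pdca_cycle` function, the assumption that every metric is "lower is better" (such as build minutes or days between releases), and the sample numbers. Recording the root cause, action, and target is the Plan step; the team carries out the action itself (Do); the `measure` callback only reads the resulting metric (Check); and splitting items into kept and carried-over lists is the Act step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Improvement:
    """One improvement item tracked through a PDCA cycle (hypothetical fields)."""
    problem: str
    root_cause: str                   # Plan: result of the root cause analysis
    action: str                       # Plan: the concrete change to make
    metric: str                       # name of the metric used in Check
    baseline: float                   # metric value before the change
    target: float                     # Plan: goal value; lower is better here
    measured: Optional[float] = None  # filled in during Check


def pdca_cycle(
    items: List[Improvement], measure: Callable[[Improvement], float]
) -> Tuple[List[Improvement], List[Improvement]]:
    """Check each item after the team has carried it out (Do), then Act.

    Returns (kept, carried_over): items that met their target become the new
    standard; the rest go into the next cycle's Plan.
    """
    kept: List[Improvement] = []
    carried_over: List[Improvement] = []
    for item in items:
        item.measured = measure(item)   # Check: read the metric after Do
        if item.measured <= item.target:
            kept.append(item)           # Act: standardize what worked
        else:
            carried_over.append(item)   # Act: re-plan in the next cycle
    return kept, carried_over


if __name__ == "__main__":
    # Illustrative numbers only.
    backlog = [
        Improvement("slow builds", "no dependency cache", "enable a build cache",
                    "build_minutes", baseline=30, target=15),
        Improvement("long release cycle", "manual regression testing",
                    "automate regression tests", "release_interval_days",
                    baseline=30, target=14),
    ]
    observed = {"build_minutes": 12, "release_interval_days": 21}
    kept, carried = pdca_cycle(backlog, lambda i: observed[i.metric])
    print("kept:", [i.problem for i in kept])
    print("next cycle:", [i.problem for i in carried])
```

Running it prints `kept: ['slow builds']` and `next cycle: ['long release cycle']`. Carried-over items keep their root cause and measured value, so the next Plan starts from the analysis already done rather than from scratch.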

The method is not complicated, and anyone can understand it. The key is whether it is genuinely put into practice.

Let me share a real example with you.

About two years ago, I took part in the DevOps transformation of a medium-sized enterprise. I won’t describe in detail where they stood when they first encountered DevOps; suffice it to say they were starting from almost nothing. They used SVN for source control and built and packaged on local machines. A release took two months, and several versions were often developed in parallel, requiring dedicated staff just to keep the code in sync.

After more than half a year of transformation, the overall toolchain system within the team began to take shape, and the release rhythm was shortened to once a month. The team was very satisfied with the achievements.

But that is not the point. The point is that when I met the person in charge of this project again last month, she told me they now release every two weeks and even ship ad hoc releases from time to time. I was very curious about how they managed it.

It turned out that when I first presented the improvement plan, I had mentioned containerization to the project team, but because the conditions were not yet in place, we did not pursue it. To my surprise, in less than a year they had achieved containerized deployment, and their self-built PaaS platform had matured to a level that compared well with those of many large companies.

She said, “This DevOps transformation gave us more than common engineering practices and tool platforms. It gave us a pair of eyes that always spots imperfections, a drive for excellence, and a way of thinking through these issues. Together, they keep pushing us to find new ways to solve new problems.”

Indeed, engineering practices and tool platforms are only a small first step for a company; many more problems and challenges lie ahead. At that point, the ultimate secret weapon we can rely on is the mindset of continuous improvement, and the key to building that mindset is building a learning organization.

So, where should we start learning? And what are some recommended practices in the learning and improvement process? I have summarized four practices that you can refer to.

Encouraging Positive Retrospection and Summarization #

Learning from failure is a principle we learn from a young age. A team’s attitude toward handling failures largely reflects its attitude toward continuous improvement. No one wants system failures, but in the real world they are unavoidable.

In many companies, after a failure occurs, there are several common approaches:

Gather relevant parties together, determine the severity of the issue, and assign responsibility;
Send a perfunctory email listing improvements, with no clear deadlines, or with deadlines that no one follows up on;
Write the problem off as a one-off accident, so that in the end nothing is done.

Compared with these approaches, a better method is to establish a mechanism for positive retrospection and summarization. After a problem occurs, a detailed failure analysis report is prepared before the review, the relevant parties thoroughly analyze the root cause, and each improvement task is given a specific deadline.

The purpose of a failure retrospective is not primarily to assign blame. More importantly, it is to identify latent issues and weak points in the system and process, and to guard against them through follow-up mechanisms such as adding test cases or introducing product reviews.

In fact, both production incidents and everyday errors are worth reviewing and summarizing.

For example, we encounter various compilation errors every day. It is obviously inefficient if everyone has to solve the same problem individually.

This calls for a team responsible for collecting these common errors and distilling the key error messages and their usual solutions into a knowledge base. At the same time, an automated service should be integrated into the build system, so that the next time someone hits a compilation error, the error is automatically matched against the knowledge base and that person receives a problem analysis report with suggested solutions, helping them resolve the issue quickly.
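The automated matching service described above could start as little more than a list of regular expressions mapped to suggested fixes. The following Python sketch assumes a hypothetical `KnownError` schema and `analyze_build_log` function; the three sample entries reuse common Maven and javac error messages purely as illustrations, and a real knowledge base would be curated from the team’s own build failures.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class KnownError:
    """A knowledge-base entry: a log pattern plus the suggested fix (hypothetical schema)."""
    name: str
    pattern: re.Pattern
    suggestion: str


# Illustrative entries only; a real knowledge base grows from past failures.
KNOWLEDGE_BASE: List[KnownError] = [
    KnownError("missing-dependency",
               re.compile(r"Could not resolve dependencies for project \S+"),
               "Check the artifact repository mirror and the declared dependency version."),
    KnownError("out-of-memory",
               re.compile(r"java\.lang\.OutOfMemoryError"),
               "Increase the build agent's heap size or split the module."),
    KnownError("missing-symbol",
               re.compile(r"cannot find symbol"),
               "Rebuild upstream modules; the symbol may come from an unmerged change."),
]


def analyze_build_log(log: str) -> List[Dict[str, str]]:
    """Match a build log against the knowledge base and build a problem report.

    Returns one entry per known error found (in knowledge-base order), each with
    the first log line that matched and the suggested fix. An empty list means
    the failure is unknown and should be triaged and added to the knowledge base.
    """
    lines = log.splitlines()
    report: List[Dict[str, str]] = []
    for entry in KNOWLEDGE_BASE:
        for line in lines:
            if entry.pattern.search(line):
                report.append({"error": entry.name,
                               "evidence": line.strip(),
                               "suggestion": entry.suggestion})
                break  # one report entry per known error is enough
    return report
```

For example, `analyze_build_log("[ERROR] App.java:[10,5] cannot find symbol")` returns a single `missing-symbol` entry, while a log that matches nothing returns an empty list. That empty result is the signal to analyze the new error and extend the knowledge base, which is the same feedback loop the text describes.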

In this way, as the team’s collective wisdom accumulates, more and more problems can be recognized automatically, enabling knowledge sharing and developer assistance across the organization. This is a key area of focus for many large companies. If you think about it, this process is itself a PDCA cycle.

However, note that continuous improvement should not be a single sweeping change but a series of small, frequent improvements. Large changes affect many areas and are prone to failure, while small improvements are gentler and more likely to succeed. To help you understand, I will share a schematic diagram with you.

Allocating Dedicated Time for Improvement #

Teams are often so busy that time seems to be the greatest enemy of DevOps adoption, and they end up too busy to make any improvements.

If the team chooses to prioritize developing more features within the same amount of time, it indicates that, at least at this stage, the importance of business development outweighs the importance of DevOps implementation.

However, the problem is that business requirements never end. When I ask frontline employees, “Is there anything DevOps could help you with?”, they either say, “Nothing in particular, it’s fine as it is,” or raise only trivial points. This really just means that they either haven’t thought about the question or don’t know that better alternatives exist. But if we can’t engage frontline employees, continuous improvement will never get started.

Therefore, the right approach is to set aside a portion of every iteration for improvement, and to devote more time to improvement tasks during quieter periods for the business, for example right after a major promotion, when the team is regrouping.

This time is mainly for non-functional requirements and technical improvements, such as paying down technical debt, adding unit test cases, and carrying out improvement actions already identified. Reserving dedicated time in this way helps the team cultivate a culture of continuous improvement.

I highly recommend adding a new category of tasks to the team’s backlog specifically for recording and tracking these continuous improvement activities. During iteration planning meetings, analyze these issues and estimate the workload to ensure that the team has dedicated time to address them.
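One way to keep the dedicated improvement time from being squeezed out during iteration planning is to give improvement items their own backlog category and reserve part of the capacity for them. The sketch below is hypothetical: the `BacklogItem` fields, the `"improvement"` category name, and the 20% default reservation are assumptions for illustration, not figures from this article. Items are taken greedily in priority order, and each category draws only from its own budget.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BacklogItem:
    title: str
    kind: str       # "feature" or "improvement" (hypothetical category names)
    estimate: int   # story points
    priority: int   # lower number = higher priority


def plan_iteration(backlog: List[BacklogItem], capacity: int,
                   reserved_percent: int = 20) -> List[str]:
    """Pick items for one iteration, reserving part of the capacity for improvements.

    Improvement items may only use the reserved budget and features only the
    remainder, so feature work cannot crowd out improvement time. Items that do
    not fit their budget are skipped and stay in the backlog.
    """
    budgets = {"improvement": capacity * reserved_percent // 100}
    budgets["feature"] = capacity - budgets["improvement"]
    used = {"improvement": 0, "feature": 0}
    selected: List[str] = []
    for item in sorted(backlog, key=lambda i: i.priority):
        if used[item.kind] + item.estimate <= budgets[item.kind]:
            used[item.kind] += item.estimate
            selected.append(item.title)
    return selected
```

With a capacity of 40 points, 8 are reserved for improvement items and 32 go to features. In this sketch, unused reserved points are deliberately not handed back to features; letting them roll over is a possible alternative that trades the guarantee for higher utilization.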

In addition, many companies have started holding Hackathon Days, events where participants turn their own ideas into working programs within a limited time. Hackathons are full of a spirit of active exploration, free thinking, and pushing limits; they encourage teamwork and mutual inspiration and span the whole journey from idea to working code.

Our team is currently preparing for this year’s Hackathon Day, hoping to find partners and collaborators through the event. Besides tackling some “tough” problems in internal efficiency, we also aim to boost team morale and showcase the value of DevOps on a larger stage, achieving two goals at once.

Sharing Metrics and Cultivating a Sense of Responsibility #

Team members often behave like temporary workers, with no idea how the requirements and business they are responsible for are actually performing. If team members feel no ownership of their work, how can we stimulate their sense of responsibility and self-motivation?

Therefore, it is essential to make business metrics and performance as transparent as possible within the team, allowing team members to access real-world user feedback and evaluations, as well as measurements of business progress.

After a new feature is developed and launched, it should be possible to see how the requirement is performing in production. If the requirement analysis already defined associated metrics, data for those metrics can be displayed as well. This way, the development team knows how many issues their deliverables have and what feedback users are actually giving, which pushes the team to think more from the users’ perspective.

Besides business metrics, DevOps metrics should also be openly shared within the company, so that everyone can see how their own team is performing and how it compares with the company as a whole.

A moderate amount of peer pressure motivates everyone to take on improvement work proactively and to demonstrate its effect with measured data, forming a virtuous cycle.

Inspiring Creativity and Maximizing Value #

Every team has members with innovative ideas who always find room for optimization even within established norms.

For example, one tester on our team found that tracking daily testing progress was time-consuming and labor-intensive and produced no statistics. So she took the initiative to build a small tool in her spare time to handle this work, which significantly improved efficiency.

If more people learn about such innovations and they are adopted more widely, they can improve the efficiency of many more people and benefit the whole team, while also reducing duplicated effort and encouraging employees with ideas to take part in tool optimization.

A good approach is to include requirements for team contributions and technical innovations in the performance goals of team members and encourage innovative work within the team. In addition, establish corresponding selection and incentive mechanisms within the team to invest resources in good ideas and turn them into tools that can solve similar problems.

Many companies have also begun to recognize the importance of reusing knowledge internally. Whether it is opening up code repositories within the company, building shared foundational components, or establishing a company-level platform governance system, all of these help you quickly reuse existing capabilities instead of continually reinventing the wheel.

Conclusion #

Just like the goal of every engineering practice is continuous improvement, the “Engineering Practices” series in our column also concludes with the practice of continuous improvement.

I have always believed that whether a team has established a culture of continuous improvement is an important reference for evaluating the effectiveness of the team’s DevOps practices. In this lecture, I introduced you to the PDCA continuous improvement methodology, which involves planning, implementing, checking, and taking action in a continuous iterative process to constantly push the team towards a better state and promote positive development.

In addition, I introduced four methods of implementing continuous improvement: summarizing and learning from failures, reserving dedicated time for improvement, sharing metrics within the team to cultivate a sense of responsibility, and inspiring the team’s creativity to maximize value. The core of these methods is to create a learning organization and culture that provides fertile soil in which DevOps can take root and grow.

Starting from the next lecture, we will enter the “Tooling Practices” series, where I will introduce you to the design principles and implementation paths of some core tools, as well as the usage of some common open source tools. Stay tuned.

Reflection Question #

Besides the four methods of continuous improvement that I mentioned, are there any activities in your company that can promote the development of a culture of continuous improvement?

Feel free to write your thoughts and answers in the comments section. Let’s discuss and learn together. If you find this article helpful, you are also welcome to share it with your friends.