12 Continuous Integration: Is Your CI the Same Thing as My CI? #

Hello, I am Shi Xuefeng. Today I want to talk to you about Continuous Integration (CI).

Previously, I was invited to a DevOps exchange event organized by a company. The head of their quality team shared their experience in building a DevOps platform, devoting a large portion of the time to CI. At first everything seemed fine, but the longer I listened, the more confused I became. I was so puzzled that during the Q&A session I had the urge to ask, “What does CI mean to you?” But, not wanting to be looked down on by the host, I held back.

On the way back, I kept thinking about this question. Many times, people talk about CI, but it seems that their definition of CI is different from mine. For example, sometimes CI is used to refer to the team responsible for building internal tool platforms; sometimes CI is treated as a technical practice, synonymous with software compilation and packaging; and sometimes CI is understood as a role or function, referring to the person responsible for integration and release. It is obvious that, just like DevOps, everyone has different interpretations of CI.

However, the problem is this: if we do not understand the original meaning of CI, how can we truly realize its value? And how can we build a platform in the name of CI that stays true to its purpose and solves real problems?

Therefore, today, let us reacquaint ourselves with this “familiar stranger.”

CI stands for Continuous Integration, as we all know. There are two key questions in its name: integration of what? and why continuous? To answer these two questions, we need to understand the history behind the birth of CI.

In the 1990s, software development was still dominated by the waterfall model. People found that, for long stretches of a project, the software simply could not be run as a whole. According to the project plan, the software’s functionality was split into modules and developed by different teams, and only in the integration phase after development was complete would the software actually be assembled. After months of development, however, integration time would bring a flood of merge conflicts and functional problems, leaving teams overwhelmed and firefighting, and sometimes revealing that the pieces could not be integrated at all.

When I first started working, I was involved in a project just like this. We were responsible for developing the client program, and during integration we discovered that the customer’s database was Oracle, while we, to save ourselves trouble, had been using Microsoft Access from the Office suite. I suspect many young engineers have never even heard of that database, but it meant we could not import the data the customer provided into our local database. As a result, we worked overtime through the entire New Year’s holiday, finally rushing out a data middleware layer and getting the two sides integrated.

Therefore, software integration is a high-risk and uncertain task. There is even a term abroad called “integration hell.” Because of this, people tend to avoid doing integrations, which makes the integration phase at the end of development even more difficult, creating a vicious cycle.

To solve this problem, the concept of CI emerged. CI originated from Extreme Programming (XP), introduced by Kent Beck in 1996. As the name suggests, Extreme Programming is a software development method and one of the Agile methods. Its goal is to improve software quality and respond to user requirements faster by shortening development cycles and increasing release frequency.

I don’t know why, but every time I hear about Extreme Programming, I become inspired. In any era, there is always a group of programmers who are at the forefront and represent and inherit the spirit of geeks. Just like our platform’s name, “Geek Time,” it represents the spirit of not being satisfied with mediocrity and pursuing the ultimate, which is great.

Anyway, let’s get back on track. Many of the practices proposed by Extreme Programming are still highly relevant today: pair programming, refactoring, test-driven development, coding conventions, and so on. These terms are familiar to all of us, yet teams that actually achieve them are rare. Among them there is a very interesting practice called the “40-hour workweek,” which simply means working 5 days a week, 8 hours a day. Considering the recent heated debates online about the “996” work culture, Extreme Programming clearly still has a long way to go in China. Of all these practices, continuous integration can be said to be the first to gain wide acceptance and recognition.

Regarding the definition of CI, I would like to quote the content from a blog post by Martin Fowler, which is currently one of the most recognized definitions in the industry:

CI is a software development practice in which team members frequently integrate their work together (usually at least once per person per day, resulting in multiple integrations per day), and after each integration, an automated build task with a set of automated verification tests is triggered to detect integration problems as early as possible.

CI takes a counterintuitive approach to the dilemma of software integration, and its core idea is: the more painful something is, the more often you should do it. Many people do not understand why, but an example makes it clear. When I was a child my health was poor, and I often had to drink Chinese herbal medicine. The first time, every sip made me want to vomit, but after drinking it for a week straight, I found it tasted no different from water. People adapt quickly and get used to the taste. The same goes for software development.

If integrating everything at once at the end of the development cycle carries so much risk and uncertainty, it is better to integrate more frequently and make each integration smaller. Even if an integration fails, it affects only a small amount of content, and the problem can be found and fixed quickly. This not only improves software quality, but also greatly reduces the waste of late-stage rework and speeds up software delivery.

You may say, “I understand the principle, and that is exactly how our continuous integration works.” Don’t be so sure; let’s test it together.

If you think that your project and team are practicing CI, you can consider three questions to see if you have achieved it.

  1. Does every code submission trigger a complete pipeline?

  2. Does every pipeline trigger automated testing?

  3. If there is a problem with the pipeline, can it be fixed within 10 minutes?

I have run this test many times on site: participants raise their hands if they think they have achieved a question, and put them down if they have not. The result is always quietly amusing. Facing a room of confident CI “believers,” almost everyone raises their hand at the start, convinced that they are practicing continuous integration. But with each question I ask, half of the hands go down, and only a few people hold out to the end. Even those few, under the gaze of everyone else, begin to doubt themselves, and if I push with a couple more questions at the right moment, basically every hand goes down.

From this perspective, CI sounds simple and easy to understand, but it is not so easy to implement. It can be said that CI covers three stages, and each stage contains a set of ideas and practices. Only when all of these are done can CI be truly implemented. Next, let’s look at these three stages one by one.

Phase 1: Triggering the Complete Pipeline with Every Commit #

The keyword of the first phase is fast integration: push the speed of integration to the limit by triggering CI on every change. This is the most direct expression of CI’s core idea.

Of course, a change could be a code change, a configuration change, an environment change, or a data change. As I mentioned before, everything should be under version control. That way, every change is captured by the version control system and relayed to the CI platform through events or webhooks.

Modern CI platforms like Jenkins, which many of us use, support multiple trigger methods out of the box, such as scheduled triggers, polling, and webhooks. So, to trigger continuous integration on every commit, the version control system and the CI system need to be connected, for example GitLab with Jenkins. There are plenty of ready-made guides online, and following them generally causes little trouble. But is connecting the two systems enough? Obviously it is not that simple: a commit-triggered pipeline also has several prerequisites.
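Before going through those prerequisites, here is a minimal sketch of what the hookup itself can look like: a small service that receives a GitLab push webhook and triggers a Jenkins job through Jenkins’ remote-trigger API. The job name, URLs, and tokens are placeholders made up for illustration, not anything from a real setup.

```python
# A minimal sketch: receive a GitLab push webhook and trigger a Jenkins job.
# All names, URLs, and tokens below are placeholders for illustration.
from flask import Flask, request
import requests

JENKINS_URL = "https://jenkins.example.com"      # placeholder Jenkins address
JOB_NAME = "app-ci"                              # hypothetical job with remote triggering enabled
JOB_TOKEN = "ci-trigger-token"                   # trigger token configured on that job
JENKINS_AUTH = ("ci-bot", "jenkins-api-token")   # user and API token, placeholders

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def on_push():
    event = request.get_json(silent=True) or {}
    # GitLab push events carry the branch in the "ref" field, e.g. "refs/heads/develop".
    branch = event.get("ref", "").removeprefix("refs/heads/")
    if not branch:
        return "ignored", 200
    # Ask Jenkins to start the job for this branch via its remote-trigger endpoint.
    resp = requests.post(
        f"{JENKINS_URL}/job/{JOB_NAME}/buildWithParameters",
        params={"token": JOB_TOKEN, "BRANCH": branch},
        auth=JENKINS_AUTH,
        timeout=10,
    )
    return ("triggered", 200) if resp.ok else ("jenkins error", 502)

if __name__ == "__main__":
    app.run(port=8080)
```

In practice the GitLab plugin for Jenkins (or GitLab CI itself) does this hookup for you; the sketch only shows that “every commit triggers the pipeline” ultimately comes down to a push event reaching the CI system. Now, on to the prerequisites.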

  1. Unified branch strategy.

Since the purpose of CI is integration, we need to start with a branch whose purpose is integration. This branch can be the main development branch or a dedicated integration branch. Any changes made to this branch will trigger the corresponding CI process. Now, some may ask, many times development is carried out on feature branches or version branches, so does this mean that the commits on these branches do not go through the CI process? This leads to the second prerequisite.

  2. Clear integration rules.

For a medium-sized or large team, the number of commits made each day is staggering, which means CI needs enough throughput to handle these requests in a timely manner. Different branches also have different integration goals, so their CI steps and requirements naturally differ.

For example, for feature branches the main goal is rapid verification and feedback, so speed is a factor that cannot be ignored, and continuous integration at this level focuses mainly on verifying the build and code quality. A system integration branch, on the other hand, needs to verify not only the build and code quality but also the correctness of interfaces and business behavior, so its integration steps are more complex and more expensive. Choosing appropriate integration rules for each branch in the strategy is therefore crucial to running CI effectively (a small sketch of such rules follows after this list).

  3. Standardized resource pool.

As the infrastructure of CI, the importance of the resource pool is self-evident.

First, the resource pool needs to implement environment standardization, which means that any task should be able to run on any node, ensuring that the necessary tools, configurations, and other elements are all present. If a CI task can run on one node but fails on another, the credibility of CI will be affected.

In addition, the concurrent throughput of the resource pool should be able to meet the requirements of centralized submissions. A resource pool that can be dynamically initialized on demand becomes the best choice. Of course, cost factors should also be considered, because if a large number of resources are invested but not effectively utilized, it will result in tremendous waste.

  4. Sufficiently fast feedback cycle.

The lower-level and more frequent the CI, the more sensitive it is to speed. Generally speaking, if a CI run does not give feedback within 10 to 15 minutes, developers lose patience, so CI duration is an important metric to monitor. For each system, the teams should agree on a maximum acceptable CI duration, and exceeding it should itself count as a CI failure. Keeping this promise requires the environment, platform, and development teams to work together.
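To make the ideas of branch-based integration rules and an agreed time budget concrete, here is a minimal sketch. The branch patterns, stage lists, and time limits are illustrative assumptions, not recommendations from the text.

```python
# A sketch of branch-based integration rules with a per-branch time budget.
# Branch patterns, stage names, and limits are illustrative assumptions.
import fnmatch
import time

INTEGRATION_RULES = {
    # Feature branches: fast feedback only.
    "feature/*": {"stages": ["build", "static-analysis", "unit-tests"], "budget_min": 10},
    # Integration branch: add interface-level checks.
    "develop":   {"stages": ["build", "static-analysis", "unit-tests", "api-tests"], "budget_min": 15},
    # Release branches: the full, more expensive verification.
    "release/*": {"stages": ["build", "static-analysis", "unit-tests", "api-tests", "ui-tests"], "budget_min": 30},
}

def rules_for(branch: str) -> dict:
    """Pick the integration rule whose pattern matches this branch."""
    for pattern, rule in INTEGRATION_RULES.items():
        if fnmatch.fnmatch(branch, pattern):
            return rule
    # Unknown branches get the cheapest pipeline by default.
    return INTEGRATION_RULES["feature/*"]

def run_pipeline(branch: str, run_stage) -> bool:
    """Run the stages for this branch; exceeding the agreed budget counts as a failure."""
    rule = rules_for(branch)
    deadline = time.monotonic() + rule["budget_min"] * 60
    for stage in rule["stages"]:
        if not run_stage(stage):  # run_stage is a callable you provide
            print(f"{branch}: stage '{stage}' failed")
            return False
        if time.monotonic() > deadline:
            print(f"{branch}: exceeded the agreed {rule['budget_min']}-minute budget")
            return False
    return True
```

The same mapping could just as well live in a Jenkinsfile or a GitLab CI configuration; what matters is that the rules are explicit and versioned rather than tribal knowledge.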

As you can see, even a basic, usable CI setup relies on more than just these conditions; the core is to complete the integration and provide feedback in the shortest possible time. If your company triggers CI on every code submission, without significant failures or long queues, then congratulations: you have passed the first phase.

Phase 2: Automated Testing Triggered by the Pipeline #

The keyword for the second phase is “built-in quality,” which I will explain in more detail later. The purpose of continuous integration is to find problems as early as possible, and those problems include not only build failures but also quality issues such as failing tests or coding-standard violations flagged by static code analysis.

Many of the CI setups I have seen either lack automated testing or do it so poorly that it is essentially incapable of surfacing meaningful issues. There are several important considerations here; let’s go through them.

  1. Matching suitable testing activities.

For different levels of CI, it is still necessary to determine the quality activities based on integration rules. For instance, basic commit integration does not require complex or time-consuming tests. Fast code checks and smoke tests are sufficient to demonstrate that this version meets the basic requirements. For system-level integration, higher quality standards are required. Thus, some interface tests and UI tests can be included in the CI process.

  2. Establishing credibility of test results.

The goal of automated testing is to help developers identify issues early on. However, if CI fails frequently due to deficiencies in automated testing capabilities or unstable environments, then the CI becomes irrelevant for developers. Therefore, we need to categorize and grade CI failures, focusing on exceptional cases and false positives, and continuously optimize and improve accordingly.

  3. Improving efficiency of testing activities.

Given CI’s sensitivity to speed, the key is to run the most effective test tasks in the shortest possible time. A single, all-encompassing test suite is clearly not the answer. Combining basic functional verification with the test tasks related to the changes in this particular integration greatly increases the chance of catching issues. Automatically identifying and matching the right test tasks to the changes in each CI run is therefore a real challenge (a small sketch of this kind of change-based test selection follows below).
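As a rough illustration of change-based test selection, here is a minimal sketch that maps changed file paths to test suites. The path patterns and suite names are assumptions made up for the example.

```python
# A sketch of selecting test suites based on the files changed in this integration.
# Path patterns and suite names are illustrative assumptions.
import fnmatch

ALWAYS_RUN = {"smoke"}  # smoke tests run on every integration
SUITE_RULES = {
    # Note: fnmatch's "*" also crosses directory separators.
    "payment/*":       {"payment-unit", "payment-api"},
    "user/*":          {"user-unit"},
    "web/*":           {"ui-smoke"},
    "db/migrations/*": {"schema-checks", "payment-api", "user-unit"},
}

def select_suites(changed_files: list[str]) -> set[str]:
    """Return the test suites to run for this set of changed files."""
    selected = set(ALWAYS_RUN)
    for path in changed_files:
        for pattern, suites in SUITE_RULES.items():
            if fnmatch.fnmatch(path, pattern):
                selected |= suites
    return selected

# Example: a change touching payment code and a database migration.
print(sorted(select_suites(["payment/invoice.py", "db/migrations/0042_add_index.sql"])))
# -> ['payment-api', 'payment-unit', 'schema-checks', 'smoke', 'user-unit']
```

Real test-impact analysis goes further (coverage maps, dependency graphs), but even a simple path-based mapping like this is a big step up from running everything on every commit.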

Congratulations if your CI already includes automated verification suites that actually catch issues. But this is not a one-off job: automated tests have to be continuously updated as business requirements evolve, or they stop being effective.

Phase 3: Fixing Problems in a Timely Manner #

So far, we have achieved rapid integration and built-in quality. To be honest, it is not difficult to quickly build a CI platform using existing open source tools and frameworks. However, the key to truly harnessing the value of CI lies in the team’s attitude towards continuous integration and whether a culture of continuous integration has been established within the team.

Many companies in Silicon Valley have an unspoken rule that employees must confirm that continuous integration is functioning properly before leaving work each day. In addition, these companies do not recommend deploying code late at night or on weekends because if a problem occurs, it is difficult to fix it in a timely manner and the impact can be difficult to estimate.

In reality, many companies have no idea how long it takes, on average, to get a broken build back to green, despite investing significant human and material resources in building their CI systems. Sometimes the problem is fixed within 10 minutes; other times it drags on for hours, simply because the person responsible is in a meeting or it happens to be lunchtime.
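If you want to know this number for your own pipeline, it can be computed straight from build history. Here is a minimal sketch that works on a chronological list of build records; the record format is an assumption, not the API of any particular CI tool.

```python
# A sketch of measuring how long the integration branch stays red,
# given a chronological list of build records (the format is an assumption).
from datetime import datetime, timedelta

def time_to_green(builds: list[dict]) -> list[timedelta]:
    """For each red streak, return the time from the first failing build
    to the next successful one."""
    repairs, first_red = [], None
    for build in builds:  # ordered oldest to newest
        if build["result"] == "FAILURE" and first_red is None:
            first_red = build["finished_at"]
        elif build["result"] == "SUCCESS" and first_red is not None:
            repairs.append(build["finished_at"] - first_red)
            first_red = None
    return repairs

builds = [
    {"result": "SUCCESS", "finished_at": datetime(2019, 7, 1, 9, 0)},
    {"result": "FAILURE", "finished_at": datetime(2019, 7, 1, 10, 0)},
    {"result": "FAILURE", "finished_at": datetime(2019, 7, 1, 10, 30)},
    {"result": "SUCCESS", "finished_at": datetime(2019, 7, 1, 12, 15)},
]
repairs = time_to_green(builds)
print(sum(repairs, timedelta()) / len(repairs))  # 2:15:00 -- far beyond 10 minutes
```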

Of course, some teams question the 10-minute repair window, pointing out that by the nature of their projects an integration cycle alone often takes far longer than 10 minutes. If you think this way too, you may have misunderstood what CI is for. After all, I doubt even Martin Fowler could guarantee fixing every problem within 10 minutes; in such a short window, human factors are beyond anyone’s control. The key is to establish mechanisms, not to rely on individuals.

What is a mechanism? A mechanism is a kind of agreement: people are willing to follow certain behaviors because everyone benefits from doing so. For CI, keeping the main integration line available is exactly such an agreement among team members. The point is not who broke it and who should fix it, but whether the team can keep CI stable, has a clear fallback path when problems occur, and actively monitors, analyzes, and drives problems to resolution.

In addition, the team needs to establish clear rules. For example, if a problem is not fixed within 10 minutes, the code should be automatically rolled back. When the CI “lights up red,” the team should no longer submit new code because it is not possible to verify new submissions on top of an erroneous base. At this point, the team needs to collectively put aside their work and work together to restore the state of CI.
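To show how rules like these can be encoded as a mechanism rather than left to individual discipline, here is a minimal sketch of a gate that blocks merges while CI is red and asks for a revert once the 10-minute window has passed. The hooks it calls (`is_pipeline_green`, `minutes_since_first_failure`, `revert_breaking_commit`, `notify_team`) are hypothetical and would need to be wired to your own CI system and repository.

```python
# A sketch of a "keep the integration branch green" gate.
# All hook functions passed in are hypothetical and must be wired to your
# own CI system and repository.

FIX_WINDOW_MINUTES = 10  # the agreed repair window

def may_merge(is_pipeline_green) -> bool:
    """New code may only be merged while the integration pipeline is green."""
    return is_pipeline_green()

def enforce_fix_window(is_pipeline_green, minutes_since_first_failure,
                       revert_breaking_commit, notify_team) -> None:
    """Run periodically (e.g. from a scheduled job) whenever the pipeline is red."""
    if is_pipeline_green():
        return
    elapsed = minutes_since_first_failure()
    if elapsed <= FIX_WINDOW_MINUTES:
        notify_team(f"CI has been red for {elapsed} min; merges are blocked while the fix is in progress.")
    else:
        # Past the agreed window: restore the branch first, debug afterwards.
        revert_breaking_commit()
        notify_team("CI red for more than 10 minutes; the breaking commit has been reverted.")
```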

Only when team members firmly believe that the long-term benefits of CI outweigh the short-term investment and are willing to practice CI themselves, can this “10-minute” rule be guaranteed and put into action.

Summary #

In this lecture, we reviewed the history behind the birth of CI and the fundamental problem it tries to solve. We also walked through the three phases of implementing CI and their core principles: fast integration, built-in quality, and building a culture of continuous integration.

Finally, I would like to emphasize that many people often confuse tools and practices. Once the results do not meet expectations, they will question whether the practices are reliable and whether the tools are user-friendly, easily falling into the trap of tool determinism. In fact, the core principles of CI have never changed, but the tools have been constantly upgraded. Tools are the carriers of practices, and practices are the foundation of tools. The construction of tools alone is just a small step in a long journey, and this is something we must understand.

Thought-provoking questions #

It can be said that a good CI reflects the overall capability of the entire development team. So, what problems have you run into, and what insights have you gained, in practicing CI at your company?

Please leave your thoughts and answers in the comments section. Let’s discuss and learn together. If you find this article helpful, feel free to share it with your friends.