21 Open Source or Self-Developed? Three Phases of Enterprise DevOps Platform Construction #

Hello, I’m Shi Xuefeng. Starting today, this column officially enters the “Platform Tool Series”.

In this new section, I want to focus on three aspects:

  • Help you clarify the implementation path of a DevOps platform within your enterprise and understand the main thread of platform construction.
  • Share some core platform-building experience drawn straight from the production frontline.
  • Analyze the development direction and hot trends of DevOps platforms, so that you can keep pace with them as you build your own.

What I want to tell you is, no one is born a DevOps platform product manager, but everyone can become one.

This is because DevOps platform products differ from business-facing products: their goal is to solve the practical problems faced by frontline development and delivery teams.

Ordinary product managers may lack the background in development and delivery, making it difficult for them to understand the difficulties faced by those teams. On the other hand, development and delivery teams lack the skills and perspectives of product managers. Therefore, talent in this field is extremely rare and can only be cultivated internally. I hope that through this column, you can explore some key principles of product design.

Alright, today let’s discuss the topic of enterprise DevOps platform construction.

As I mentioned before, when implementing DevOps within an enterprise, tools cannot do everything, but without tools you can do nothing.

When companies decide to introduce DevOps tools, there are typically three options: using open source tools directly, purchasing commercial tools, or developing their own tools.

You may argue that if a company has the capability, it should develop its own tools, since that keeps autonomy in its own hands and builds core competitiveness. However, the State of DevOps industry report offers some findings that point the other way.

Companies that lean towards using fully self-developed tools often have lower efficiency levels. In this context, fully self-developed tools refer to those that do not rely on open-source solutions and are completely implemented by the company itself. On the other hand, companies that mostly adopt open source tools tend to have better efficiency levels.

This seems counterintuitive. Companies spend so much time and effort to build internal tools, but in the end, they fail to achieve the expected results. Why is that?

In my opinion, this is because they have not found the right path for internal platform construction. We need to do the right things at the right time; moving too far ahead or lagging too far behind both cause problems.

Therefore, next, I will discuss with you the three stages of enterprise DevOps platform construction.

Phase One: From Nothing to Something #

In this phase, the construction of the DevOps platform is just getting started, and the delivery process is still full of local, manual, and repetitive operations.

In addition, most companies do not yet have a well-established tool team dedicated to building platform capabilities.

Therefore, for this phase, my advice to you is to introduce open source tools and commercial tools to quickly fill in the existing capability gaps.

The so-called capability gaps are actually the missing parts in the current delivery toolchain system, especially for high-frequency operations or those involving collaboration among multiple people, such as requirements management and continuous integration.

Whether open source or commercial, these tools are generally mature and usable out of the box. This out-of-the-box capability is exactly what companies need most at this stage. Introducing tools solves the problem of going from nothing to something and directly improves individual efficiency, which is also the main reason team efficiency can skyrocket in the early stages of a DevOps transformation.

When you see this, you may have two questions: “How to choose the tools?” and “Why are commercial tools also an option?”

In fact, these are the two typical problems that teams are most concerned about when introducing tools in the early stages. Let’s discuss them one by one.

How to choose the tools? #

Nowadays, far too many tools carry the “DevOps” label. How do you choose a suitable one from such a wide range?

Some people pull up the feature lists of candidate tools and compare them item by item to see which one is more powerful. In the from-nothing-to-something phase, however, I don’t think we need to make it that complicated. The core principle is simple: choose mainstream tools.

Mainstream tools are the ones widely used in the industry, frequently mentioned in various sharing articles, and with a wealth of usage experience. Here are some tools I recommend for you to reference:

  • Requirement management tool: Jira
  • Knowledge management tool: Confluence
  • Version control system: GitLab
  • Continuous integration tool: Jenkins
  • Code quality tool: SonarQube
  • Build tool: Maven/Gradle
  • Artifact management tool: Artifactory/Harbor
  • Configuration management tool: Ansible
  • Configuration center: Apollo
  • Testing tools: Robot Framework (RF)/Selenium/Appium/JMeter/TestNG
  • Security and compliance tools: BlackDuck/Fortify

In the initial stage, the tools mainly need to solve concrete problems. Mainstream tools usually offer better extensibility, such as a complete set of APIs and even built-in plugin support for other tools.
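To give you a feel for what that API completeness buys you, here is a minimal sketch of driving a mainstream tool from Python: it lists the projects on a GitLab instance over its REST API. The instance URL and access token are placeholders, and this is an illustration rather than production code.

```python
# Minimal sketch: listing projects on a GitLab instance via its REST API.
# The URL and token below are placeholders you would replace with your own.
import requests

GITLAB_URL = "https://gitlab.example.com"   # hypothetical instance
PRIVATE_TOKEN = "<your-access-token>"       # hypothetical token

def list_projects(per_page: int = 20) -> list[dict]:
    """Return the first page of projects visible to the token holder."""
    resp = requests.get(
        f"{GITLAB_URL}/api/v4/projects",
        headers={"PRIVATE-TOKEN": PRIVATE_TOKEN},
        params={"per_page": per_page, "simple": "true"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    for project in list_projects():
        print(project["id"], project["path_with_namespace"])
```

A commercial tool or a less common open source tool without such an API would force you into screen scraping or manual exports the moment you want to integrate it with anything else.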

In addition, many development practices are designed around mainstream tools. The industry has explored them deeply, and plenty of ready-made best practices exist, which fits the goal of quickly filling capability gaps.

I once saw a large financial institution that was considering switching its code management from SVN to Git. However, the Git platform it chose was neither the open source GitLab nor Gerrit, nor a mainstream commercial tool, but an open source tool I had never even heard of.

The workflow of this tool differed from the others, and its supporting code review and build features were poorly implemented. In the end, the institution switched to a mainstream tool anyway.

Why are commercial tools also an option? #

With the maturity and improvement of open source tools, more and more companies, including traditional enterprises, are actively embracing open source, as if open source represents the future trend.

Then, does it mean that we only need to choose open source tools without considering commercial tools? I think this kind of thinking is somewhat one-sided.

The advantages of commercial tools have always existed, such as professionalism, security, scalability, and technical support. In fact, many open source tools also have commercial versions.

For example, many companies adopt Artifactory as their standard artifact management tool even though the open source Nexus is available, because Artifactory has clear advantages in supporting different artifact types, distributed deployment, additional artifact security vulnerability checks, and integration with external tools.

Similarly, the requirements and defect management tool Jira, when deeply integrated with the knowledge management tool Confluence, can meet the needs of most companies.

Here’s another example: Gradle, the most common build tool for Android development, has a commercial version that can significantly speed up compilation. At first the free version may seem good enough, but once you start pursuing extreme efficiency, capabilities like this become core competitiveness.

There are many reasons to choose commercial tools, and the main reason not to choose them is usually the cost. On this point, what I want to say is: learn to distinguish whether an expenditure is a cost or an investment.

Just like buying gold: it costs money, but it is an investment that can hold or grow its value and even be converted back into cash. The same goes for commercial tools. If a commercial tool can greatly improve team efficiency, the eventual return may far exceed the initial outlay; if, instead, you assemble a team to rebuild the same capabilities in-house, the cost of reinventing the wheel can be considerable. So the key is to evaluate the expense properly.

Phase Two: Growing #

After the first phase, most of the tools in the enterprise delivery pipeline are already in place. The team’s demand for tools is shifting from “sufficient” to “excellent”. Additionally, with business development and team expansion, the issue of differentiated needs has also become prominent. Furthermore, with more people and more data, the importance of tools is increasing day by day.

Therefore, issues such as tool stability, reliability, and performance when used on a large scale are starting to emerge.

For this phase, my advice is to use semi-self-built tools and customized commercial tools to solve your own specific problems.

In most cases, semi-self-built tools are still based on open source tools, either through secondary development or by wrapping them, implementing the required business logic and user interface on top of the open source foundation.

For example, a self-built build and packaging platform based on Jenkins can be implemented using the Jenkins API and plugin extensions. I have attached an architectural diagram for your reference.
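To make this more tangible, here is a simplified sketch of how such a thin platform layer might call the Jenkins REST API to queue a parameterized packaging build, so that users click a button on the platform instead of operating Jenkins directly. The Jenkins URL, job name, service account, and parameter names are all hypothetical.

```python
# Simplified sketch: the platform backend triggers a parameterized Jenkins
# build through Jenkins' REST API. URL, job, and credentials are placeholders.
import requests

JENKINS_URL = "https://jenkins.example.com"   # hypothetical Jenkins instance
AUTH = ("platform-bot", "<api-token>")        # hypothetical service account

def trigger_package_build(job: str, app: str, branch: str) -> str:
    """Queue a build and return the queue item URL for later status polling."""
    resp = requests.post(
        f"{JENKINS_URL}/job/{job}/buildWithParameters",
        auth=AUTH,
        params={"APP_NAME": app, "BRANCH": branch},   # hypothetical job parameters
        timeout=10,
    )
    resp.raise_for_status()              # Jenkins answers 201 Created on success
    return resp.headers["Location"]      # queue item URL to poll for build status

# Example: the platform page calls this when a user clicks "Package".
# queue_url = trigger_package_build("android-package", "shopping-cart", "release/1.2")
```

The platform then only needs to poll the queue item and surface the result, while Jenkins remains the engine doing the actual work behind the scenes.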

So, what should you consider when building semi-self-built tools? Although functional capabilities vary greatly across different domains, in my experience there are two main points: creating space for expansion during the design phase, and paying attention to metadata governance during implementation.

Creating Space for Expansion during the Design Phase #

When initially building a platform, it is easy to focus only on the immediate problems and provide just the necessary features. That is a pragmatic approach, but a platform still needs a top-level design that leaves room for future expansion. This may sound abstract, so let me give you a few real examples of pitfalls we have run into before.

Case 1:

During the initial design of the platform, multi-tenancy was not considered; the platform was only meant to serve a single business. Later, when the features had matured and we wanted to open the platform up externally, we realized we had to insert a tenant layer above everything that already existed. That meant a major overhaul of the system, adjusting not only the functional pages but also the permission model.

Wouldn’t it have been better if, during the initial design, we had considered future expansion and modeled that single business as one tenant on the platform?
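To illustrate what “reserving room for tenants” can look like, here is a minimal data-model sketch with hypothetical entity names: even while only one business line exists, every resource and permission is already scoped by a tenant ID, so opening the platform to other tenants later does not force a schema overhaul.

```python
# Minimal sketch of a tenant-aware data model; entity and field names are
# hypothetical. A single business line is simply the platform's first tenant.
from dataclasses import dataclass

@dataclass(frozen=True)
class Tenant:
    tenant_id: str          # e.g. "retail-bu"
    name: str

@dataclass(frozen=True)
class Application:
    tenant_id: str          # every resource is scoped to a tenant from day one
    app_id: str
    name: str

@dataclass(frozen=True)
class Permission:
    tenant_id: str          # permissions are granted within a tenant, not globally
    user_id: str
    app_id: str
    role: str               # e.g. "owner", "developer", "viewer"

def can_deploy(perm: Permission, app: Application) -> bool:
    """A permission only applies to apps in the same tenant."""
    return (
        perm.tenant_id == app.tenant_id
        and perm.app_id == app.app_id
        and perm.role in {"owner", "developer"}
    )
```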

Case 2:

To support rapid packaging and deployment, we built a simple wrapper around Jenkins as an online packaging platform. However, the parameters on the packaging page were hardcoded, and every newly onboarded project needed its own dedicated page. Later, facing the differentiated requirements of hundreds of onboarded applications, the platform had to be rebuilt from scratch.

Wouldn’t it have been better if, at the design stage, we had taken an API-driven approach and made the parameters configurable?
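To show what “configurable parameters” might look like in practice, here is a small sketch with hypothetical application and field names: each packaging page is described by configuration data that a generic frontend can render, so onboarding the hundredth application means adding an entry, not writing a new page.

```python
# Minimal sketch of configuration-driven packaging forms; app and field names
# are hypothetical. The frontend renders whatever field list it is given.
PACKAGE_FORMS = {
    "shopping-cart-android": [
        {"name": "BRANCH",      "type": "string", "default": "develop"},
        {"name": "BUILD_TYPE",  "type": "choice", "options": ["debug", "release"]},
        {"name": "ENABLE_LINT", "type": "bool",   "default": True},
    ],
    # onboarding another app is just another entry, not another page
    "shopping-cart-ios": [
        {"name": "BRANCH",        "type": "string", "default": "develop"},
        {"name": "EXPORT_METHOD", "type": "choice", "options": ["ad-hoc", "app-store"]},
    ],
}

def render_form(app: str) -> list[dict]:
    """Return the field definitions the frontend should render for this app."""
    return PACKAGE_FORMS.get(app, [])

def build_parameters(app: str, user_input: dict) -> dict:
    """Merge user input with defaults before handing off to the CI tool."""
    params = {}
    for field in PACKAGE_FORMS[app]:
        params[field["name"]] = user_input.get(field["name"], field.get("default"))
    return params
```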

Beyond these examples, choices made during technology selection, such as separating front-end and back-end development, picking mainstream technology stacks, applying typical design patterns, and keeping the set of languages relatively unified, all contribute to the platform’s future scalability.

Whether you can iterate on functionality quickly, and whether new members can ramp up fast enough to form an effective team, are also questions to consider when designing a platform.

Of course, top-level planning does not mean over-design. I am only suggesting that you leave some room within foreseeable limits, so you don’t paint yourself into a corner later.

Paying Attention to Metadata Governance during Implementation #

Metadata can be understood as the keychain that links together the data structures of the entire platform: for example, application names, module names, security IDs, and so on.

Every platform relies on this metadata when organizing its data structures, and once it is in use it is hard to change, because these metadata elements may already serve as primary- and foreign-key constraints in the data model.

For a single platform, maintaining this metadata poses no major problem. But when platforms need to be integrated later, this metadata becomes a kind of lingua franca between them. If the platforms speak different languages, a lot of translation logic has to be added, which increases coupling and makes the integrations fragile.

For example, while we may call the same module “购物车” (Chinese for “shopping cart”) in my platform, it might be called “shopping-cart” in your platform, and it may even be further divided based on platforms, such as “shopping-cart-android” and “shopping-cart-ios”, or even based on feature dimensions, such as “shopping-cart-feature1”. Clearly, aligning the data on both sides is not easy.

Of course, metadata governance is not something a single platform can solve on its own; it also requires top-level planning.

For example, you can establish a unified CMDB within the company to centrally manage application information, or set up an application onboarding approval process that uses a standardized workflow to control each application’s lifecycle and manage its basic information. These are all forms of technical debt: the later they are addressed, the higher the cost of repayment.
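To illustrate the idea, here is a minimal sketch of aligning metadata through a single source of truth; the application IDs, platform names, and aliases are hypothetical. Each platform keeps its own local label, but all cross-platform exchange goes through the canonical ID registered centrally, so no pairwise translation logic is needed.

```python
# Minimal sketch of a CMDB-style alias registry; all IDs and names are hypothetical.
CMDB_ALIASES = {
    # canonical application ID -> labels used by individual platforms
    "app-0001-shopping-cart": {
        "requirements-platform": "购物车",
        "build-platform": "shopping-cart-android",
        "test-platform": "shopping-cart",
    },
}

# Reverse index: (platform, local label) -> canonical ID
_REVERSE = {
    (platform, label): app_id
    for app_id, labels in CMDB_ALIASES.items()
    for platform, label in labels.items()
}

def to_canonical(platform: str, local_name: str) -> str:
    """Translate a platform-local name into the canonical CMDB ID."""
    return _REVERSE[(platform, local_name)]

def to_local(app_id: str, platform: str) -> str:
    """Translate a canonical CMDB ID into the name a given platform expects."""
    return CMDB_ALIASES[app_id][platform]
```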

Phase Three: From Complexity to Simplicity #

Congratulations on reaching the third phase! By now, you have accumulated some experience in DevOps platform construction and successful case studies in various vertical domains. During this phase, we need to address three main challenges:

  • Too many platforms. It is necessary to switch between platforms for different tasks.
  • Platforms are too complex. Specialized training is required for personnel to properly utilize the platforms and implement desired functionalities.
  • Unclear platform value. For example, what value does using the platform bring? How much contribution does it make to the team and business?

For this phase, my suggestion is to consolidate your tools: simplify and unify the interfaces, streamline operations, and measure effectiveness.

Consolidation here covers the whole collection of tools, whether open source, semi-self-built, or commercial.

Rather than providing individual tools, you should offer a complete solution, one that solves not just a single problem but the whole range of issues along the delivery process.

Enterprise Platform Governance #

If there was no top-level planning at the start, by now there will be numerous platforms, large and small, scattered across the organization. The first step you need to take is platform-level governance.

Firstly, identify the existing platforms and their usage, such as which businesses are utilizing them and what functionalities have been implemented.

Bringing all these platforms together is not easy, and the problem may even go beyond the purely technical. Especially in large enterprises, platforms are the foundation many teams stand on; if a platform is no longer needed, that team’s focus has to change as well.

Hence, my first suggestion is a relatively moderate one: identify the main delivery path, cover it with a single platform that connects the capabilities of the individual tools, and let the truly excellent platforms stand out. To achieve this, you need a continuous delivery pipeline.

Over the years, I have been deeply involved in building continuous delivery platforms and have accumulated a good deal of experience. Later in this column, I will discuss in detail how to design a modern continuous delivery pipeline platform.
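As a rough sketch of what “one pipeline over the main delivery path” means, here is a deliberately simplified orchestration example with hypothetical stage functions: the pipeline owns only the sequencing, while each stage delegates to a specific tool (GitLab, Jenkins, SonarQube, and so on) behind that tool’s API, so individual tools stay replaceable behind the platform.

```python
# Simplified pipeline orchestration sketch; stage internals are hypothetical.
from typing import Callable

def checkout(ctx: dict) -> None:     # e.g. resolve the commit via the GitLab API
    ctx["commit"] = "abc1234"

def build(ctx: dict) -> None:        # e.g. trigger a Jenkins job and wait for it
    ctx["artifact"] = f"app-{ctx['commit']}.jar"

def code_scan(ctx: dict) -> None:    # e.g. query SonarQube's quality gate result
    ctx["quality_gate"] = "OK"

def deploy(ctx: dict) -> None:       # e.g. call Ansible or a deployment service
    ctx["deployed_to"] = "staging"

PIPELINE: list[Callable[[dict], None]] = [checkout, build, code_scan, deploy]

def run_pipeline(app: str) -> dict:
    """Run every stage in order; a real platform would record status per stage."""
    ctx = {"app": app}
    for stage in PIPELINE:
        print(f"[{app}] running stage: {stage.__name__}")
        stage(ctx)
    return ctx

if __name__ == "__main__":
    print(run_pipeline("shopping-cart"))
```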

There is still a significant gap between a pipeline platform and an integrated platform. After all, the design concepts, operational paths, and interface styles of different platform tools vary greatly.

Therefore, my second practical suggestion is to distinguish between platforms and tools, and let the platforms take the lead.

For example, although there are many tools in the testing domain, a complete testing platform can cover all testing requirements; in other words, testers only ever need to work on that one platform. Once the jumble of internal tools converges into a few core enterprise platforms, users rarely need to switch interfaces and can complete their daily work easily through the integration between platforms.

Building a Self-Service Tool Platform #

In this phase, self-service becomes the core concept of platform construction.

Self-service means users can log in to the platform, perform operations themselves, access relevant data, and obtain useful information on their own.

To achieve self-service, simplifying operations is crucial. Put bluntly, only when a task can be completed with a single click is it truly self-service.

This might sound exaggerated. However, breaking down functional silos and enabling cross-functional empowerment depends on the platform’s self-service capability. When you find yourself complaining, “why do some people still not know how to use the platform when it’s designed so simply?”, it usually just means the platform is not simple enough yet.

Previously, the Jenkins community initiated a project called “5 Clicks, 5 Minutes.” The goal was for users to be able to establish a Jenkins service with just five clicks and within five minutes.

The result of that project is the guided setup and creation flow you see in Jenkins today, which reduces the cost of getting a service up and running to a minimum and helps more users get started.

As you can see, the simplicity of the user experience has little to do with technical complexity; what matters is whether you think from the user’s point of view. So when building a platform, always keep that empathy.

Summary #

The construction of a platform within an enterprise is a long-term issue. If you were to ask me for a summary of the experience of building a DevOps platform for an enterprise, my answer would be the “Four Transformations”: standardization, automation, servicization, and digitalization. In fact, these are also the core principles guiding platform construction.

  • Standardization: Everything should have rules and standards;
  • Automation: Eliminate unnecessary manual operations. If it can be done with one click, it should not be done twice;
  • Servicization: Design for users, not experts, so that everyone can complete their work without external dependencies;
  • Digitalization: Collect, aggregate, analyze, and present data so that the facts are shown objectively, and let data guide continuous improvement.

Thought-Provoking Questions #

Finally, do you have any hidden gems of tools for platform building that you can share? Feel free to write your thoughts and answers in the comments section, and let’s discuss and learn together. If you found this article helpful, please feel free to share it with your friends.