16 Environment Management: What Is It Like When Everything Is Code? #

Hello, I am Shi Xuefeng.

Jokes about "prejudice maps" often circulate online. In simple terms, a prejudice map captures people's stereotyped impressions of other places around the world. For example, many people believe that everyone from Tianjin can perform cross talk (a traditional Chinese comedic performance).

If software development had its own prejudice map, then for those unfamiliar with operations, the operations team would simply be "the people who maintain the environment." As a result, the environment has become the number one scapegoat of the software industry. If something breaks in production, it must be a misconfigured environment; if a feature goes untested, it must be because there was no test environment; if a bug surfaces during development, it is blamed on the environment before anything else is even considered... As you can see, it seems any problem can be pinned on the environment. This baseless prejudice only deepens the distrust between development and operations.

Challenges in environment management #

So why is the environment such a constant source of worry? The reason is that the complexity of modern business is directly reflected in every aspect of environment management. In summary, I see five main challenges:

  1. Diverse types of environments

First of all, there are more and more types of environments associated with software: the development environment, testing environment, UAT (User Acceptance Testing) environment, staging environment, gray-release (canary) environment, production environment, and so on. Just keeping the names and purposes of these environments straight is no easy task.

  2. Increasing complexity of environments

The architecture of modern applications is gradually shifting from monoliths to microservices. As services are decomposed, supporting services such as caching, routing, messaging, and notifications become indispensable, and a misconfiguration in any one of them can prevent the application from functioning properly. That is before even counting the dependencies and call relationships between services. As a result, the cost of standing up a complete environment is extremely high for many businesses, and it can become an outright impossible task.

  3. Difficulty in ensuring environment consistency

The classic blame-shifting line "It works on my machine" highlights the problem of inconsistent environments. If consistency across environment configurations cannot be guaranteed, issues like this will keep recurring indefinitely. In many businesses, the production environment is managed and maintained by a dedicated team, so its configuration is relatively well controlled. The development environment, however, is generally a black box: it is the developer's local machine, and it can hardly be managed even if someone wanted to.

  4. Slow delivery of environments

Due to the separation of responsibilities, the process of requesting an environment is usually lengthy. It often takes two weeks or even longer from the moment an environment is requested to the delivery of a usable one.

On the one hand, this has to do with internal approval processes. I have seen companies that require five levels of approval to request an environment; in a flat organization, there may not even be five levels between an ordinary employee and the CEO. On the other hand, environment configuration relies on manual work, which is cumbersome and inefficient. In most cases, environment configuration documents are outdated and are not updated as the application evolves, so days can easily slip away.

  5. Difficulty in tracing environment changes

When a problem occurs after the product goes live, it can take a long time to discover that the cause was a change to some environment parameter. As for who made the change, when, why, and what review it went through, nobody knows. This poses significant challenges and hidden risks to the stability of the production environment. It is worth emphasizing that environment configuration changes are no less important than code changes, and they usually require equally strict control.

Infrastructure as Code #

You might ask, is there a method to solve these problems? Yes, there is! It’s called Infrastructure as Code. You can say that without practicing Infrastructure as Code, DevOps won’t go far. So, what exactly is Infrastructure as Code?

Infrastructure as Code is a way to describe and manage environment configurations in a descriptive language and to automate the configuration process. The most popular Infrastructure as Code tools are Chef, Ansible, Puppet, and SaltStack, collectively known as CAPS.

This concept may sound abstract, so what does the code that describes the infrastructure look like? Let me share an example of an Ansible configuration with you.

---
# Configure all hosts in the "webservers" group, escalating to root
- name: Playbook
  hosts: webservers
  become: yes
  become_user: root
  tasks:
    # Install (or upgrade to) the latest httpd package
    - name: ensure apache is at the latest version
      yum:
        name: httpd
        state: latest
    # Make sure the Apache service is started
    - name: ensure apache is running
      service:
        name: httpd
        state: started

Even if you’re not familiar with Ansible, with the help of comments, you can probably understand the environment configuration process from this code. Essentially, this code accomplishes two things: it installs the httpd package and starts the related service.

Why can Infrastructure as Code solve the aforementioned problems?

Firstly, for the same application, the configuration process is similar across environments, with only minor differences in parameters and dependent services. By codifying the configuration process for every environment, each environment corresponds to a configuration file, and common configuration can be reused. When an environment changes, you no longer log in to the machines; you modify the configuration files directly. Environment configuration thus becomes a living document, one that no longer goes stale for lack of updates.
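
To make this concrete, here is a minimal sketch of what per-environment configuration might look like in an Ansible project; the layout and variable names are hypothetical, purely for illustration:

# Hypothetical repository layout: one inventory and one variable file per environment
# inventories/
#   test/hosts         staging/hosts         production/hosts
# group_vars/
#   test.yml           staging.yml           production.yml

# group_vars/test.yml -- only the parameters differ; the playbooks are shared
http_port: 8080
app_version: "1.0"
db_host: test-db.internal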

Secondly, the environment configuration process can be fully automated with tools: you point the tool at the configuration file for the target environment, and it takes care of the rest. Even if different machines start from different initial states, the tool can drive them to an eventually consistent environment, because modern tools generally support idempotence. That is, they automatically detect which steps have already been applied, skip them, and continue with the remaining operations. This greatly increases the efficiency of configuring environments at scale.
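
As a small illustration of idempotence, a task like the following (a sketch; the path is hypothetical) can be run any number of times with the same end state:

# Safe to run repeatedly: Ansible checks the current state before acting
- name: ensure the deploy directory exists
  file:
    path: /opt/app          # hypothetical path
    state: directory        # skipped if the directory already exists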

Finally, since environment configuration becomes code, it can naturally be managed in a version control system and enjoy all its benefits. Any change to any environment goes through the same Git-style workflow of commits and reviews, which consolidates the entry point for environment changes and makes every change fully traceable.

The practice of Infrastructure as Code simplifies complex technology by using code that everyone can understand. This means that even team members who are not familiar with operations or tools can comprehend and modify the process. This not only gives team members a common language but also greatly reduces dependencies and lowers communication and collaboration costs. This is the hidden value of Infrastructure as Code and is especially in line with the collaboration principles advocated by DevOps.

At this point, you might say that this is just automation and nothing particularly special. But think about it: the original intention of DevOps is to break down the barrier between development and operations. How exactly can that be achieved?

In most companies, deployment and release are handled by a dedicated operations team, and the development team only has to hand over the tested software packages. The natural boundary between development and operations therefore lies at the hand-off of software packages. Development and operations can only truly be integrated by connecting the development side, where the Continuous Integration (CI) pipeline handles software integration and testing, with the operations side, where the Continuous Deployment (CD) pipeline handles application deployment. And when a version control system meets Infrastructure as Code, the result is a fantastic combination: GitOps.

GitOps Practice for Development and Operations Integration #

As the name suggests, GitOps is a solution built on the version control system Git. Its core idea is to use Git as the single source of truth and to drive the delivery process from development to operations through pull requests, just like code contributions. Collaboration between development and operations thus happens on the basis of Git.

Although GitOps was initially developed based on container technology and the Kubernetes platform, its concept is not limited to the use of container technology. In fact, its core lies in the code-based description of the application’s deployment environment and process.

In GitOps, each environment is associated with an environment configuration repository, which contains everything needed for application deployment. For example, when using Kubernetes, this would be a set of resource definition files that describe which version to deploy, which ports to open, and the deployment process.

Of course, you can also use a tool like Helm to manage these resource files. If you are not yet familiar with Kubernetes, you can simply think of it as the Linux of the cloud era, and of Helm as a package manager like RPM or APT, which simplifies application deployment through application packaging.

In addition to applications based on Kubernetes, you can also use methods similar to Ansible Playbook. However, compared to the readily available Helm tool, when using Ansible, you need to implement some deployment scripts on your own, but this is not a complex task.

Take a look at the following example configuration file. It is written in YAML and describes the main information for deploying the application. The image name is expressed as a template parameter, and a separate file manages these variables; by replacing the parameter with the actual version of your application, you can deploy different versions with the same template.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo
        # The image reference is injected from a separate values file
        image: "{{ .Values.image.tag }}"
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
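
To complete the picture, here is a sketch of the separate values file such a template might read from; the file name and registry path are assumptions on my part. A release then only needs to change this single value:

# values.yaml (hypothetical): the single source for the deployed image version
image:
  tag: "registry.example.com/demo:1.0"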

Now, let’s take a look at how this solution is implemented.

First, developers submit new code changes to the Git repository, which automatically triggers the continuous integration pipeline. For common version control systems, this can be achieved by configuring hooks. After the code goes through a series of build, test, and inspection steps, and ultimately passes the continuous integration pipeline, a new version of the application is generated and uploaded to the artifact repository, typically in the form of a Docker image file or a war package.
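
To give you a feel for this step, here is a minimal pipeline sketch in GitLab CI syntax; the choice of GitLab and the registry address are my assumptions, and any CI server with hook support works the same way:

# .gitlab-ci.yml -- test first, then build and publish a versioned image
stages: [test, publish]

unit-test:
  stage: test
  script:
    - make test        # placeholder for the project's real test entry point

build-and-publish:
  stage: publish
  script:
    - docker build -t registry.example.com/demo:$CI_COMMIT_SHORT_SHA .
    - docker push registry.example.com/demo:$CI_COMMIT_SHORT_SHA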

Taking the configuration above as an example, suppose version 1.0 of the application image has been produced. Next, a merge request is automatically created against the configuration repository of the test environment, and its only change is bumping the image version to 1.0. A developer or tester can then accept the merge to bring this environment change into the main branch, which in turn automatically triggers the deployment pipeline to roll the new version out to the test environment. Every deployment follows the same process, which typically means copying the latest application artifact to the servers and restarting, or updating the container image and triggering a rolling upgrade.

At this point, the test environment is deployed. If Kubernetes is used, its namespace feature can be leveraged to spin up an independent environment quickly, an advantage traditional deployments do not have (see the sketch below). Once the test environment is accepted, the code can be merged into the main branch, triggering the complete integration pipeline once more for more comprehensive testing.
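
Assuming Kubernetes, creating such an independent environment can be as simple as applying a namespace manifest like this sketch (the name is hypothetical):

# Each test environment lives in its own disposable namespace
apiVersion: v1
kind: Namespace
metadata:
  name: demo-test    # e.g., one namespace per branch or per tester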

After the pipeline succeeds, a merge request is automatically created against the configuration repository of the pre-production environment, and once the review is approved, the pre-production environment is deployed automatically. If separation of duties requires that pre-production deployments be performed by operations personnel, merge permission can be granted to operations personnel only. When they receive a notification, they can log in to the version control system, review the scope of changes in the release, assess the impact, and complete the deployment at their own release cadence, all with a single click in the interface. In this way, collaboration between development and operations is no longer a black box: everyone delivers and deploys the application through code submission and review, and the configuration processes and parameters are transparent and shared throughout.

I’m sharing a process diagram with you to help you fully understand the process of layered deployment.

So what are the benefits of GitOps? First of all, environment configuration is shared and managed in a unified way. The originally complicated environment configuration process is now managed as code that everyone can understand, which greatly reduces the complexity of deployment in DevOps.

In addition, the latest configuration of every environment lives in the Git repository, and every change and deployment is recorded by the version control system. Even a simple upgrade of an environment tool goes through the same full process, so environment and tool upgrades are verified layer by layer. This is very much in the same spirit as Infrastructure as Code.

Governance Practices for Development Environments #

Regarding the governance of development environments, let me give you another example. For the development of smart hardware products, the biggest pain point is the complexity of configuring various environments and tools. Each new employee needs to spend several days configuring the environment. In addition, due to frequent tool upgrades and the need for parallel development on multiple platforms, developers often need to switch between multiple tools, which incurs a high management cost.

To solve this, the Infrastructure as Code approach can be applied: build a Docker image containing all the tool dependencies and distribute it to the development team. For development, only one container needs to be started, with the code directory mounted into it, yielding a fully standardized development environment. When a tool is upgraded, a new image is built; once developers pull it locally, all their tools are upgraded at once, greatly reducing the maintenance cost of the development environment. A minimal sketch of this setup follows.
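
Here is one possible shape of such a setup, expressed as a docker-compose file; the image name and paths are hypothetical:

# docker-compose.yml -- a standardized, disposable development environment
services:
  devbox:
    image: registry.example.com/devbox:latest   # image with all tool dependencies
    volumes:
      - ./src:/workspace                        # mount the local code directory
    working_dir: /workspace
    command: sleep infinity                     # keep the container alive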

In fact, we can also leverage our innovative abilities to combine multiple tools for solving practical problems. For example, in our team, we had to support both virtualized and containerized environments simultaneously. For virtualization, we used the traditional Ansible method for environment deployment, while containerization relied on Dockerfile for image creation. This posed a problem: we had to maintain two sets of configurations and modify the configuration files for both virtualization and containerization each time an upgrade was performed. To simplify this process, the advantages of both approaches can be combined, and a single data source can be used to maintain a standardized environment.

Specifically, in the Dockerfile, apart from the base environment and startup scripts, the environment configuration part can also be completed using Ansible, as shown in the following example:

# Base image already has Ansible installed
FROM harbor.devops.com:5000/test:ansible
MAINTAINER XX
# Copy the playbooks into the image and run them to configure the environment
ADD ./docker /docker
WORKDIR /docker
RUN export TMPDIR=/var/tmp && ansible-playbook -v -i playbooks/inventories/docker playbooks/docker_container.yml
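
For reference, the playbook invoked on the last line might look something like the sketch below. To be clear, this is a hypothetical reconstruction; the actual playbooks/docker_container.yml is not shown in this article:

---
# Run the shared environment roles locally inside the image build
- hosts: localhost
  connection: local
  tasks:
    - name: install the common toolchain
      yum:
        name: [git, make, gcc]    # hypothetical package list
        state: present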

Practices for Local Development Testing #

In fact, I have always believed that environment management is a hidden minefield in DevOps adoption. To improve developer efficiency, the industry has been exploring many new practices. As I mentioned before, the idea of failing fast is to surface failures as early as possible so as to minimize the cost of fixing them. For developers, however, the lack of a test environment often means waiting until code is submitted and deployed before getting any feedback. That cycle can clearly be shortened. On the question of local development testing, there are some relevant practices in the Jenkins community.

For example, suppose you create a minimal Kubernetes-based test environment. Normally, every code change has to go through code submission, image building, artifact upload, and a server-side image update before you can start debugging. With the KSync tool, all of these steps can be skipped: KSync establishes a connection between your local workspace and a remote container directory and synchronizes the code automatically. In other words, when you modify a line of code in your local IDE and save it, KSync transfers it into the running container for you. This is especially convenient for interpreted languages like Python.

Google has also open-sourced Skaffold, a container-based development and deployment tool. Similar to KSync, you can use Skaffold commands to create a Kubernetes environment. After a local code change, Skaffold automatically rebuilds the image, pushes it to the remote registry, and redeploys it, making code development truly what-you-see-is-what-you-get. Developers only need to focus on writing code; the rest is automated. This is one direction in which DevOps engineering practice is heading. A minimal configuration sketch follows.
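
To show how little configuration this takes, here is a minimal skaffold.yaml sketch; the image name and manifest path are assumptions:

# skaffold.yaml -- rebuild and redeploy on every local code change
apiVersion: skaffold/v2beta29
kind: Config
build:
  artifacts:
    - image: registry.example.com/demo   # rebuilt whenever local sources change
deploy:
  kubectl:
    manifests:
      - k8s/*.yaml                       # reapplied after each rebuild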

Summary #

Today, I introduced to you five challenges in enterprise environment management: diversity, complexity, consistency, delivery speed, and change traceability. I explained why Infrastructure as Code is the best practice for solving environment management problems. I also shared three examples of Infrastructure as Code. I hope this can help you understand the process.

If you are not familiar with Kubernetes and containers, some of this content may be hard to digest. What I want to stress is that regardless of the technology used, managing everything as code is the future trend. I recommend that you carefully review the code and flowcharts in this article, then try using one of the CAPS tools to redefine your environment deployment process and implement environment configuration as code. If you have any questions, feel free to ask in the comments section.

Thought-provoking Questions #

What do you think is the biggest challenge in implementing DevOps? Do you have any suggestions for overcoming these challenges?

Feel free to leave your thoughts and answers in the comments section. Let’s discuss and learn together. If you find this article helpful, please feel free to share it with your friends.