
02 Isolated Processes: Let's Look at the Nature of Containers #

Hello, I’m Chrono.

In the last lesson, we gained a preliminary understanding of container technology. We installed the most popular container engine, Docker, in a Linux virtual machine, and used commands such as docker ps and docker run to work with containers.

Broadly speaking, container technology is a combination of three things: dynamic containers, static images, and remote repositories. But "container", the core concept of the field, is hard to pin down precisely, not just for newcomers but even for some experienced users.

So today, let's take a closer look at what exactly a container is (in the narrow sense: the dynamic container).

What is a Container? #

Literally, a container is exactly what the name says: a container. It is often compared to a shipping container in the real world, which also matches the meaning of the name Docker: a dock worker (the cute little whale) constantly moving shipping containers around.

[Image: the Docker whale carrying shipping containers]

The purpose of a shipping container is to standardize the packaging of all kinds of goods. Once packed, the goods can be transported from one place to any other. Compared with shipping loose cargo, containers isolate the worlds inside and outside the box: the goods keep their original form, inside and outside cannot interfere with each other, and storage, transport, and management become far simpler.

Back in the computer world, containers play the same role, except that what they package is a running application, that is, a process. Likewise, a container isolates the process from the outside world, preventing it from affecting the external system.

Let's try it out and see what a process looks like when it runs inside a container.

First, we use the docker pull command to pull a new image: the Alpine operating system:

docker pull alpine

Then we use the docker run command to run its shell program:

docker run -it alpine sh

Note the extra -it option here (-i keeps standard input open, -t allocates a pseudo-terminal), which gives us an interactive shell: we temporarily leave the Ubuntu host and step inside the container.

Now let's run the cat /etc/os-release and ps commands, and finally run exit to leave the container, so we can compare the inside with the outside:

[Image: terminal session inside the Alpine container, showing the output of cat /etc/os-release and ps]

As the screenshot shows, the system information seen from inside the container is no longer the Ubuntu system outside; it has become Alpine Linux 3.15. And the ps command shows a completely "clean" environment: apart from the shell itself (sh) and ps, there are no other processes.
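If the screenshot is hard to read, the session looks roughly like this (the exact version numbers depend on when you pulled the image):

/ # cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.15.0
PRETTY_NAME="Alpine Linux v3.15"
/ # ps
PID   USER     TIME  COMMAND
    1 root      0:00 sh
    6 root      0:00 ps
/ # exit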

In other words, inside the container is a brand-new Alpine operating system. An application running here cannot see the Ubuntu system outside; the two are "isolated" from each other, like a secluded little world of its own.

We can also pull an Ubuntu 18.04 image, enter the container the same way, and then run commands like apt update and apt install to see what happens:

docker pull ubuntu:18.04
docker run -it ubuntu:18.04 sh

# The following commands are executed inside the container
cat /etc/os-release
apt update
apt install -y wget redis
redis-server &
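
To confirm that Redis actually came up, you can query it from inside the same container (redis-cli is installed along with the redis package, so this is just a quick sanity check):

# Still inside the container: the server should answer PONG
redis-cli ping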

I won't include a screenshot here; try it yourself and see. As you will find, there is another complete Ubuntu 18.04 system inside the container, and we can do whatever we like in this little world: install applications, run a Redis service, and so on. But no matter what we do inside the container, it will not affect the Ubuntu system outside (although this isolation is not absolute).

From this we can draw a preliminary conclusion: a container is a special isolated environment that lets a process see only limited information inside that environment and prevents it from affecting anything outside it.

So naturally another question arises: why do we need such an isolated environment at all? Wouldn't it be simpler to let processes run directly on the system?

Why Isolation Is Necessary #

After the pandemic of the past two years, I am sure the word "isolation" needs no introduction. To stop the spread of the disease, we set up isolation wards and designated hospitals to confine infections to specific areas, and went as far as sealing off residential communities and closing shopping malls. These measures caused real inconvenience, but on a larger scale they kept society as a whole running normally.

Similarly, isolation in the computer world is based on the same consideration: system security.

For the Linux operating system, an unrestricted application is very dangerous: such a process can see every file, every process, and all network traffic in the system, and can access any data in memory. A malicious program can easily bring the system down, and even a well-intentioned program can leak information or cause other security incidents through an unintentional bug. Linux does provide user and permission controls to restrict what resources a process can access, but this mechanism is relatively coarse and falls far short of real "isolation".
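You can get a feel for how little is hidden by default. Even an ordinary, unprivileged process can inspect most of the system; try this as a normal user (a quick demonstration, not specific to containers):

# Any user can list every process on the system...
ps -ef | head

# ...and read global system state such as memory usage
head /proc/meminfo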

With container technology, we can instead run an application in a heavily protected "sandbox", like checking the process into a quarantine hotel: it can do as it pleases inside, but it is not allowed to "cross the line", so the system outside the container stays secure.

[Image: processes running inside an isolated sandbox environment]

Besides security, there is the matter of resources: CPUs, memory, disks, network cards. Today's high-performance servers may have dozens of CPUs, hundreds of GB of memory, TBs of disk, and 10-gigabit network cards, but these resources are still finite, and for cost reasons we cannot allow one application to consume them without limit.

Another capability of container technology is therefore resource isolation: it carves out a portion of the system's resources and lets the process use only a specified quota, for example one CPU or 1 GB of memory. It is like the quarantine hotel guaranteeing three meals a day but not allowing lavish banquets. This prevents processes inside the container from consuming too much, makes full use of the hardware, and lets limited resources deliver stable, reliable service.
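Docker exposes exactly these quotas as options on docker run. As a minimal sketch (the image and the limits here are arbitrary examples; pick whatever suits your machine):

# Start Redis confined to at most 1 CPU and 1 GB of memory
docker run -d --cpus 1 --memory 1g redis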

So although processes are "confined" inside containers and lose some freedom, the security of the whole system is guaranteed in return. As long as a process follows the isolation rules and does nothing out of line, it can run perfectly normally.

What is the difference between containers and virtual machines? #

You might say that, in this sense, a container is just another kind of "sandbox" technology, much like a virtual machine. So what is the difference between them, and where does each have the advantage?

In my view, containers and virtual machines share the same goals: isolate resources to keep the system secure and, on that basis, maximize resource utilization.

As you may have seen when creating virtual machines with VirtualBox/VMware, they can virtualize a complete set of computer hardware inside the host system, onto which any operating system can be installed. The systems inside and outside are completely isolated and do not interfere with each other.

Similarly, in a data center, virtualization software (labeled Hypervisor in the diagram) can turn one physical server into multiple logical servers that are independent of one another, carving up the physical server's resources among different users as needed.

From an implementation perspective, a virtual machine virtualizes the hardware, and a full operating system must be installed on top of it before applications can run. Hardware virtualization plus a guest operating system is quite "heavy", consuming a lot of CPU, memory, and disk. Most of this consumption adds no value to the application itself; it is pure overhead. The upside is a very high level of isolation: each virtual machine is completely independent.

Now let's look at containers (represented by Docker in the diagram). Because they have one layer less than virtual machines, they use the underlying hardware and operating system directly, which naturally saves CPU and memory, making them lightweight and more efficient with hardware resources. The trade-off is that multiple containers share the operating system kernel, so the isolation between applications is not as strong as with virtual machines.
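You can confirm the shared kernel with one command: a container reports exactly the host's kernel version (a quick check, reusing the alpine image from earlier):

# Kernel version on the host
uname -r

# The same command inside a container prints the identical version,
# because the container uses the host's kernel instead of booting its own
docker run --rm alpine uname -r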

In terms of efficiency, containers have a significant advantage over virtual machines. As shown in the diagram, with the same amount of system resources, a virtual machine can only run 3 applications, while the rest of the resources are used to support the virtual machine. On the other hand, containers can free up these resources and run 6 applications simultaneously.

Of course, this comparison chart is just a figurative representation and not a rigorous numerical analysis. However, we can make a simple comparison between VirtualBox/VMware virtual machines and Docker containers.

A freshly installed Ubuntu virtual machine is several GB in size, and installing a few applications easily pushes it past 10 GB. It takes minutes to start, and running ten virtual machines is about the limit for an ordinary computer. An Ubuntu container image, by contrast, is only a few tens of MB, and a container starts in well under a second; running hundreds of containers at once is no problem.
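Both claims are easy to verify yourself; the exact numbers depend on your image versions and hardware, but they should be in the same ballpark:

# Image sizes: tens of MB for Ubuntu, a few MB for Alpine
docker images

# Timing a cold start: typically well under a second
time docker run --rm alpine echo hello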

That said, virtual machines and containers are not mutually exclusive; they can be used together. In this course, for example, we use a virtual machine for strong isolation from the host machine, and then run Docker containers inside it to launch applications quickly.

How is isolation achieved? #

We know that virtual machines rely on a hypervisor (such as KVM or Xen). So what do containers use to interact with the underlying hardware and operating system, and why is their isolation so efficient and lightweight?

The secret lies in the Linux kernel, which provides three technologies for resource isolation: namespace, cgroup, and chroot. None of the three was originally designed with containers in mind, but combined they produce a wonderful "chemical reaction".

Namespaces appeared in Linux 2.4.19 back in 2002. Similar to namespaces in programming languages, they give a process an independent view of resources such as the file system, hostname, process IDs, and network. It's like building a small house for the process, isolating its resources from the rest of the system.
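You can experiment with namespaces directly, without Docker, using the unshare tool from util-linux. A minimal sketch (requires root):

# Start a shell in new PID and mount namespaces, remounting /proc
# so that ps only sees processes inside the namespace
sudo unshare --fork --pid --mount-proc /bin/sh

# Inside that shell, ps shows only sh and ps itself, just like in a container
ps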

Cgroup was introduced in Linux 2.6.24 in 2008. Its full name is Linux Control Group, and it sets priorities and quotas for process resources such as CPU and memory. It's like adding a ceiling to the process's small house.
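You can see cgroups at work whenever Docker applies a resource limit. On a host using cgroup v1 with the default layout, for example, a container's memory quota appears as a plain file (the container ID here is a placeholder for your own):

# The memory limit Docker set for a container, in bytes (cgroup v1 layout)
cat /sys/fs/cgroup/memory/docker/<container-id>/memory.limit_in_bytes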

Chroot, on the other hand, is much older than namespaces and cgroups; it appeared as early as UNIX V7 in 1979. It changes the root directory of a process, limiting which part of the file system the process can see. It's like laying the floor of the process's small house.
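Chroot is also easy to try by hand. A rough sketch: export a container image's file system into a directory, then make that directory the root of a new shell:

# Unpack an Alpine image's file system into ./rootfs
mkdir rootfs
docker export $(docker create alpine) | tar -C rootfs -xf -

# Make ./rootfs the root of a new shell; inside, /bin/sh can only
# see files underneath rootfs
sudo chroot rootfs /bin/sh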

Put these three technologies together and you get a neat, well-sealed container that a process can move into and live a "happy life" in. To describe the situation, I think a line by Lu Xun fits best: "Hiding in my small attic, I am a world unto myself, no matter whether it is winter, summer, spring, or autumn outside."

Summary #

Alright, today we looked at the key concept in container technology: the dynamic container. Let's summarize the main points of this lesson:

  1. A container is a special “sandbox” environment in an operating system where processes can only access restricted information, achieving isolation from the external system.
  2. The purpose of container isolation is to ensure system security by limiting access to various resources by processes.
  3. Compared to virtual machine technology, containers are lightweight and more efficient, consuming very few system resources, making them highly advantageous in the era of cloud computing.
  4. The fundamental technologies for implementing containers are Linux namespaces, cgroups, and chroot.

Homework #

Finally, it’s time for homework. Here are two questions for you to think about:

  1. Can you compare containers with real-world shipping containers and list more advantages of container technology?
  2. There is a saying: containers are lightweight virtual machines. Do you think this statement is correct?

Feel free to leave your comments and join the discussion in the comments section. If you find this helpful, please share it with your friends so we can learn together. See you in the next class.