46 Case Study: Why Has Container Start-up Slowed Down Significantly? #

Hello, I am Ni Pengfei.

Before we knew it, we have completed the four core modules of this column: CPU, memory, file system and disk I/O, and network performance analysis and optimization. I trust you have mastered the basic analysis and troubleshooting methods for these modules and are familiar with the corresponding optimization techniques.

Next, we will enter the last important module - comprehensive practical exercises. This part of the practical content will also review and deepen the knowledge we have learned before.

We all know that with the popularity of technologies such as Kubernetes and Docker, more and more enterprises are containerizing their applications. While learning these technologies, you have probably heard plenty about the advantages of a Docker-based microservices architecture, such as:

  • With Docker, the application and its dependencies are packaged into an image, making deployment and upgrades faster;

  • After splitting a traditional monolithic application into smaller microservices, each microservice's functionality becomes simpler and can be managed and maintained independently;

  • Each microservice can be scaled horizontally on demand. Even if one fails, only that service becomes unavailable, rather than the entire application.

However, no technology is a silver bullet. These new technologies bring not only convenience but also higher complexity, such as performance degradation, a more complex architecture, and harder troubleshooting.

Today, I will use a Tomcat case study to teach you how to analyze performance issues after application containerization.

Case Preparation #

For today’s case study, we only need one virtual machine. It is still based on Ubuntu 18.04 and is also applicable to other Linux systems. The case environment I am using is as follows:

  • Machine configuration: 2 CPUs, 8GB memory.

  • Pre-install tools such as docker, curl, jq, and pidstat, using the command apt install docker.io curl jq sysstat (pidstat is part of the sysstat package).

Among them, jq is a tool for processing JSON on the command line; we use it here to format JSON output so that it is easier to read.
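For instance, here is a minimal illustration of what jq does (the JSON below is made-up sample data, not output from this case):

# Pretty-print arbitrary JSON
$ echo '{"Status":"exited","OOMKilled":true,"ExitCode":137}' | jq .
{
  "Status": "exited",
  "OOMKilled": true,
  "ExitCode": 137
}

# Extract a single field
$ echo '{"Status":"exited","OOMKilled":true,"ExitCode":137}' | jq .ExitCode
137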

You need to open two terminals, log in to the same virtual machine, and install the above tools.

Note that all the following commands are assumed to be run as the root user. If you are logged in to the system as a regular user, please run the command sudo su root to switch to the root user.

If you encounter any problems during the installation process, you can search for solutions online. If you still can’t solve it, remember to ask me in the comments section.

With this, the preparation work is complete. Next, we will officially enter the operation phase.

Case Study #

The case we are going to analyze today is a Tomcat application. Tomcat is a lightweight application server written in Java, originally developed within the Apache Foundation's Jakarta Project. The Docker community also maintains an official Tomcat image, which you can use directly to start a Tomcat application.

Our case is also based on the official Tomcat image. The application's core logic is very simple: it allocates 256 MB of memory and prints "Hello, world!".

<%
byte data[] = new byte[256*1024*1024];
out.println("Hello, world!");
%>

To facilitate your execution, I have packaged it into a Docker image feisky/tomcat:8 and pushed it to Docker Hub. You can run it directly by following the steps below.

In terminal 1, execute the following command to start the Tomcat application and listen on port 8080. If everything goes well, you should see the following output:

# Set the memory to 512MB using the -m flag
$ docker run --name tomcat --cpus 0.1 -m 512M -p 8080:8080 -itd feisky/tomcat:8
Unable to find image 'feisky/tomcat:8' locally
8: Pulling from feisky/tomcat
741437d97401: Pull complete
...
22cd96a25579: Pull complete
Digest: sha256:71871cff17b9043842c2ec99f370cc9f1de7bc121cd2c02d8e2092c6e268f7e2
Status: Downloaded newer image for feisky/tomcat:8
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
2df259b752db334d96da26f19166d662a82283057411f6332f3cbdbcab452249

From the output, you can see that the docker run command automatically pulls the image and starts the container.

By the way, many students have previously asked how to download Docker images. In fact, the above docker run command automatically downloads the image to the local machine before starting it.

Since Docker images are managed in multiple layers, you will see the download progress of each layer. In addition to automatically downloading the image like with docker run, you can also do it in two steps: first download the image, and then run the container.

For example, you can first run the docker pull command below to download the image:

$ docker pull feisky/tomcat:8
8: Pulling from feisky/tomcat
Digest: sha256:71871cff17b9043842c2ec99f370cc9f1de7bc121cd2c02d8e2092c6e268f7e2
Status: Image is up to date for feisky/tomcat:8

Obviously, the image already exists on my machine, so there is no need to download it again, and it just returns a successful message.

Then, in terminal 2, use curl to access port 8080 of the Tomcat server to verify that the case has been started successfully:

$ curl localhost:8080
curl: (56) Recv failure: Connection reset by peer

Unfortunately, curl returns an error “Connection reset by peer”, indicating that the Tomcat service is not responding to client requests correctly.

Is there a problem with the Tomcat startup? Let’s switch back to terminal 1 and execute the docker logs command to view the container logs. Note that you need to add the -f parameter to track the latest log output of the container:

$ docker logs -f tomcat
Using CATALINA_BASE:   /usr/local/tomcat
Using CATALINA_HOME:   /usr/local/tomcat
Using CATALINA_TMPDIR: /usr/local/tomcat/temp
Using JRE_HOME:        /docker-java-home/jre
Using CLASSPATH:       /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar

From here, you can see that the Tomcat container only prints environment variables without application initialization logs. In other words, Tomcat is still in the process of starting up, and it won’t respond if you try to access it at this time.

To observe the startup process of Tomcat, we will continue to keep the docker logs -f command in terminal 1, and execute the following command multiple times in terminal 2 to attempt accessing Tomcat:

$ for ((i=0;i<30;i++)); do curl localhost:8080; sleep 1; done
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
# It will be blocked for a while here
Hello, world!
curl: (52) Empty reply from server
curl: (7) Failed to connect to localhost port 8080: Connection refused
curl: (7) Failed to connect to localhost port 8080: Connection refused

After observing for a while, you will see that curl finally gives us the desired result “Hello, world!” after a period of time. However, there is also an “Empty reply from server” error, and a persistent “Connection refused” error. In other words, after responding to one request, Tomcat does not respond to any further requests.

What’s going on here? Let’s go back to terminal 1 and observe the Tomcat log to see if we can find any clues.

From terminal 1, you should be able to see the following output:

18-Feb-2019 12:43:32.719 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/docs]
18-Feb-2019 12:43:33.725 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/docs] has finished in [1,006] ms
18-Feb-2019 12:43:33.726 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/manager]
18-Feb-2019 12:43:34.521 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/manager] has finished in [795] ms
18-Feb-2019 12:43:34.722 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["http-nio-8080"]
18-Feb-2019 12:43:35.319 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["ajp-nio-8009"]
18-Feb-2019 12:43:35.821 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 24096 ms
root@ubuntu:~#

From the content, it can be seen that Tomcat finishes initializing and starts successfully after 24 seconds. From the logs, there doesn’t seem to be any issues.

However, pay attention to the last line: the output has returned to the Linux shell prompt instead of continuing to follow the container logs through Docker.

When the output returns to the shell prompt, it usually means the previous command has ended. That command was docker logs -f, so there are only two possibilities for its exit: either the container has exited, or the dockerd process has exited.
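Before looking at the container, you can quickly rule out the second possibility. A minimal check, assuming the Docker daemon is managed by systemd as the docker service (as on a default Ubuntu installation):

# If dockerd had exited, this would not report "active"
$ systemctl is-active docker
active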

Which scenario is it? To confirm, we can also check the status of the container by executing the following command in terminal 1:

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                            PORTS               NAMES
0f2b3fcdd257        feisky/tomcat:8     "catalina.sh run"   2 minutes ago       Exited (137) About a minute ago                       tomcat

You can see that the container is in an Exited state, indicating that the container has exited. But why did this happen? Obviously, there are no clues in the container logs we’ve seen earlier, so we need to investigate Docker itself.

We can call the Docker API to query the container’s status, exit code, and error message to determine the reason for container exit. This can be done using the docker inspect command. For example, you can continue executing the following command to output only the container’s status using the -f option:

# Display container status (jq is used to format the json output)
$ docker inspect tomcat -f '{{json .State}}' | jq
{
  "Status": "exited",
  "Running": false,
  "Paused": false,
  "Restarting": false,
  "OOMKilled": true,
  "Dead": false,
  "Pid": 0,
  "ExitCode": 137,
  "Error": "",
  ...
}

This time, you can see that the container is in the exited state, OOMKilled is true, and ExitCode is 137. OOMKilled means the container was killed because it ran out of memory (OOM), and exit code 137 is 128 + 9, i.e. the process was terminated by SIGKILL, which is exactly the signal the OOM killer sends.

As mentioned in earlier lessons, when memory runs short the kernel may kill some applications. But why would memory run short here? The application only allocates 256 MB, and we set a 512 MB limit with the -m option when starting the container, which should be enough. You probably remember that when an OOM occurs, the kernel records the details in the system log. So next, run the dmesg command in the terminal to view the kernel log and locate the OOM records:

$ dmesg
[193038.106393] java invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
[193038.106396] java cpuset=0f2b3fcdd2578165ea77266cdc7b1ad43e75877b0ac1889ecda30a78cb78bd53 mems_allowed=0
[193038.106402] CPU: 0 PID: 27424 Comm: java Tainted: G  OE    4.15.0-1037 #39-Ubuntu
[193038.106404] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
[193038.106405] Call Trace:
[193038.106414]  dump_stack+0x63/0x89
[193038.106419]  dump_header+0x71/0x285
[193038.106422]  oom_kill_process+0x220/0x440
[193038.106424]  out_of_memory+0x2d1/0x4f0
[193038.106429]  mem_cgroup_out_of_memory+0x4b/0x80
[193038.106432]  mem_cgroup_oom_synchronize+0x2e8/0x320
[193038.106435]  ? mem_cgroup_css_online+0x40/0x40
[193038.106437]  pagefault_out_of_memory+0x36/0x7b
[193038.106443]  mm_fault_error+0x90/0x180
[193038.106445]  __do_page_fault+0x4a5/0x4d0
[193038.106448]  do_page_fault+0x2e/0xe0
[193038.106454]  ? page_fault+0x2f/0x50
[193038.106456]  page_fault+0x45/0x50
[193038.106459] RIP: 0033:0x7fa053e5a20d
[193038.106460] RSP: 002b:00007fa0060159e8 EFLAGS: 00010206
[193038.106462] RAX: 0000000000000000 RBX: 00007fa04c4b3000 RCX: 0000000009187440
[193038.106463] RDX: 00000000943aa440 RSI: 0000000000000000 RDI: 000000009b223000
[193038.106464] RBP: 00007fa006015a60 R08: 0000000002000002 R09: 00007fa053d0a8a1
[193038.106465] R10: 00007fa04c018b80 R11: 0000000000000206 R12: 0000000100000768
[193038.106466] R13: 00007fa04c4b3000 R14: 0000000100000768 R15: 0000000010000000
[193038.106468] Task in /docker/0f2b3fcdd2578165ea77266cdc7b1ad43e75877b0ac1889ecda30a78cb78bd53 killed as a result of limit of /docker/0f2b3fcdd2578165ea77266cdc7b1ad43e75877b0ac1889ecda30a78cb78bd53
[193038.106478] memory: usage 524288kB, limit 524288kB, failcnt 77
[193038.106480] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[193038.106481] kmem: usage 3708kB, limit 9007199254740988kB, failcnt 0
[193038.106481] Memory cgroup stats for /docker/0f2b3fcdd2578165ea77266cdc7b1ad43e75877b0ac1889ecda30a78cb78bd53: cache:0KB rss:520580KB rss_huge:450560KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:520580KB inactive_file:0KB active_file:0KB unevictable:0KB
[193038.106494] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[193038.106571] [27281]     0 27281  1153302   134371  1466368        0             0 java
[193038.106574] Memory cgroup out of memory: Kill process 27281 (java) score 1027 or sacrifice child
[193038.148334] Killed process 27281 (java) total-vm:4613208kB, anon-rss:517316kB, file-rss:20168kB, shmem-rss:0kB
[193039.607503] oom_reaper: reaped process 27281 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

From the dmesg output, you can see a detailed OOM record, with several key points:

  • First, the killed process is a java process. The mem_cgroup_out_of_memory frame in the kernel call stack shows that it was killed because it exceeded its cgroup memory limit.

  • Second, the java process runs inside a container, and the container's memory usage has reached its 512 MB limit (usage 524288kB, limit 524288kB), triggering the OOM.

  • Third, the killed process (PID 27281) had about 4.4 GB of virtual memory (total-vm: 4613208kB), 505 MB of anonymous memory (anon-rss: 517316kB), and about 20 MB of file-backed memory (file-rss: 20168kB). In other words, anonymous memory is the main consumer, and anonymous plus file-backed memory together come to roughly 525 MB, already over the 512 MB limit.

From these points, it can be seen that the Tomcat container's memory is mostly anonymous memory, which corresponds to the heap memory the process actively allocates.

But why does Tomcat allocate so much heap memory? Remember that Tomcat runs on the JVM, so it is not hard to guess that this is most likely a JVM heap configuration problem.

We know that the JVM sizes its heap based on the total system memory. Unless configured otherwise, the default maximum heap size is one quarter of the physical memory. But we have already limited the container to 512 MB, so how large does the Java heap actually end up?

Continuing in the terminal, execute the following command to restart the Tomcat container and use the java command line to check the heap memory size:

# Restart the container
$ docker rm -f tomcat
$ docker run --name tomcat --cpus 0.1 -m 512M -p 8080:8080 -itd feisky/tomcat:8

# Check heap memory (Note that the unit is in bytes)
$ docker exec tomcat java -XX:+PrintFlagsFinal -version | grep HeapSize
    uintx ErgoHeapSizeLimit = 0 {product}
    uintx HeapSizePerGCThread = 87241520 {product}
    uintx InitialHeapSize := 132120576 {product}
    uintx LargePageHeapSizeThreshold = 134217728 {product}
    uintx MaxHeapSize := 2092957696 {product}

You can see that the initial heap size (InitialHeapSize) is about 126 MB, while the maximum heap size (MaxHeapSize) is about 1.95 GB, far larger than the 512 MB limit we set for the container.
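That 1.95 GB figure lines up with the quarter-of-physical-memory default on this 8 GB host. As a quick, illustrative sanity check run on the host (the awk expression is just for demonstration):

# A quarter of the host's physical memory, in GB
$ free -b | awk '/^Mem:/{printf "%.2f GB\n", $2/4/1024/1024/1024}'
1.95 GB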

The reason is that processes inside the container cannot see the memory limit Docker has set for it. Although we limited the container to 512 MB with the -m 512M option when starting it, what the JVM sees from inside the container is not 512 MB.

Continuing in the terminal, execute the following command:

$ docker exec tomcat free -m
          total        used        free      shared  buff/cache   available
Mem:      7977         521        1941           0        5514        7148
Swap:        0           0           0

As expected, the memory seen from within the container is still the host memory.
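If you want to see the limit that Docker actually applied, you can read it from the memory cgroup instead. A minimal sketch, assuming cgroup v1 (the default on Ubuntu 18.04); on a cgroup v2 host the file would be /sys/fs/cgroup/memory.max:

# The limit the container is really subject to: 536870912 bytes = 512 MB
$ docker exec tomcat cat /sys/fs/cgroup/memory/memory.limit_in_bytes
536870912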

Now that we know the root cause of the problem, the solution is simple: correctly configure the memory limit for JVM to be 512M.

For example, you can execute the following commands to set both the initial and maximum JVM heap size to 512 MB via the environment variable JAVA_OPTS='-Xmx512m -Xms512m':

# Delete the problematic container
$ docker rm -f tomcat
# Run a new container
$ docker run --name tomcat --cpus 0.1 -m 512M -e JAVA_OPTS='-Xmx512m -Xms512m' -p 8080:8080 -itd feisky/tomcat:8

Then, switch back to the second terminal and execute the curl command in a loop to check the response from Tomcat:

$ for ((i=0;i<30;i++)); do curl localhost:8080; sleep 1; done
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
Hello, world!

Hello, world!

Hello, world!

You can see that initially, it still shows the “Connection reset by peer” error. However, after waiting for a while, it continuously outputs “Hello, world!”. This indicates that Tomcat has been successfully started.

At this point, switch back to the first terminal and execute the docker logs command to view the logs of the Tomcat container:

$ docker logs -f tomcat
...
18-Feb-2019 12:52:00.823 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/manager]
18-Feb-2019 12:52:01.422 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/manager] has finished in [598] ms
18-Feb-2019 12:52:01.920 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["http-nio-8080"]
18-Feb-2019 12:52:02.323 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["ajp-nio-8009"]
18-Feb-2019 12:52:02.523 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 22798 ms

This time Tomcat started successfully as well. However, look at the startup time in the last line: it took about 22 seconds (22798 ms), which is still quite slow.

Since this time is spent on container startup, to troubleshoot this issue, we need to restart the container and use a performance analysis tool to analyze the container process. Considering our previous case, I think we can start by using top.

Switch to the second terminal and run the top command. Then switch back to the first terminal and execute the following command to restart the container:

# Delete the old container
$ docker rm -f tomcat
# Run a new container
$ docker run --name tomcat --cpus 0.1 -m 512M -e JAVA_OPTS='-Xmx512m -Xms512m' -p 8080:8080 -itd feisky/tomcat:8

Next, switch to terminal 2 and observe the output of top:

$ top
top - 12:57:18 up 2 days,  5:50,  2 users,  load average: 0.00, 0.02, 0.00
Tasks: 131 total,   1 running,  74 sleeping,   0 stopped,   0 zombie
%Cpu0  :  3.0 us,  0.3 sy,  0.0 ni, 96.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  5.7 us,  0.3 sy,  0.0 ni, 94.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8169304 total,  2465984 free,   500812 used,  5202508 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  7353652 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
29457 root      20   0 2791736  73704  19164 S  10.0  0.9   0:01.61 java
27349 root      20   0 1121372  96760  39340 S   0.3  1.2   4:20.82 dockerd
27376 root      20   0 1031760  43768  21680 S   0.3  0.5   2:44.47 docker-containe
29430 root      20   0    7376   3604   3128 S   0.3  0.0   0:00.01 docker-containe
    1 root      20   0   78132   9332   6744 S   0.0  0.1   0:16.12 systemd

From the output of top, we can see that:

  • Overall, the CPU usage of the two CPUs is 3% and 5.7%, respectively, which is not high. Most of the CPUs are still idle. There is also 7GB of available memory (7353652 avail Mem), which is very sufficient.

  • Specifically for processes, the CPU usage of the java process is 10%, and the memory usage is 0.9%. The usage of the other processes is relatively low.

These metrics are not high, so there doesn’t seem to be any problem. However, what is the actual situation? We need to continue investigating. Since the CPU usage of the java process is the highest, we need to focus on analyzing its performance.

Speaking of process performance analysis tools, you must have thought of pidstat. Next, let’s use pidstat to analyze it again. Go back to terminal 1 and execute the pidstat command:

# -t option is used to display threads, -p option is used to specify the process ID
$ pidstat -t -p 29457 1
12:59:59      UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
13:00:00        0     29457         -    0.00    0.00    0.00    0.00    0.00     0  java
13:00:00        0         -     29457    0.00    0.00    0.00    0.00    0.00     0  |__java
13:00:00        0         -     29458    0.00    0.00    0.00    0.00    0.00     1  |__java
...
13:00:00        0         -     29491    0.00    0.00    0.00    0.00    0.00     0  |__java

In the result, all the CPU usage is 0, which doesn’t seem right. Think about it, did we miss any clues? That’s right, at this point, the container has finished starting up, and without any client requests, Tomcat itself doesn’t need to do anything, so the CPU usage is naturally 0.

In order to analyze the problems during the startup process, we need to restart the container once more. Continue in terminal 1: press Ctrl+C to stop the pidstat command, then execute the following commands to restart the container. Once it has restarted successfully, find the new PID and rerun pidstat:

# Remove the old container
$ docker rm -f tomcat
# Run the new container
$ docker run --name tomcat --cpus 0.1 -m 512M -e JAVA_OPTS='-Xmx512m -Xms512m' -p 8080:8080 -itd feisky/tomcat:8
# Find the PID of the new container's process
$ PID=$(docker inspect tomcat -f '{{.State.Pid}}')
# Execute pidstat
$ pidstat -t -p $PID 1
12:59:28      UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
12:59:29        0     29850         -   10.00    0.00    0.00    0.00   10.00     0  java
12:59:29        0         -     29850    0.00    0.00    0.00    0.00    0.00     0  |__java
12:59:29        0         -     29897    5.00    1.00    0.00   86.00    6.00     1  |__java
...
12:59:29        0         -     29905    3.00    0.00    0.00   97.00    3.00     0  |__java
12:59:29        0         -     29906    2.00    0.00    0.00   49.00    2.00     1  |__java
12:59:29        0         -     29908    0.00    0.00    0.00   45.00    0.00     0  |__java

The output above shows that although the CPU usage (%CPU) is very low, the waiting usage (%wait) is very high, reaching up to 97%. This indicates that most of the threads are waiting for scheduling rather than running.

> Note: If you can't see the %wait metric, please upgrade sysstat first and try again.

Why is CPU usage so low while most threads are waiting for CPU? Since this phenomenon appears inside Docker, you should naturally suspect the resource limits Docker sets for the container.

Let's review the container startup command at the beginning of the case. We used --cpus 0.1 to limit the container to 0.1 CPU, i.e. 10% of one CPU. This explains why the java process shows only 10% CPU usage and its threads spend most of their time waiting.
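You can confirm this from the CPU cgroup as well. A minimal sketch, again assuming cgroup v1: --cpus 0.1 is implemented as a CFS quota of 10 ms out of every 100 ms scheduling period:

# CFS period and quota for the container (cgroup v1 paths)
$ docker exec tomcat cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
100000
$ docker exec tomcat cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
10000
# quota / period = 10000 / 100000 = 0.1 CPU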

Once the cause is identified, the final optimization becomes simple: just increase the CPU limit. For example, you can execute the following command to increase the CPU limit to 1; then restart the container and observe the startup logs:

# Remove the old container
$ docker rm -f tomcat
# Run the new container
$ docker run --name tomcat --cpus 1 -m 512M -e JAVA_OPTS='-Xmx512m -Xms512m' -p 8080:8080 -itd feisky/tomcat:8
# View container logs
$ docker logs -f tomcat
...
18-Feb-2019 12:54:02.139 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 2001 ms

Now you can see that Tomcat started up in just 2 seconds, much faster than the previous 22 seconds.

Although we solved this problem by increasing the CPU limit, you may find this approach cumbersome if you run into similar problems again: setting resource limits for containers requires assessing the application's resource needs in advance. An apparently simpler alternative is to remove the limits altogether and let the container use whatever it needs.

However, this shortcut can lead to more serious problems. Without resource limits, a container can consume all of the system's resources, so a single misbehaving application can drag down the whole machine.

In fact, this is one of the most common problems on large container platforms. Not setting any limits seems convenient at first, but as the number of containers grows, all kinds of odd issues start to appear; after lengthy investigation, the cause often turns out to be a single application using so many resources that the whole machine becomes unresponsive for a while. Only by setting resource limits can such problems be avoided.

Summary #

Today, I taught you how to analyze performance issues in containerized applications.

If you run a Java application in a Docker container, be sure to configure JVM resource options (such as the heap size) in addition to setting container resource limits. Of course, if you can upgrade to Java 10 or later, the JVM detects the container's cgroup limits automatically and this class of problem largely goes away.
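For example, here is a hedged sketch using the openjdk:11 image purely for illustration: on a container-aware JVM (Java 10+, and also later Java 8 updates), the maximum heap is derived from the container's memory limit, and -XX:MaxRAMPercentage controls what fraction of it the heap may use:

# MaxHeapSize is now derived from the 512 MB container limit, not from the host's memory
$ docker run --rm -m 512M openjdk:11 \
    java -XX:MaxRAMPercentage=75.0 -XX:+PrintFlagsFinal -version | grep -w MaxHeapSize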

When dealing with performance issues in containerized applications, you can still use the various methods we discussed earlier for analysis and troubleshooting. However, there are some differences when analyzing performance in containers, such as the following:

  • Containers use cgroups for resource isolation, so you need to consider the impact of cgroups on application performance during analysis.

  • The container's file system and network protocol stack are isolated from the host. Although we can analyze the container's behavior from the outside, it is sometimes more convenient to enter the container's namespaces directly (a minimal nsenter sketch follows this list).

  • Container operation may also depend on other components, such as various network plugins (such as CNI), storage plugins (such as CSI), device plugins (such as GPU), which can make performance analysis of containers more complex. If you need to analyze container performance, don’t forget to consider their impact on performance.
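For the second point above, here is a minimal sketch of entering a container's network namespace with nsenter (part of util-linux), so that the host's own tools see the container's view of the network:

# Find the PID of the container's init process on the host
$ PID=$(docker inspect tomcat -f '{{.State.Pid}}')
# Run the host's ss inside the container's network namespace
$ nsenter --target $PID --net -- ss -ltnp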

Reflection #

Lastly, I would like to invite you to discuss the container performance issues you have encountered. How did you analyze them? And how did you resolve the root cause of these issues? You can combine my explanations and summarize your own thoughts.

Feel free to discuss it with me in the comments section, or share this article with your colleagues and friends. Let’s practice in real-world scenarios and improve through communication.