
32 Challenges Faced in the Container Era: The Long Wind Will Break the Waves, and One Day We Will Hoist Our Sails and Cross the Sea #

Containers are more popular than ever. Cgroups, Docker, Kubernetes, and related projects and technologies have matured and become the cornerstone of many large-scale clusters.

Containers are a sandboxing technology that provides resource scheduling, allocation, and quota limits, as well as environment isolation between applications.

The container era brings not only opportunities but also many challenges. It’s an opportunity if you can overcome them, but a pitfall if you can’t.

In a container environment, direct debugging is not easy. Instead, we focus more on collecting and monitoring application performance metrics and building alert mechanisms. This requires the collaboration of architects, developers, testers, and operations personnel.

However, the field of monitoring tools is vast and complex, and it keeps evolving and iterating. In the early days of monitoring, only server-related parameters were checked at release time and treated as indicators that the system was running. The health of the monitored servers is closely related to user experience, and the sad reality is that imperfect monitoring misses far more problems than it detects.

Over time, effort has gone into log management, alerting, telemetry, and system reporting. Many effective measures have been adopted, such as tracking security incidents, sending actionable alerts, and recording resource usage, but all of them presuppose a clear strategy and tools that can trace user access paths. Tools such as Zabbix, Nagios, and Prometheus are widely used in production environments.

The key to performance issues is people, that is, our users. However, the existing tools do not truly monitor the user experience, and using this software alone cannot alleviate performance problems. We need to combine a variety of measures and keep working at it with courage and focus.

On one hand, diagnosing and optimizing the problems of web systems is a significant task that requires strict control and a lot of effort.

Of course, successfully implementing these tasks brings enormous returns to the company!

On the other hand, taking Java and its de facto standard Spring as an example, Spring Boot integrates (via Actuator) the application metrics facade Micrometer. The official documentation is available at: https://micrometer.io/docs.

  • It can report data directly to popular monitoring systems such as Elasticsearch, Datadog, InfluxData, etc.
  • It automatically collects metrics such as maximum latency, average latency, 95th percentile latency, throughput, memory usage, etc.

In addition, in small-scale clusters, we can also use open-source APM tools such as Pinpoint and SkyWalking.
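To make the Micrometer piece concrete, here is a minimal, self-contained sketch of its API. It uses a standalone SimpleMeterRegistry so that it runs outside Spring Boot, and the meter name, percentile, and recorded durations are purely illustrative:

```java
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import java.util.concurrent.TimeUnit;

public class MicrometerDemo {
    public static void main(String[] args) {
        // In a Spring Boot application, a MeterRegistry bean is auto-configured by Actuator;
        // here a standalone SimpleMeterRegistry keeps the example self-contained.
        SimpleMeterRegistry registry = new SimpleMeterRegistry();

        // A timer that publishes the 95th percentile in addition to count/total/max.
        Timer timer = Timer.builder("http.server.requests.demo") // hypothetical meter name
                .publishPercentiles(0.95)
                .register(registry);

        // Record a few fake request durations.
        for (int i = 1; i <= 5; i++) {
            timer.record(i * 10L, TimeUnit.MILLISECONDS);
        }

        System.out.println("count    = " + timer.count());
        System.out.println("max(ms)  = " + timer.max(TimeUnit.MILLISECONDS));
        System.out.println("mean(ms) = " + timer.mean(TimeUnit.MILLISECONDS));
    }
}
```

In a real Spring Boot service you would inject the auto-configured MeterRegistry instead, and the collected metrics are exported to whichever backend (Elasticsearch, Datadog, Influx, Prometheus, and so on) is configured.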

Resource Isolation in Container Environments #

Containers are a lightweight form of virtualization, so their isolation is not as strong as that of virtual machines.

For example:

If the physical/host machine has 96 CPU cores and 256 GB of physical memory, and the container is limited to 4 cores and 8 GB of memory (for example, via `docker run --cpus=4 --memory=8g`), how many CPU cores and how much memory does the JVM process inside the container see?

On a JDK without container support, the JVM sees all 96 cores and 256 GB of memory.

This can cause problems. Anything that sizes itself from the CPU count, such as Runtime.availableProcessors(), is affected. For example, if nothing is configured, HotSpot sets the number of parallel GC threads to roughly 8 + (ncpus - 8) * 5/8, which is about 63 for 96 cores. But since the container is restricted to 4 CPU cores, some 60 parallel GC threads end up competing for 4 physical cores, resulting in serious GC performance issues.

The same logic applies to the many thread pool implementations that size themselves by CPU count, which likewise leads to intense resource contention. Conversely, if the container does not restrict resource usage at all, other troubles appear, such as the noisy neighbor effect described below. Code that relies on values such as totalPhysicalMemorySize and freePhysicalMemorySize also produces strange bugs.
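A quick way to see what the JVM actually believes about its environment is to print the values these algorithms rely on, once on the host and once inside the resource-limited container. This is a small sketch; the cast to com.sun.management.OperatingSystemMXBean works on HotSpot/OpenJDK builds:

```java
import java.lang.management.ManagementFactory;

public class ContainerVisibilityCheck {
    public static void main(String[] args) {
        // CPU count used to size GC threads, ForkJoinPool, and many thread pools.
        System.out.println("availableProcessors = "
                + Runtime.getRuntime().availableProcessors());

        // Heap limit the JVM derived from the "physical" memory it detected (bytes).
        System.out.println("Runtime.maxMemory   = " + Runtime.getRuntime().maxMemory());

        // Physical memory as reported by the OS MXBean (the values mentioned above).
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        System.out.println("totalPhysicalMemory = " + os.getTotalPhysicalMemorySize());
        System.out.println("freePhysicalMemory  = " + os.getFreePhysicalMemorySize());
    }
}
```

On a JDK without container support this prints the host's figures (96 cores and 256 GB in the example above); on a container-aware JDK it reflects the cgroup limits.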

The latest version of JDK has introduced some fixes.

JDK Support and Limitations for Containers #

The latest version of JDK supports CPU and memory limits for Docker containers:

https://blogs.oracle.com/java-platform-group/java-se-support-for-docker-cpu-and-memory-limits

Starting with JDK 10, and backported to JDK 8u191 via JDK-8146115, JVM startup parameters let the JVM read the CPU and memory limits set by cgroups, for example -XX:+UseContainerSupport (enabled by default on Linux), -XX:ActiveProcessorCount, and -XX:MaxRAMPercentage:

https://www.oracle.com/technetwork/java/javase/8u191-relnotes-5032181.html#JDK-8146115

HotSpot is developed in the open as part of the OpenJDK project, and new JDK features are discussed on the official mailing lists, which you can subscribe to, for example:

https://mail.openjdk.java.net/pipermail/jdk8u-dev/

Features for other JDK versions can be found on the Mailing Lists page of the official website, which follows a similar naming convention:

https://mail.openjdk.java.net/mailman/listinfo

Please refer to the previous chapter, JVM Troubleshooting and Performance Analysis Experience, for troubleshooting and analysis related to this issue.

Noisy Neighbor Effect #

Where there are shared resources, there will be resource contention. In the field of computer science, shared resources mainly include:

  • Network
  • Disk
  • CPU
  • Memory

In multi-tenant public cloud environments there is a serious problem known as the “Noisy Neighbor Effect”: when one or more customers excessively use a shared resource, the system performance of other customers is significantly impaired (much like shared broadband in a residential building).

A “noisy neighbor” is, in cloud computing, a tenant whose workload preempts shared bandwidth, disk I/O, CPU, or other resources.

The noisy neighbor effect causes performance degradation or jitter for other virtual machines and applications in the same environment, and generally harms the performance and experience of other users.

Cloud computing is a multi-tenant environment where the same physical machine is shared among multiple customers to run programs or store data.

The Noisy Neighbor Effect is caused when a virtual machine/application occupies most of the resources and thereby affects the performance of other customers.

Insufficient bandwidth is the main cause of network performance issues. The transmission of data in a network heavily relies on the size of the bandwidth. If an application or instance occupies too much network resources, it is likely to cause delays/slowness for other users. Noisy neighbors affect virtual machines, databases, networks, storage, and other cloud services.

One way to avoid the Noisy Neighbor Effect is to use bare-metal cloud. Bare-metal cloud runs an application directly on the hardware, creating a single-tenant environment, which eliminates noisy neighbors. Although a single-tenant environment avoids the Noisy Neighbor Effect, it does not solve the fundamental problem. Over-commitment or sharing with too many tenants will limit the performance of the entire cloud environment.

Another way to mitigate the Noisy Neighbor Effect is to migrate workloads dynamically between physical machines so that every customer receives the resources it needs. The effect can also be limited by using QoS (quality of service) to control the IOPS (input/output operations per second) of each virtual machine: capping each virtual machine's IOPS prevents one customer's virtual machine/application/instance from crowding out the resources and performance of other customers.

Interested readers can refer to:

Talk about the Noisy Neighbor Effect in Public Cloud

GC Log Monitoring #

Starting from JDK 7, each garbage collector provides a notification mechanism. By listening to the GarbageCollectorMXBean in a program, you can receive detailed information about each GC event once collection completes. Note that this mechanism only reports pause data after a GC finishes; other aspects of the GC cycle cannot be observed this way.

Here is a simple implementation of a monitoring program:

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.sun.management.GarbageCollectionNotificationInfo;
import com.sun.management.GcInfo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Configuration;
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import javax.management.ListenerNotFoundException;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;
import java.lang.management.*;
import java.util.*;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/**
 * GC Log Monitor and Output to Log
 * JVM Startup Parameters Example:
 * -Xmx4g -Xms4g -XX:+UseG1GC -XX:MaxGCPauseMillis=50
 * -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps
 */
@Configuration
public class BindGCNotifyConfig {

    public BindGCNotifyConfig() {
    }

    //
    private Logger logger = LoggerFactory.getLogger(this.getClass());
    private final AtomicBoolean inited = new AtomicBoolean(Boolean.FALSE);
    private final List<Runnable> notifyCleanTasks = new CopyOnWriteArrayList<Runnable>();
    private final AtomicLong maxPauseMillis = new AtomicLong(0L);
    private final AtomicLong maxOldSize = new AtomicLong(getOldGen().getUsage().getMax());
    private final AtomicLong youngGenSizeAfter = new AtomicLong(0L);

    @PostConstruct
    public void init() {
        try {
            doInit();
        } catch (Throwable e) {
            logger.warn("[GC Log Monitor - Initialization] failed! ", e);
        }
    }

    @PreDestroy
    public void close() {
        for (Runnable task : notifyCleanTasks) {
            task.run();
        }
        notifyCleanTasks.clear();
    }

    private void doInit() {
        //
        if (!inited.compareAndSet(Boolean.FALSE, Boolean.TRUE)) {
            return;
        }
        logger.info("[GC Log Monitor - Initialization] maxOldSize=" + mb(maxOldSize.longValue()));

        // Register a listener for each mbean
        for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (!(mbean instanceof NotificationEmitter)) {
                continue;
            }
            final NotificationEmitter notificationEmitter = (NotificationEmitter) mbean;
            // Add listener
            final NotificationListener notificationListener = getNewListener(mbean);
            notificationEmitter.addNotificationListener(notificationListener, null, null);

            logger.info("[GC Log Monitor - Initialization] MemoryPoolNames=" + JSON.toJSONString(mbean.getMemoryPoolNames()));
            // Add to clean queue
            notifyCleanTasks.add(new Runnable() {
                @Override
                public void run() {
                    try {
                        // Clean up the bound listener
                        notificationEmitter.removeNotificationListener(notificationListener);
                    } catch (ListenerNotFoundException e) {
                        logger.error("[GC Log Monitor - Cleanup] Failed to clean up the bound listener", e);
                    }
                }
            });
        }
    }

    private NotificationListener getNewListener(final GarbageCollectorMXBean mbean) {
        //
        final NotificationListener listener = new NotificationListener() {
            @Override
            public void handleNotification(Notification notification, Object ref) {
                // Only process GC events
                if (!notification.getType().equals(GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION)) {
                    return;
                }
                CompositeData cd = (CompositeData) notification.getUserData();
                GarbageCollectionNotificationInfo notificationInfo = GarbageCollectionNotificationInfo.from(cd);
                //
                JSONObject gcDetail = new JSONObject();

                String gcName = notificationInfo.getGcName();
                String gcAction = notificationInfo.getGcAction();
                String gcCause = notificationInfo.getGcCause();
                GcInfo gcInfo = notificationInfo.getGcInfo();
                // duration refers to the total pause time in the Pause phase, and there will be no notification for concurrent phase without pause.
                long duration = gcInfo.getDuration();
                if (maxPauseMillis.longValue() < duration) {
                    maxPauseMillis.set(duration);
                }
                long gcId = gcInfo.getId();
                //
                String type = "jvm.gc.pause";
                //
                if (isConcurrentPhase(gcCause)) {
                    type = "jvm.gc.concurrent.phase.time";
                }
                //
                gcDetail.put("gcName", gcName);
                gcDetail.put("gcAction", gcAction);
                gcDetail.put("gcCause", gcCause);
                gcDetail.put("gcId", gcId);
                gcDetail.put("duration", duration);
                gcDetail.put("maxPauseMillis", maxPauseMillis);
                gcDetail.put("type", type);
                gcDetail.put("collectionCount", mbean.getCollectionCount());
                gcDetail.put("collectionTime", mbean.getCollectionTime());

                // Live data size (bytes)
                AtomicLong liveDataSize = new AtomicLong(0L);
                // Promoted bytes
                AtomicLong promotedBytes = new AtomicLong(0L);

                // Update promotion and allocation counters
                final Map<String, MemoryUsage> before = gcInfo.getMemoryUsageBeforeGc();
                final Map<String, MemoryUsage> after = gcInfo.getMemoryUsageAfterGc();

                //
                Set<String> keySet = new HashSet<String>();
                keySet.addAll(before.keySet());
                keySet.addAll(after.keySet());
                //
                final Map<String, String> afterUsage = new HashMap<String, String>();
                //
                for (String key : keySet) {
                    final long usedBefore = before.get(key).getUsed();
                    final long usedAfter = after.get(key).getUsed();
                    long delta = usedAfter - usedBefore;
                    // Young and old generation pools use different calculations
                    if (isYoungGenPool(key)) {
                        delta = usedBefore - youngGenSizeAfter.get();
                        youngGenSizeAfter.set(usedAfter);
                    } else if (isOldGenPool(key)) {
                        if (delta > 0L) {
                            // Number of bytes promoted to the old generation
                            promotedBytes.addAndGet(delta);
                            gcDetail.put("promotedBytes", mb(promotedBytes));
                        }
                        if (delta < 0L || GcGenerationAge.OLD.contains(gcName)) {
                            liveDataSize.set(usedAfter);
                            gcDetail.put("liveDataSize", mb(liveDataSize));
                            // Check if the max size of the old generation has changed
                            final long oldMaxAfter = after.get(key).getMax();
                            if (maxOldSize.longValue() != oldMaxAfter) {
                                maxOldSize.set(oldMaxAfter);
                                // Resize: the max size of the old generation has changed
                                gcDetail.put("maxOldSize", mb(maxOldSize));
                            }
                        }
                    } else if (delta > 0L) {
                    } else if (delta < 0L) {
                        // Check if it's G1
                    }
                    afterUsage.put(key, mb(usedAfter));
                }
                //
                gcDetail.put("afterUsage", afterUsage);
                //

                logger.info("[GC Log Listener - GC Event] gcId={}; duration:{}; gcDetail: {}", gcId, duration, gcDetail.toJSONString());
            }
        };

        return listener;
    }

    private static String mb(Number num) {
        long mbValue = num.longValue() / (1024 * 1024);
        if (mbValue < 1) {
            return "" + mbValue;
        }
        return mbValue + "MB";
    }

    private static MemoryPoolMXBean getOldGen() {
        List<MemoryPoolMXBean> list = ManagementFactory
                .getPlatformMXBeans(MemoryPoolMXBean.class);
        //
        for (MemoryPoolMXBean memoryPoolMXBean : list) {
            // Skip non-heap memory pools
            if (!isHeap(memoryPoolMXBean)) {
                continue;
            }
            if (!isOldGenPool(memoryPoolMXBean.getName())) {
                continue;
            }
            return memoryPoolMXBean;
        }
        return null;
    }

    private static boolean isConcurrentPhase(String cause) {
        return "No GC".equals(cause);
    }

    private static boolean isYoungGenPool(String name) {
        return name.endsWith("Eden Space");
    }

    private static boolean isOldGenPool(String name) {
        return name.endsWith("Old Gen") || name.endsWith("Tenured Gen");
    }

    private static boolean isHeap(MemoryPoolMXBean memoryPoolBean) {
        return MemoryType.HEAP.equals(memoryPoolBean.getType());
    }

    private enum GcGenerationAge {
        OLD,
        YOUNG,
        UNKNOWN;

        private static Map<String, GcGenerationAge> knownCollectors = new HashMap<String, BindGCNotifyConfig.GcGenerationAge>() {{
            put("ConcurrentMarkSweep", OLD);
            put("Copy", YOUNG);
            put("G1 Old Generation", OLD);
            put("G1 Young Generation", YOUNG);
            put("MarkSweepCompact", OLD);
            put("PS MarkSweep", OLD);
            put("PS Scavenge", YOUNG);
            put("ParNew", YOUNG);
        }};

        static GcGenerationAge fromName(String name) {
            return knownCollectors.getOrDefault(name, UNKNOWN);
        }

        public boolean contains(String name) {
            return this == fromName(name);
        }
    }

}

Not only GC events, but also memory-related information can be monitored through JMX. Many APM tools also use similar methods for data reporting.
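As a further illustration of what JMX exposes beyond GC pauses, the following rough sketch arms a usage threshold on the heap memory pools and logs a message when it is exceeded. It is separate from the class above, and the 80% threshold is an arbitrary example:

```java
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class MemoryThresholdMonitor {
    public static void main(String[] args) throws InterruptedException {
        // The platform MemoryMXBean emits notifications when a pool crosses its usage threshold.
        NotificationEmitter emitter = (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener((Notification n, Object handback) -> {
            if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(n.getType())
                    || MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED.equals(n.getType())) {
                MemoryNotificationInfo info =
                        MemoryNotificationInfo.from((CompositeData) n.getUserData());
                System.out.println("Pool " + info.getPoolName()
                        + " exceeded its threshold, used=" + info.getUsage().getUsed());
            }
        }, null, null);

        // Arm an illustrative 80% usage threshold on every heap pool that supports it.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isUsageThresholdSupported()
                    && pool.getUsage().getMax() > 0) {
                pool.setUsageThreshold((long) (pool.getUsage().getMax() * 0.8));
            }
        }

        Thread.sleep(Long.MAX_VALUE); // keep the demo process alive to receive notifications
    }
}
```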

### APM Tools and Monitoring Systems

Online visual monitoring is an essential feature in today's production environments. Business failures and performance issues can occur at any time, and many systems no longer have fixed business windows, so real-time monitoring must be available 24/7.

Currently, there are many monitoring tools in the industry, each with its own advantages and disadvantages, and they need to be selected according to specific needs.

Generally, system monitoring can be divided into three parts:

  * System performance monitoring, covering CPU, memory, disk I/O, network, and other hardware resources as well as system load.
  * Business log monitoring, typically using the ELK stack together with technologies such as Logback + Kafka for log collection.
  * APM performance-metric monitoring, such as QPS, TPS, and response time, using tools like Micrometer and Pinpoint.

The monitoring system itself consists of two main parts:

  * Metric collection part
  * Data visualization system

Monitoring tools have become an important part of production environments. Visualization of measurement results, error tracking, performance monitoring, and application analysis are fundamental means of observing the operating status of applications.

It is very easy to recognize this need, but it is extremely difficult to choose a monitoring tool or a combination of monitoring tools.

Below are a few monitoring tools, including a mix of open source and SaaS models. Each one has its own pros and cons, and it can be said that there is no perfect tool, only a suitable tool.

#### **Metric Collection Client**

* Micrometer: a vendor-neutral library for metric collection. Application code does not need to care about the specific JVM version or vendor, and the same configuration can be wired to different visualization and monitoring backends. It is mainly used for monitoring, alerting, and reacting to changes in the current environment. Micrometer also registers JMX MBeans, so relevant metrics can be inspected locally through JMX in a simple and convenient way; in a production environment, metrics are generally exported to other monitoring systems for storage.
* Cloud service monitoring systems: cloud monitoring vendors generally provide their own metric collection clients and expose APIs and data formats so that customers can feed metrics into their systems.
* Open-source monitoring systems: Various open-source monitoring systems also provide corresponding metric collection clients.

#### **Cloud Service Monitoring Systems**

Monitoring systems for SaaS services generally provide integrated cloud services for storage, querying, visualization, and other functions. Most of them have both free trial and paid service modes. If conditions allow, paid use of cloud services is generally the best choice, after all, "free things are the most expensive".

Let's take a look at some cloud service monitoring systems:

* [AppOptics](https://www.appoptics.com/): A SaaS service that supports APM and system monitoring, providing various monitoring interfaces such as dashboards and timelines. It offers APIs and clients.
* [Datadog](https://www.datadoghq.com/): A SaaS service that supports APM and system monitoring, with built-in dashboards and support for alerts. It supports APIs, clients, and client agents.
* [Dynatrace](https://www.dynatrace.com/): A SaaS service that supports APM and system monitoring, with built-in dashboards and integrated monitoring and analysis platforms.
* [Humio](https://www.humio.com/): A SaaS service that supports APM, log, and system monitoring.
* [Instana](https://www.instana.com/): A SaaS service that supports automatic APM and system monitoring.
* [New Relic](https://newrelic.com/): A visual SaaS product with a complete UI, supporting NRQL query language and running based on a push model with New Relic Insights.
* [SignalFx](https://www.signalfx.com/): A SaaS service that runs on a push model, with a complete UI. It supports real-time monitoring of system performance, microservices, and APM monitoring systems, and supports various "detectors" for alerts.
* [Stackdriver](https://cloud.google.com/stackdriver?hl=en): An embedded monitoring suite for Google Cloud that monitors the performance of cloud infrastructure, software, and applications, troubleshoots and improves them. This monitoring suite belongs to a SaaS service, supporting built-in dashboards and alert functions.
* [Wavefront](https://www.wavefront.com/): A SaaS-based metric monitoring and analysis platform that supports visual queries, as well as alert monitoring and other functions, including system performance, network, custom metrics, and business KPIs, etc.
* [Tingyun](https://www.tingyun.com/): The largest APM solution provider in China. It can achieve comprehensive visualization of application performance, from the PC end, browser end, mobile client to the server, monitoring and locating complex performance problems such as crashes, freezes, slow interactions, failed third-party API calls, declining database performance, poor CDN quality, etc.
* [OneAPM](https://www.oneapm.com/index.html): OneAPM provides end-to-end APM application performance management software and application performance monitoring software solutions.
* [Plumbr](https://plumbr.io/): It monitors availability and performance issues using tracing technology, quickly pinpoints error-related location information, and discovers, validates, and fixes various faults and performance problems.
* [Takipi](https://www.overops.com/): Now renamed as OverOps, it is a real-time monitoring system for system failures. It quickly identifies when, where, and why problems occur.

Among these, the stronger options are Datadog internationally and Tingyun in China.

#### **Open Source Monitoring Systems**

  * [Pinpoint](https://github.com/naver/pinpoint), inspired by Dapper, is a large-scale distributed system APM tool implemented in Java/PHP. Pinpoint provides a solution to quickly locate the call chain by tracking transactions between distributed applications.
  * [Atlas](https://github.com/Netflix/atlas), an open-source, in-memory time series database under Netflix, has a built-in graphical interface and supports advanced mathematical operations and custom query languages.
  * [ELK Stack](http://www.elastic.co/), generally used for log monitoring. [Elasticsearch](http://www.elastic.co/) is the search engine that supports various data and metric storage. Log analysis is usually performed by [Logstash](http://www.elastic.co/products/logstash), while [Kibana](http://www.elastic.co/products/kibana) is responsible for user interaction and visualization.
  * [Influx](https://www.influxdata.com/), InfluxDB is an open-source time series database developed by InfluxData. It is written in Go and focuses on querying and storing time series data with high performance. InfluxDB is widely used in scenarios such as storing monitoring data of storage systems and real-time data in the IoT industry. It uses a SQL-like query language for data analysis. The InfluxData tool suite can be used for real-time stream processing, support for sampling metrics, automatic expiration, deletion of unnecessary data, as well as backup and restore functions.
  * [Ganglia](http://ganglia.sourceforge.net/), a scalable distributed monitoring tool for high-performance computing systems, clusters, and networks. It originated from the University of California, Berkeley, and is a long-standing multi-level metric monitoring system widely popular in Linux systems.
  * [Graphite](https://graphiteapp.org/), one of the most popular hierarchical metrics monitoring systems, backed by a fixed-size underlying database similar in design and purpose to RRD. It was created at Orbitz in 2006 and open-sourced in 2008.
  * [KairosDB](https://kairosdb.github.io/), a time series database built on [Apache Cassandra](http://cassandra.apache.org/). Beautiful monitoring charts can be drawn using [Grafana](https://grafana.com/).
  * [Prometheus](https://prometheus.io/), an open-source time series database with a simple built-in UI, its own query language (PromQL), and support for mathematical operations. Prometheus is designed around a pull model, periodically scraping metrics from application instances discovered via service discovery.
  * [StatsD](https://github.com/statsd/statsd), an open-source, simple but powerful statistics aggregation server.

Pinpoint and Prometheus are particularly popular.
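To illustrate Prometheus's pull model in code, the sketch below exposes a scrape endpoint using Micrometer's Prometheus registry and the JDK's built-in HTTP server. It assumes the micrometer-registry-prometheus dependency is on the classpath; the port, path, and meter name are arbitrary choices for the example:

```java
import com.sun.net.httpserver.HttpServer;
import io.micrometer.core.instrument.Counter;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class PrometheusScrapeDemo {
    public static void main(String[] args) throws IOException {
        PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

        // An illustrative counter; in a real service, Micrometer binders add JVM/GC/thread metrics.
        Counter requests = Counter.builder("demo.requests").register(registry);
        requests.increment();

        // Expose /metrics so a Prometheus server can periodically pull (scrape) the data.
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = registry.scrape().getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        System.out.println("Scrape endpoint ready at http://localhost:8080/metrics");
    }
}
```

A Prometheus server would then be configured with a scrape job pointing at this endpoint, in contrast to push-based systems such as StatsD or New Relic.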

### Reference Links

  * [Monitoring GC with JMX Notifications](http://blog.lichengwu.cn/java/2013/09/15/listen-gc-using-jmx-notification/)
  * [7 Recommended Monitoring Tools](https://www.oschina.net/translate/7-monitoring-tools-to-prevent-the-next-doomsday)