
20 SpringBoot Service Performance Optimization #

Before starting the performance optimization of SpringBoot service, you need to prepare and expose some data of the SpringBoot service. For example, if your service uses caching, you need to collect data such as cache hit rate; if it uses a database connection pool, you need to expose the parameters of the connection pool.

The monitoring tool we use here is Prometheus, which is a time series database that can store our metrics. SpringBoot can be easily integrated with Prometheus.

How to enable monitoring in SpringBoot? #

After creating a SpringBoot project, first add the Maven dependencies.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
</dependency>

Then, we need to expose the relevant monitoring endpoints in the application.properties configuration file.

management.endpoint.metrics.enabled=true
management.endpoints.web.exposure.include=*
management.endpoint.prometheus.enabled=true
management.metrics.export.prometheus.enabled=true

After starting the application, we can get the monitoring data by visiting the /actuator/prometheus endpoint.

(Figure: sample output of the /actuator/prometheus monitoring endpoint)
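On the Prometheus side, this endpoint can be scraped with a minimal job definition; the job name, port, and scrape interval below are illustrative:

```yaml
scrape_configs:
  - job_name: 'springboot'                 # illustrative job name
    metrics_path: '/actuator/prometheus'   # endpoint exposed by Actuator
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:8080']        # your service instance(s)
```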

Monitoring business data is also relatively simple. You just need to inject a MeterRegistry instance. Here is an example code:

@Autowired
MeterRegistry registry;

@GetMapping("/test")
@ResponseBody
public String test() {
    registry.counter("test",
            "from", "127.0.0.1",
            "method", "test"
    ).increment();

    return "ok";
}

From the monitoring endpoint, we can find the monitoring information that was just added.

test_total{from="127.0.0.1",method="test",} 5.0
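Under the hood, a tagged counter behaves like a concurrent map from a metric name plus its label set to a numeric value. The following stdlib-only sketch (not Micrometer's actual implementation) illustrates the idea:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;

// Conceptual sketch: a Prometheus-style counter is keyed by name + label set.
public class TaggedCounter {
    private final Map<String, DoubleAdder> counters = new ConcurrentHashMap<>();

    private static String key(String name, String... tags) {
        return name + "{" + String.join(",", tags) + "}";
    }

    public void increment(String name, String... tags) {
        counters.computeIfAbsent(key(name, tags), k -> new DoubleAdder()).add(1.0);
    }

    public double count(String name, String... tags) {
        DoubleAdder adder = counters.get(key(name, tags));
        return adder == null ? 0.0 : adder.sum();
    }

    public static void main(String[] args) {
        TaggedCounter registry = new TaggedCounter();
        for (int i = 0; i < 5; i++) {
            registry.increment("test", "from=127.0.0.1", "method=test");
        }
        // Mirrors the exposed sample: test_total{from="127.0.0.1",method="test",} 5.0
        System.out.println(registry.count("test", "from=127.0.0.1", "method=test"));
    }
}
```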

Let’s briefly introduce the popular Prometheus monitoring system. Prometheus uses a pull approach to obtain monitoring data, and this process of exposing data can be handed over to a more powerful component called Telegraf.

(Figure: Prometheus monitoring architecture with Grafana and AlertManager)

As shown in the above figure, Grafana is usually used to display the monitoring data, and the AlertManager component provides alerting. Setting up this part is not our focus; interested readers can study it on their own.

The figure below shows a typical monitoring graph, which shows the cache hit rate of Redis, etc.

(Figure: a typical monitoring graph, showing metrics such as the Redis cache hit rate)

Generating Flame Graph in Java #

A Flame Graph is a tool used to analyze program performance bottlenecks.

Flame graphs can also be used to analyze Java applications. You can download the async-profiler package from GitHub for this. For example, extract it to the /root/ directory, and then start the Java application with the profiler agent attached as follows:

java -agentpath:/root/build/libasyncProfiler.so=start,svg,file=profile.svg -jar spring-petclinic-2.3.1.BUILD-SNAPSHOT.jar

After running for a period of time, stop the process. You can see that a profile.svg file is generated in the current directory, which can be opened with a browser. As shown in the following figure, the vertical direction represents the depth of the call stack, and the horizontal direction represents the time consumed. The wider the blocks are, the more likely they are bottlenecks. By browsing layer by layer, you can find the targets to be optimized.
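If you want a predictable target to practice on, a method with a deliberate hotspot helps; under a profiler, the quadratic string concatenation below would show up as a wide frame (the class and method names are made up for illustration):

```java
// Toy workload: string concatenation in a loop copies the whole string each
// iteration (O(n^2)), so this method burns CPU and appears wide in the graph.
public class HotSpotDemo {
    static String build(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;  // each += allocates and copies a new String
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(build(1000).length());  // prints 2890
    }
}
```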

(Animation: navigating a flame graph layer by layer)

Optimization Approach #

For an ordinary web service, let’s take a look at the main steps to access specific data.

As shown in the figure below, when entering the corresponding domain name in the browser, it needs to resolve to the specific IP address through DNS. In order to ensure high availability, our service is generally deployed in multiple instances, and Nginx is used for reverse proxying and load balancing.

(Figure: request path from the browser through DNS and Nginx to the service instances)

Nginx takes over part of the dynamic/static separation according to the nature of each resource: static files are served directly, while dynamic requests are forwarded to our SpringBoot service.

HTTP Optimization #

Let’s take a look at some actions that can speed up web page loading. For convenience, we will only discuss the HTTP 1.1 protocol.

1. Use CDN for file delivery acceleration

For large files, it is recommended to use a Content Delivery Network (CDN) for distribution. Even commonly used front-end scripts, stylesheets, and images can be placed on a CDN. CDN usually accelerates the retrieval of these files, resulting in faster web page loading.

2. Set Cache-Control values appropriately

Browsers use the HTTP Cache-Control header to decide whether to serve a resource from the browser cache, which is very useful when managing static files. The Expires header achieves a similar effect: Cache-Control (via max-age) specifies a relative lifetime in seconds, while Expires specifies an absolute expiration date.

This parameter can be set in the Nginx configuration file.

location ~* ^.+\.(ico|gif|jpg|jpeg|png)$ {
    # Cache for 1 year
    add_header Cache-Control "public, max-age=31536000";
}

3. Reduce the number of domain names for single page requests

Reduce the number of domain names used in each page request to keep it within 4. This is because every time the browser accesses a backend resource, it needs to query DNS, find the corresponding IP address, and then make the actual call.

DNS has multiple layers of caching, with the browser caching a copy, the local host caching, the ISP caching, etc. The process of DNS to IP address translation usually takes 20-120ms. By reducing the number of domain names, resource retrieval can be accelerated.

4. Enable gzip compression

Enabling gzip compression allows content to be compressed and then decompressed by the browser. Since the transmitted size is reduced, bandwidth usage is reduced, resulting in improved transfer efficiency.

Enabling gzip in Nginx is easy, with the following configuration:

gzip on;
gzip_min_length 1k;
gzip_buffers 4 16k;
gzip_comp_level 6;
gzip_http_version 1.1;
gzip_types text/plain application/javascript text/css;
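The effect of gzip on repetitive text such as HTML, CSS, and JavaScript can be demonstrated with the JDK's built-in GZIPOutputStream:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    // Compress a byte array with gzip and return the compressed bytes
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive text, like markup and scripts, compresses extremely well
        byte[] raw = "hello world ".repeat(1000).getBytes(StandardCharsets.UTF_8);
        System.out.println(raw.length + " -> " + gzip(raw).length + " bytes");
    }
}
```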

5. Compress resources

Compress JavaScript, CSS, and even HTML. The principle is similar. In modern web development, when using the popular frontend-backend separation pattern, these resources are usually compressed.

6. Use keepalive

Creating and closing connections consume resources. After users access our services, there will be more interactions in the future. Therefore, keeping long connections can significantly reduce network interactions and improve performance.

Nginx has keepalive support for clients enabled by default. You can adjust its behavior using the following two parameters:

http {
    keepalive_timeout 120s;
    keepalive_requests 10000;
}

Manually enabling long connections between Nginx and the backend upstream requires three pieces: a keepalive connection pool declared in the upstream block, HTTP/1.1, and an empty Connection header:

upstream backend {
    server 127.0.0.1:8080;  # example backend instance
    keepalive 64;           # idle connections cached per worker
}

location / {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}

Custom Web Container #

If your project has a high concurrent load and you want to modify configuration information such as the maximum number of threads and connections, you can customize the web container using the following code:

@SpringBootApplication(proxyBeanMethods = false)
public class App implements WebServerFactoryCustomizer<ConfigurableServletWebServerFactory> {
    public static void main(String[] args) {
        SpringApplication.run(App.class, args);
    }

    @Override
    public void customize(ConfigurableServletWebServerFactory factory) {
        TomcatServletWebServerFactory f = (TomcatServletWebServerFactory) factory;
        f.setProtocol("org.apache.coyote.http11.Http11Nio2Protocol");

        f.addConnectorCustomizers(c -> {
            // Cast must match the protocol configured above; casting to
            // Http11NioProtocol here would fail at runtime.
            Http11Nio2Protocol protocol = (Http11Nio2Protocol) c.getProtocolHandler();
            protocol.setMaxConnections(200);
            protocol.setMaxThreads(200);
            protocol.setConnectionTimeout(3000);
        });
    }
}

Note the code above: we set the protocol to org.apache.coyote.http11.Http11Nio2Protocol, which enables NIO2. This protocol is only available in Tomcat 8.0 and later, and enabling it improves performance. The comparison is as follows (see the test project code at spring-petclinic-main):

Default:

[root@localhost wrk2-master]# ./wrk -t2 -c100 -d30s -R2000 http://172.16.1.57:8080/owners?lastName=
Running 30s test @ http://172.16.1.57:8080/owners?lastName=
  2 threads and 100 connections
  Thread calibration: mean lat.: 4588.131ms, rate sampling interval: 16277ms
  Thread calibration: mean lat.: 4647.927ms, rate sampling interval: 16285ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    16.49s     4.98s   27.34s    63.90%
    Req/Sec   106.50      1.50   108.00    100.00%
  6471 requests in 30.03s, 39.31MB read
  Socket errors: connect 0, read 0, write 0, timeout 60
Requests/sec:    215.51
Transfer/sec:      1.31MB

Nio2:

[root@localhost wrk2-master]# ./wrk -t2 -c100 -d30s -R2000 http://172.16.1.57:8080/owners?lastName=
Running 30s test @ http://172.16.1.57:8080/owners?lastName=
  2 threads and 100 connections
  Thread calibration: mean lat.: 4358.805ms, rate sampling interval: 15835ms
  Thread calibration: mean lat.: 4622.087ms, rate sampling interval: 16293ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    17.47s     4.98s   26.90s    57.69%
    Req/Sec   125.50      2.50   128.00    100.00%
  7469 requests in 30.04s, 45.38MB read
  Socket errors: connect 0, read 0, write 0, timeout 4
Requests/sec:    248.64
Transfer/sec:      1.51MB

You can even replace Tomcat with Undertow. Undertow is a lighter web container with lower memory usage and fewer daemon threads. To make the change, use the following configuration:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-tomcat</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-undertow</artifactId>
</dependency>

In fact, the most effective way to optimize Tomcat is to configure the JVM parameters. You can refer to the content of the previous lesson for adjustment. For example, using the following parameters to start, the QPS increases from 248 to 308:

-XX:+UseG1GC -Xmx2048m -Xms2048m -XX:+AlwaysPreTouch

Skywalking #

For a web service, the slowest part is often the database operations. Therefore, by optimizing the local cache and distributed cache as introduced in “07 | Case Study: Ubiquitous Cache, the Magic Weapon of High-Concurrency Systems” and “08 | Case Study: How Does Redis Help with Seckill Business,” you can achieve the maximum performance improvement.

For troubleshooting complex distributed environments, I would like to share another tool: Skywalking.

Skywalking is implemented with probe technology (a Java agent). By adding the agent JAR to the Java startup parameters, performance data and invocation-chain data can be collected and sent to the Skywalking server.

Download the corresponding installation package (if using Elasticsearch storage, you need to download a dedicated installation package) and configure the storage. Then, you can start it with one click.

Extract the agent’s compressed package to the corresponding directory:

tar xvf skywalking-agent.tar.gz  -C /opt/

Add the agent package to the business startup parameters. For example, if the original startup command is:

java  -jar /opt/test-service/spring-boot-demo.jar  --spring.profiles.active=dev

The modified startup command becomes:

java -javaagent:/opt/skywalking-agent/skywalking-agent.jar -Dskywalking.agent.service_name=the-demo-name -jar /opt/test-service/spring-boot-demo.jar --spring.profiles.active=dev

Visit some service links, then open the Skywalking UI. You will see the interface below. These indicators can be understood in the same way as the performance measurement indicators mentioned in “01 | Theoretical Analysis: What are the Metrics for Performance Optimization? What Should We Pay Attention To?” From the graph, we can find interfaces with slow responses and high QPS for targeted optimization.

(Figure: the Skywalking UI overview)

Optimization directions for each layer #

1. Controller layer #

The controller layer is used to receive query parameters from the front-end and construct query results. Many projects now use a front-end and back-end separation architecture, so the methods in the controller layer generally use the @ResponseBody annotation to parse the query results into JSON data and return them (balancing efficiency and readability).

Since the controller only plays a role similar to function composition and routing, its impact on performance mainly comes from the size of the result set. If the result set is very large, the JSON serialization component will spend more time and memory converting it.

For example, if the memory occupied by the result set before it is parsed into JSON is 10MB, then during the parsing process, it may use 20MB or more memory to do this work.

I have seen many cases where excessive nesting of returned objects and referencing of objects that should not be referenced (such as very large byte[] objects) resulted in a surge in memory usage.

Therefore, for general services, it is necessary to keep the result set concise, which is also the reason why DTO (data transfer object) exists. If the result structure returned by your project is complex, it is necessary to transform the result set.
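As a sketch of this transformation, suppose a hypothetical entity drags along a large blob and a nested reference; the DTO keeps only what the client actually renders:

```java
// Hypothetical entity: carries fields the page never needs
class OwnerEntity {
    Long id;
    String name;
    byte[] avatar;          // large blob that must not leak into the JSON
    OwnerEntity referrer;   // nested reference that deepens the result tree

    OwnerEntity(Long id, String name, byte[] avatar) {
        this.id = id; this.name = name; this.avatar = avatar;
    }
}

// Slim DTO: only what the client renders gets serialized
public class OwnerDto {
    Long id;
    String name;

    static OwnerDto from(OwnerEntity e) {
        OwnerDto dto = new OwnerDto();
        dto.id = e.id;
        dto.name = e.name;
        return dto;
    }

    public static void main(String[] args) {
        OwnerEntity entity = new OwnerEntity(1L, "George", new byte[1024 * 1024]);
        OwnerDto dto = OwnerDto.from(entity);
        System.out.println(dto.id + " " + dto.name);  // avatar and referrer dropped
    }
}
```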

2. Service layer #

The service layer is used to handle specific business logic, and most functional requirements are completed here. The service layer is generally implemented with singleton beans (rather than prototype scope), rarely maintains state, and can be reused by controllers.

The organization of code in the service layer has a significant impact on readability and performance, and most of the commonly cited design patterns target this layer.

The service layer will frequently use lower-level resources to obtain the data we need through composition, and most optimization strategies provided in previous lessons can be applied.

One point that needs to be emphasized is distributed transactions.

(Figure: four operations spread across MySQL, MQ, and ElasticSearch)

As shown in the above figure, four operations are spread across three different resources. To achieve consistency, MySQL, MQ, and ElasticSearch must be coordinated, but their underlying protocols and implementations differ, so Spring's @Transactional annotation cannot solve this; external components are needed.

Many people have experienced that when some code guaranteeing consistency is added, the performance drops significantly during load testing. Distributed transactions are performance killers because they require additional steps to ensure consistency. Common methods include: two-phase commit scheme, TCC, local message table, MQ transaction messages, and distributed transaction middleware.
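As one illustration, the Try-Confirm-Cancel (TCC) pattern can be sketched as follows; the interface and coordinator here are simplified assumptions, and a production implementation must also handle retries and "empty cancel" calls:

```java
import java.util.List;

// Illustrative TCC (Try-Confirm-Cancel) coordinator; all names are hypothetical.
public class TccDemo {
    public interface Participant {
        boolean tryReserve(String txId); // phase 1: reserve resources
        void confirm(String txId);       // phase 2a: make the reservation final
        void cancel(String txId);        // phase 2b: release the reservation
    }

    // Try every participant; confirm all on success, cancel all on any failure.
    static boolean execute(String txId, List<Participant> participants) {
        for (Participant p : participants) {
            if (!p.tryReserve(txId)) {
                // Real implementations must tolerate cancel() on participants
                // that never saw the try (the "empty cancel" problem).
                participants.forEach(q -> q.cancel(txId));
                return false;
            }
        }
        participants.forEach(p -> p.confirm(txId));
        return true;
    }

    public static void main(String[] args) {
        Participant ok = new Participant() {
            public boolean tryReserve(String txId) { return true; }
            public void confirm(String txId) { System.out.println("confirmed"); }
            public void cancel(String txId) { System.out.println("cancelled"); }
        };
        System.out.println(execute("tx-1", List.of(ok, ok)));
    }
}
```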

(Figure: comparison of distributed transaction solutions)

As shown in the above figure, distributed transactions need to be considered comprehensively in terms of transformation costs, performance, timeliness, etc. There is a term between distributed transactions and non-transactions called flexible transactions. The idea of flexible transactions is to move business logic and exclusive operations from the resource layer to the business layer.

Let’s briefly compare traditional transactions and flexible transactions.

ACID

The biggest feature of relational databases is transaction processing, that is, meeting ACID.

  • Atomicity: Operations in a transaction must either all be completed or all be rolled back.
  • Consistency: A transaction must move the database from one consistent state to another.
  • Isolation: The execution of one transaction should not be interfered with by other transactions.
  • Durability: Changes made to the data in the database by a committed transaction are permanent.

BASE

The BASE approach improves availability and system performance by sacrificing strong consistency and isolation.

BASE is an acronym for Basically Available, Soft-state, and Eventually Consistent. Each letter in BASE represents:

  • Basically Available: the system keeps functioning and providing service, possibly in a degraded form.
  • Soft-state: intermediate, not-yet-consistent states are tolerated; constant strong consistency is not required.
  • Eventually Consistent: the system reaches a consistent state at some later point in time.

For internet businesses, it is recommended to use compensating transactions to achieve eventual consistency. For example, data can be repaired through a series of scheduled tasks.
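A compensating repair task can be as simple as periodically comparing a replica against the source of truth and overwriting any drifted entries; the maps below stand in for a database and a downstream store:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a compensating repair job (names are illustrative): periodically
// compare the source of truth with a replica and fix any drift.
public class RepairJob {
    static void repair(Map<String, Integer> source, Map<String, Integer> replica) {
        for (Map.Entry<String, Integer> e : source.entrySet()) {
            // overwrite any entry that drifted from the source of truth
            if (!e.getValue().equals(replica.get(e.getKey()))) {
                replica.put(e.getKey(), e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> db = new HashMap<>(Map.of("order-1", 100, "order-2", 200));
        Map<String, Integer> cache = new HashMap<>(Map.of("order-1", 100, "order-2", 150));
        repair(db, cache);
        System.out.println(cache.get("order-2"));  // drifted entry repaired
    }
}
```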

3. Dao Layer #

With proper data caching, we try to avoid requests reaching the Dao layer. Unless you are familiar with the cache features provided by the ORM itself, it is recommended to use more general ways to cache data.

The Dao layer primarily focuses on the use of ORM frameworks. For instance, in JPA, if a one-to-many or many-to-many mapping relationship is added without lazy loading enabled, cascading queries can result in deep retrievals, leading to high memory usage and slow execution.
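The essence of lazy loading is deferring the expensive fetch until first access. A minimal stdlib sketch of the idea (JPA's real mechanism uses proxies, not a class like this):

```java
import java.util.function.Supplier;

// Conceptual sketch of lazy loading: the costly query runs only when the
// associated collection is actually touched.
public class LazyRef<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    public LazyRef(Supplier<T> loader) { this.loader = loader; }

    public T get() {
        if (!loaded) {           // fetch only on first access
            value = loader.get();
            loaded = true;
        }
        return value;
    }

    public static void main(String[] args) {
        LazyRef<String> pets = new LazyRef<>(() -> {
            System.out.println("loading pets...");
            return "[pet1, pet2]";
        });
        System.out.println("owner loaded");  // no pets query has run yet
        System.out.println(pets.get());      // triggers the deferred load
    }
}
```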

In large-scale businesses, it is common to utilize sharding techniques for databases. In these sharding components, simple query statements are parsed, distributed to different nodes for computation, and finally merged for results.

For example, a simple count query like “select count(*) from a” may be routed to multiple physical tables for computation, which hurts execution efficiency. Currently, ShardingJdbc and MyCat are representative sharding middleware. Although they provide users with a consistent view, we must pay attention to these differences while coding.
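Conceptually, the middleware rewrites such a count into one query per physical table and merges the partial results, roughly like this (the shard counts are illustrative):

```java
import java.util.List;

// Sketch: a sharding middleware rewrites "select count(*) from a" into one
// count per physical shard and sums the partial results.
public class ShardedCount {
    static long countAll(List<Long> perShardCounts) {
        return perShardCounts.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        // counts returned by three physical tables a_0, a_1, a_2 (illustrative)
        System.out.println(countAll(List.of(120L, 98L, 132L)));  // prints 350
    }
}
```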

Summary #

Let’s summarize what we have covered in this lesson.

In this lesson, we briefly explored common optimization strategies for Spring Boot and introduced three new performance analysis tools.

  • Prometheus, a monitoring system that provides specific metric sizes.
  • Flame graph, which reveals specific code hotspots.
  • Skywalking, which analyzes call chains in distributed environments.

Spring Boot’s default web container is Tomcat, and we can tune its parameters for performance gains. Additionally, we provided a series of optimization strategies for the Nginx load balancer that sits in front of the service.

Finally, we examined some optimization directions for the Controller, Service, and Dao layers in the classic MVC architecture, with a particular focus on distributed transactions in the Service layer.

As a widely used service framework, Spring Boot has made significant efforts in performance optimization by adopting high-speed components. For example, the default database connection pool is HikariCP, the default Redis cache framework is Lettuce, and local caching is provided by Caffeine. Caching is the primary means of optimization for a regular web service that interacts with databases.

However, success lies in the details, and the content covered in lessons 05-19 has valuable insights for performance optimization. In the next lesson (the last lesson in this column), I will provide an overall summary of problem identification, goal setting, and optimization methods.