08 Slow Service Responses or Downtime - How to Deal with Quick Failure and Service Degradation #

In the previous chapter, we implemented inter-service calls with OpenFeign and handled concurrent calls across a multi-instance cluster by adjusting the load balancing strategy. In networked applications, however, the network or a downstream service will sometimes be unavailable. When a service responds slowly or not at all, requests pile up, and that backlog can become the last straw that brings the whole system down. How do we deal with such situations? This chapter introduces the Hystrix component.

What is Hystrix #

Hystrix is a latency and fault tolerance library for distributed systems. It protects microservices along several dimensions, including flow control, service degradation, circuit breaking, and quick failure. Hystrix is also part of the Netflix suite.
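To make circuit breaking and quick failure concrete, here is a minimal plain-Java sketch. It is not how Hystrix is implemented: Hystrix decides based on an error percentage over a rolling window, while this sketch simply counts consecutive failures. The class name `CircuitBreakerSketch` and its parameters are assumptions made for illustration.

```java
import java.util.function.Supplier;

/** Illustrative circuit breaker; a simplification, not Hystrix's algorithm. */
class CircuitBreakerSketch {
    private final int failureThreshold; // consecutive failures before the circuit opens
    private final long openMillis;      // how long to fail fast before allowing a trial call
    private int consecutiveFailures = 0;
    private long openedAt = -1;         // -1 means the circuit is closed

    CircuitBreakerSketch(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** True while the circuit is open and the cool-down period has not elapsed. */
    synchronized boolean isOpen(long now) {
        return openedAt >= 0 && now - openedAt < openMillis;
    }

    <T> T call(Supplier<T> remote, Supplier<T> fallback) {
        long now = System.currentTimeMillis();
        if (isOpen(now)) {
            return fallback.get();      // quick failure: the remote call is skipped entirely
        }
        try {
            T result = remote.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure(now);
            return fallback.get();      // service degradation: answer with the fallback
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        openedAt = -1;                  // a successful call closes the circuit
    }

    private synchronized void onFailure(long now) {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = now;             // open (or re-open) the circuit
        }
    }

    public static void main(String[] args) {
        CircuitBreakerSketch breaker = new CircuitBreakerSketch(2, 60_000);
        Supplier<String> broken = () -> { throw new IllegalStateException("service down"); };
        for (int i = 0; i < 3; i++) {
            System.out.println(breaker.call(broken, () -> "fallback"));
        }
    }
}
```

With a threshold of 2, the first two calls reach the failing service and the third is answered by the fallback without touching the service at all, which is the quick-failure behaviour described above.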

Unfortunately, Hystrix entered maintenance mode as of version 1.5.18, and its maintainers recommend resilience4j as the replacement. This chapter uses that final Hystrix release for its examples; for new projects, resilience4j is still the recommended choice. Later on, another important component, Sentinel, will take the place of Hystrix.

Introducing Hystrix #

Add Hystrix to the project through its starter dependency:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>

OpenFeign already integrates the circuit breaker feature, but it is switched off by default and must be enabled in the configuration file to take effect. Turn on the Hystrix switch in the application.properties file:

# hystrix enable
feign.hystrix.enabled=true

Go back to the FeignClient code from the previous chapter and add the fallback attribute to the annotation. The attribute points to a fallback class, which must also be written.

@FeignClient(value = "card-service", fallback = MemberCardServiceFallback.class)
public interface MemberCardClient {

    @RequestMapping(value = "/card/addCard", method = RequestMethod.POST)
    public CommonResult<Integer> addCard(@RequestParam(value = "json") String json) throws BusinessException;

    @RequestMapping(value = "/card/updateCard", method = RequestMethod.POST)
    public CommonResult<Integer> updateCard(@RequestParam(value = "json") String json) throws BusinessException;
}

Write the MemberCardServiceFallback class. It is an ordinary implementation of the client interface, annotated with @Component so that Spring can register it as a bean.

@Component
@Slf4j
public class MemberCardServiceFallback implements MemberCardClient {

    @Override
    public CommonResult<Integer> addCard(String json) throws BusinessException {
        CommonResult<Integer> result = new CommonResult<>("parking-card service not available! ");
        log.warn("parking-card service not available! ");
        return result;
    }

    @Override
    public CommonResult<Integer> updateCard(String json) throws BusinessException {
        CommonResult<Integer> result = new CommonResult<>("parking-card service not available! ");
        log.warn("parking-card service not available! ");
        return result;
    }

}
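The dispatch from a failing client to its fallback class can be pictured with a JDK dynamic proxy. The sketch below is only a conceptual model, not the code Feign or Hystrix actually run; the `CardClient` interface and the `withFallback` helper are hypothetical names introduced for illustration.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

class FallbackProxySketch {

    /** Hypothetical stand-in for a Feign client interface. */
    interface CardClient {
        String addCard(String json);
    }

    /** Wraps primary so that any exception it throws is answered by fallback instead. */
    @SuppressWarnings("unchecked")
    static <T> T withFallback(Class<T> type, T primary, T fallback) {
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(),
                new Class<?>[] {type},
                (proxy, method, args) -> {
                    try {
                        return method.invoke(primary, args);
                    } catch (InvocationTargetException e) {
                        // The primary call threw: invoke the same method on the fallback.
                        return method.invoke(fallback, args);
                    }
                });
    }

    public static void main(String[] args) {
        CardClient remote = json -> { throw new IllegalStateException("card service is down"); };
        CardClient fallback = json -> "card service not available";
        CardClient client = withFallback(CardClient.class, remote, fallback);
        System.out.println(client.addCard("{}")); // prints "card service not available"
    }
}
```

Because the fallback implements the same interface as the client, callers receive a normal return value and never need to know that the remote call failed.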

Test Hystrix #

In the previous chapter, the feature was completed along the normal workflow: after a member registers, points are generated. This time, we deliberately leave the “points sub-service” stopped to see the effect (the service registry is assumed to be running already, so it is not mentioned again in the steps below).

Start only one sub-service, parking-member.
Open the swagger-ui page of the parking-member sub-service and call the member’s bind-phone-number interface (or use the Postman tool).

As expected, the fallback method is called directly, so the request fails quickly and the caller receives a response immediately.

Now start the points service and make the call again. This time the fallback method is no longer invoked, and the points service interface is called normally, as shown in the following figure:

Graphical monitoring of Hystrix #

With the steps above, Hystrix is now integrated into the project. But how is Hystrix behaving at runtime, and is there a way to view its metrics? This is where Hystrix Dashboard comes in: a UI-based tool for quickly viewing the running status.

Add a new dashboard project #

Under the parking-base-serv project, create a new Spring Boot sub-project named parking-hystrix-dashboard, dedicated to hosting the Hystrix dashboard. Modify the pom.xml file and add the relevant dependencies.

```xml
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-hystrix-dashboard</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```

Add the @EnableHystrixDashboard annotation to the startup class to enable the dashboard functionality.

@SpringBootApplication
@EnableDiscoveryClient
@EnableHystrixDashboard
public class ParkingHystrixDashboardApplication {

    public static void main(String[] args) {
        SpringApplication.run(ParkingHystrixDashboardApplication.class, args);
    }

}

Start the project and open the address: http://localhost:10093/hystrix. If the page appears as shown in the image below, it means that it is running normally.

Adjust the monitored project #

The member service calls the points service interface remotely through Feign, with a fallback that provides service degradation and quick failure. That call is the target we monitor here.

In the config package of the parking-member project, add the configuration for Hystrix data stream:

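import com.netflix.hystrix.contrib.metrics.eventstream.HystrixMetricsStreamServlet;
import org.springframework.boot.web.servlet.ServletRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HystrixConfig {

    // Registers the Hystrix metrics servlet so that the dashboard can read /hystrix.stream.
    @Bean
    public ServletRegistrationBean<HystrixMetricsStreamServlet> getServlet() {
        HystrixMetricsStreamServlet servlet = new HystrixMetricsStreamServlet();
        ServletRegistrationBean<HystrixMetricsStreamServlet> bean = new ServletRegistrationBean<>(servlet);
        bean.addUrlMappings("/hystrix.stream");
        bean.setName("HystrixMetricsStreamServlet");
        return bean;
    }
}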

After starting the project, open its Hystrix data stream address: http://localhost:10060/hystrix.stream. Initially, the page outputs only a continuous series of empty pings. JSON data appears only after a Hystrix-protected function has been requested, as shown in the screenshot below:
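The stream is plain text made of keep-alive `ping:` lines and `data:` lines that carry a JSON payload. As a rough sketch, the payloads can be separated from the pings as follows. The class name `HystrixStreamLines` is hypothetical, and the sample payload is an abridged, made-up example rather than real captured output.

```java
import java.util.ArrayList;
import java.util.List;

class HystrixStreamLines {

    /** Returns the JSON payloads of "data:" lines, skipping "ping:" keep-alives and blank lines. */
    static List<String> dataPayloads(List<String> lines) {
        List<String> payloads = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("data:")) {
                payloads.add(line.substring("data:".length()).trim());
            }
        }
        return payloads;
    }

    public static void main(String[] args) {
        // Abridged, hypothetical sample of what the stream might contain.
        List<String> sample = List.of(
                "ping: ",
                "",
                "data: {\"type\":\"HystrixCommand\",\"errorPercentage\":100}",
                "ping: ");
        dataPayloads(sample).forEach(System.out::println);
    }
}
```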

The raw output shown above is not easy to read, and it offers no intuitive way to analyze how the Hystrix component is behaving. This is where our dashboard project comes in handy.

Dashboard interpretation #

Enter the address http://localhost:10060/hystrix.stream into the stream address field on the dashboard page. The Delay field can keep its default value, and the Title field can be given a name that is easy to recognize. As before, a function appears in the dashboard only after it has been executed. The image below shows every request failing, because the member service called the points service while the points service was not running.

Here is a brief interpretation of the charts:

The circle in the top left corner represents the health of the service: its colour shifts from green through yellow and orange to red as health declines, and its size grows with traffic.
The curve records the relative change in traffic over the last 2 minutes, showing whether traffic is trending up or down.
The coloured numbers in the box on the left correspond one-to-one with the legend in the top right.
Host and Cluster show the request rate of the service.
The *th labels below show the latency at the corresponding percentiles.
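For readers unfamiliar with latency percentiles, the idea can be sketched with a nearest-rank calculation over a set of samples. This is an illustration of the concept only; Hystrix computes its percentiles over rolling buckets, and the class name `LatencyPercentiles` and the sample values are assumptions.

```java
import java.util.Arrays;

class LatencyPercentiles {

    /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical request latencies in milliseconds.
        long[] samples = {12, 15, 11, 250, 14, 13, 16, 18, 17, 900};
        System.out.println("50th: " + percentile(samples, 50)); // 15
        System.out.println("90th: " + percentile(samples, 90)); // 250
        System.out.println("99th: " + percentile(samples, 99)); // 900
    }
}
```

The median stays low even though two requests were very slow, which is why the higher percentiles are the ones to watch when a downstream service starts to degrade.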

(After the points service is restored and the function is called again at high frequency, the requests succeed and the circle grows larger.)

This example shows only a single Hystrix-protected call. If the service calls several downstream services, the dashboard presents a fuller picture. From this page you can clearly monitor the load and running status of the service, which gives operations an important reference.

(The parameters shown in the image differ slightly; the image is from https://github.com/Netflix-Skunkworks/hystrix-dashboard)

After the study and practice in this article, you should have a basic understanding of the Hystrix circuit breaker and of how to apply it in a project to safeguard our services.

Leave a question for thought #

This article monitors the circuit breaker of only one service module. What if several services need to be monitored? Would you open several dashboard pages at the same time, or is there a better solution?