22 How to Ensure a Complete Service Chain Path for a Request Tracking Process #
In a microservices system, a single request can invoke multiple services, forming a call chain. Ordinary log output cannot tie the whole chain together, so when one node in the chain throws an exception, locating and troubleshooting the fault becomes much harder. What is needed is a component that analyzes system performance and displays the call chain so that problems can be located and solved quickly. This is where APM tools come in.
What is APM #
APM stands for Application Performance Management. It focuses on analyzing performance bottlenecks in internal execution and in inter-service calls. Traditional monitoring software such as Zabbix, by contrast, provides only scattered monitoring points and metrics; even when it raises an alert, it cannot tell you where the problem is located.
Putting aside commercial tools, there are many open-source products, such as Pinpoint, Zipkin, CAT, SkyWalking, etc. There is a lot of information available on the internet to explain the comparisons between these products. In this practical case, we will use SkyWalking to monitor the system and see the effectiveness of APM.
Why choose SkyWalking? SkyWalking is an excellent open-source project that originated in China, fits the development habits of Chinese developers, and integrates well with the Chinese ecosystem. It has been donated to the Apache Software Foundation, which has further expanded its influence and increased community activity. In addition, it monitors the system through non-intrusive bytecode injection, which greatly reduces the code pollution that third-party tools usually cause.
Installing SkyWalking #
Download link (China mirror):
After extraction, the directory structure will be as follows:
Go to the config folder and open the main configuration file application.yml. You can see that SkyWalking supports several cluster coordination options: ZooKeeper, Nacos, Etcd, Consul, and Kubernetes. In this example, we demonstrate the standalone version. SkyWalking supports three storage options: H2, MySQL, and ElasticSearch. By default it uses H2, which keeps data in memory and loses it on restart. The official documentation recommends ElasticSearch because it is faster and offers larger storage capacity.
Installing ElasticSearch #
Download ElasticSearch (official download requires a VPN):
I downloaded the Mac version. Pay attention to version compatibility: this case uses version 6.8.x. If you use ES 7 or later, there may be problems writing data when integrating with SkyWalking.
The directory structure after extraction is as follows:
Start ElasticSearch (use -d to run it in the background):
Open http://127.0.0.1:9200/ in a browser. If ElasticSearch is running normally, the page that comes back confirms a successful installation.
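The same check can be scripted instead of done in a browser. The following is a minimal sketch, assuming ElasticSearch listens on the address used above; the class name ElasticsearchCheck and the two-second timeouts are illustrative choices, not part of ElasticSearch or SkyWalking.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class ElasticsearchCheck {

    /** Returns true when an HTTP GET on the given URL answers with status 200. */
    static boolean isUp(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(2000); // illustrative timeout, in milliseconds
            conn.setReadTimeout(2000);
            conn.setRequestMethod("GET");
            try {
                return conn.getResponseCode() == 200;
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            // Connection refused, timeout, or malformed URL: treat all as "not up".
            return false;
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "http://127.0.0.1:9200/";
        System.out.println(url + (isUp(url) ? " is up" : " is not reachable"));
    }
}
```

Running it before starting SkyWalking gives a quick signal on whether the storage backend is reachable.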
Configuring SkyWalking #
Open the main configuration file and modify the corresponding configurations:
Change the storage method to ElasticSearch, use the default configurations, and comment out the original H2 configurations.
Installing the Monitoring Dashboard #
Configure the webapp.yml file under the webapp directory: change the default timeout to 10000 and set server.port to 13800 (the original port 8080 conflicts with the default sentinel-dashboard port). If necessary, increase the timeout further.
After completing the configuration above, start SkyWalking and open the SkyWalking UI at http://127.0.0.1:13800, the port set in webapp.yml (the default port is 8080). A normal startup looks as follows:
The log shows a successful startup. The login credentials are admin/admin. Opening the browser displays the following page:
Installing the Client #
The agent, commonly known as a probe, collects data and sends it to the collector in a non-intrusive manner. In this case, the probe is attached by adding the -javaagent option to the java -jar command. Before testing, you need to build the jar packages: run Maven install on the root project in Eclipse to produce them.
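To make the idea of a non-intrusive probe more concrete, here is a minimal sketch of how any JVM agent hooks in through the standard java.lang.instrument API. It is not SkyWalking's implementation: the class names TimingAgentSketch and LoggingTransformer are hypothetical, and the transformer only logs class names where a real probe would rewrite bytecode to record timings.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

class TimingAgentSketch {

    /** JVM entry point for agents passed with -javaagent; runs before the application's main. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingTransformer(agentArgs));
    }

    /** Sees every class as it is loaded; a real probe would return modified bytecode here. */
    static class LoggingTransformer implements ClassFileTransformer {
        private final String prefix;

        LoggingTransformer(String prefix) {
            // Class names arrive in internal form, e.g. "com/example/MemberController".
            this.prefix = prefix == null ? "" : prefix;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            if (className != null && className.startsWith(prefix)) {
                System.out.println("[agent] loading " + className);
            }
            return null; // null tells the JVM to keep the original bytecode
        }
    }
}
```

Packaged in a jar whose manifest declares Premain-Class: TimingAgentSketch, it would be attached with java -javaagent:agent.jar=com/example -jar app.jar, where agent.jar and app.jar are placeholder names. The application code itself does not change, which is what non-intrusive means here.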
The directory where my SkyWalking agent is located is:
/Users/apple/software/apache-skywalking-apm-bin/agent/skywalking-agent.jar
A small hiccup occurred with the built jar: starting it produced the following error:
appledeMacBook-Air:target apple$ java -jar parking-member.jar
parking-member.jar has no Main manifest attribute.
Although the pom references the spring-boot-maven-plugin, the generated jar still has no main manifest attribute, so the service cannot start. Because our parent is a custom one rather than spring-boot-starter-parent, the jar is not repackaged into an executable jar by default, and we need to handle this explicitly. Add the following repackage execution to the spring-boot-maven-plugin configuration in the pom of each submodule to achieve the same effect as spring-boot-starter-parent. Test again and the service starts normally.
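<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <executions>
        <execution>
            <!-- repackage turns the plain jar into an executable Spring Boot jar -->
            <goals>
                <goal>repackage</goal>
            </goals>
        </execution>
    </executions>
</plugin>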
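Before launching a rebuilt jar, it can help to confirm that it now carries a Main-Class entry. The following is a minimal sketch using the JDK's java.util.jar API; the class name ManifestCheck is hypothetical, and the jar path is whatever you pass on the command line.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

class ManifestCheck {

    /** Returns the Main-Class entry of the jar's manifest, or null when it is absent. */
    static String mainClassOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return null; // the jar has no META-INF/MANIFEST.MF at all
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ManifestCheck <path-to-jar>");
            return;
        }
        String mainClass = mainClassOf(args[0]);
        System.out.println(mainClass == null
                ? "no Main-Class entry: java -jar cannot start this jar"
                : "Main-Class: " + mainClass);
    }
}
```

A jar that prints no Main-Class entry is one that java -jar will refuse to start, as in the error shown earlier.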
First, start the member service with the agent attached as a test, to check whether data is written to ElasticSearch. Run the following command in a terminal window:
java -javaagent:/Users/apple/software/apache-skywalking-apm-bin/agent/skywalking-agent.jar -Dskywalking.agent.service_name=parking-member-service -Dskywalking.collector.backend_service=127.0.0.1:11800 -jar parking-member-service.jar
Check the log file skywalking-api.log in the agent directory, and you can see that the client is running normally:
DEBUG 2020-02-05 11:45:06:003 SkywalkingAgent-5-ServiceAndEndpointRegisterClient-0 ServiceAndEndpointRegisterClient : ServiceAndEndpointRegisterClient running, status:CONNECTED.
In the same way, start the other applications and exercise various business functions, such as binding a member’s phone number, opening a monthly card, and paying on exit, to see how SkyWalking collects and displays the data.
Note the jar path at the end of each command: the commands below assume they are run from the directory that contains the jar.
java -javaagent:/Users/apple/software/apache-skywalking-apm-bin/agent/skywalking-agent.jar -Dskywalking.agent.service_name=parking-card-service -Dskywalking.collector.backend_service=127.0.0.1:11800 -jar parking-card-service.jar
java -javaagent:/Users/apple/software/apache-skywalking-apm-bin/agent/skywalking-agent.jar -Dskywalking.agent.service_name=parking-admin-server -Dskywalking.collector.backend_service=127.0.0.1:11800 -jar parking-admin-server.jar
java -javaagent:/Users/apple/software/apache-skywalking-apm-bin/agent/skywalking-agent.jar -Dskywalking.agent.service_name=parking-gateway-service -Dskywalking.collector.backend_service=127.0.0.1:11800 -jar parking-gateway.jar
java -javaagent:/Users/apple/software/apache-skywalking-apm-bin/agent/skywalking-agent.jar -Dskywalking.agent.service_name=parking-resource-service -Dskywalking.collector.backend_service=127.0.0.1:11800 -jar parking-resource.jar
java -javaagent:/Users/apple/software/apache-skywalking-apm-bin/agent/skywalking-agent.jar -Dskywalking.agent.service_name=parking-message-service -Dskywalking.collector.backend_service=127.0.0.1:11800 -jar parking-message-service.jar
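Since the commands above differ only in the service name and the jar file, they can be generated rather than typed. The following is a minimal sketch that assembles the same argument lists; the class name AgentLauncher is hypothetical, and the agent path and backend address are simply the values used in this article, so adjust them to your environment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AgentLauncher {
    // Values taken from the commands above; change them for your own machine.
    static final String AGENT_JAR =
            "/Users/apple/software/apache-skywalking-apm-bin/agent/skywalking-agent.jar";
    static final String BACKEND = "127.0.0.1:11800";

    /** Builds the java command line that starts one service with the agent attached. */
    static List<String> command(String serviceName, String jar) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-javaagent:" + AGENT_JAR);
        cmd.add("-Dskywalking.agent.service_name=" + serviceName);
        cmd.add("-Dskywalking.collector.backend_service=" + BACKEND);
        cmd.add("-jar");
        cmd.add(jar);
        return cmd;
    }

    public static void main(String[] args) {
        // Service name -> jar file, in the order used above.
        Map<String, String> services = new LinkedHashMap<>();
        services.put("parking-card-service", "parking-card-service.jar");
        services.put("parking-admin-server", "parking-admin-server.jar");
        services.put("parking-gateway-service", "parking-gateway.jar");
        services.put("parking-resource-service", "parking-resource.jar");
        services.put("parking-message-service", "parking-message-service.jar");

        for (Map.Entry<String, String> e : services.entrySet()) {
            // Dry run: print the command. To launch instead, pass the list to
            // new ProcessBuilder(...).inheritIO().start().
            System.out.println(String.join(" ", command(e.getKey(), e.getValue())));
        }
    }
}
```

Printing first and launching second keeps the sketch safe to run anywhere, and the printed lines can be compared directly with the commands listed above.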
Dashboard Data Display #
Refresh the UI of SkyWalking, and you can see that data has been collected. (Pay attention to the three key areas circled in color in the picture)
Performance monitoring of services, service instances, service interfaces, and databases, as well as service reference relationships and service interface traceability, can all be found in the dashboard. Here, we focus on the service request traceability.
Request Traceability Case Analysis #
Find a trace that involves at least two services and analyze it in detail. Taking the binding of a member’s phone number as an example, the request involves two services: the member service and the monthly card service.
The left side shows the request endpoints: blue marks normal requests, and red marks requests that raised an exception.
The part circled in brown at the top of the picture is the trace ID, which is a global request identifier. From the beginning of the source request, all service interface calls involved in the middle carry this global identifier to mark this as a complete request chain. With this ID, all requests are linked to form a call tree, as shown by the blue arrows in the figure.
This reveals two pieces of information:
- With the trace ID, the log entries belonging to a complete request chain can be traced (with the help of a log stack such as ELK).
- The execution time of each node in the chain provides solid evidence for later optimization.
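The two ideas above, linking calls by a shared trace ID and reading per-node execution time, can be sketched in a few lines. This is an illustration only, not SkyWalking's data model: the Span class, its fields, and the durations are made up for the example, and self time is computed on the assumption that child calls run one after another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TraceTreeSketch {

    /** One timed operation inside a trace; parentId is null for the entry span. */
    static final class Span {
        final String traceId;
        final String spanId;
        final String parentId;
        final String service;
        final long durationMs;
        final List<Span> children = new ArrayList<>();

        Span(String traceId, String spanId, String parentId, String service, long durationMs) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
            this.service = service;
            this.durationMs = durationMs;
        }

        /** Time spent in this span itself, assuming its children ran one after another. */
        long selfTimeMs() {
            long childTotal = 0;
            for (Span child : children) {
                childTotal += child.durationMs;
            }
            return durationMs - childTotal;
        }
    }

    /** Links all spans that carry the given trace ID into a tree and returns the root. */
    static Span buildTree(String traceId, List<Span> spans) {
        Map<String, Span> byId = new LinkedHashMap<>();
        for (Span span : spans) {
            if (span.traceId.equals(traceId)) {
                byId.put(span.spanId, span);
            }
        }
        Span root = null;
        for (Span span : byId.values()) {
            if (span.parentId == null) {
                root = span;
            } else if (byId.containsKey(span.parentId)) {
                byId.get(span.parentId).children.add(span);
            }
        }
        return root;
    }

    /** Depth-first search for the span with the largest self time. */
    static Span slowest(Span node) {
        Span best = node;
        for (Span child : node.children) {
            Span candidate = slowest(child);
            if (candidate.selfTimeMs() > best.selfTimeMs()) {
                best = candidate;
            }
        }
        return best;
    }

    static void print(Span node, String indent) {
        System.out.println(indent + node.service + " " + node.durationMs
                + " ms (self " + node.selfTimeMs() + " ms)");
        for (Span child : node.children) {
            print(child, indent + "  ");
        }
    }

    /** Made-up spans from two requests; only the trace ID tells them apart. */
    static List<Span> sampleSpans() {
        return Arrays.asList(
                new Span("trace-1", "1", null, "parking-gateway-service", 120),
                new Span("trace-1", "2", "1", "parking-member-service", 100),
                new Span("trace-1", "3", "2", "parking-card-service", 60),
                new Span("trace-2", "1", null, "parking-gateway-service", 15));
    }

    public static void main(String[] args) {
        Span root = buildTree("trace-1", sampleSpans());
        print(root, "");
        System.out.println("Largest self time: " + slowest(root).service);
    }
}
```

Filtering by trace ID is what separates the two requests even though both pass through the gateway, and the self time per node is the kind of figure that points to where optimization effort should go.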
In this article, I have walked you through using SkyWalking in a microservices system to trace requests and help keep the system running normally. Interested readers can install several other monitoring systems for comparison to discover their respective advantages and disadvantages.