
29 Service Mesh - How to Shield the Service Management Details of a Service-Oriented System #

Hello, I am Tang Yang.

In the previous lessons on distributed services, I introduced you to the middleware solutions for communication and service governance between services in the process of microservice transformation, including:

Using an RPC framework to solve the problem of service communication;

Using a service registry to solve the problem of service registration and discovery;

Using distributed tracing middleware to identify slow cross-service requests;

Using load balancers to solve scalability issues;

Embedding service governance strategies such as service circuit breaking, degradation, and flow control in the API gateway.

After going through these steps, your vertical e-commerce system has basically completed its transformation into a microservice architecture. However, your system currently uses Java as its primary programming language, and the service governance strategies and inter-service communication protocols mentioned above are also implemented in Java.

This creates a problem: once small teams in your organization start using Go or PHP to develop new microservices, they will face challenges in the microservice transformation process.

Challenges of Cross-language Systems #

In fact, it is quite common for different teams within a company to use different programming languages. For example, the main development languages used by Weibo are Java and PHP, with some systems developed using Go in recent years. However, when different languages are used to develop microservices that need to invoke each other, there are two main challenges to consider.

First, when it comes to the communication protocol between services, it is important to choose a serialization method that is language-friendly in order to achieve cross-language communication. Let me give you an example.

Let’s say you develop an RPC service using Java and use Java’s native serialization method. This serialization method is not friendly to other languages, making it difficult to parse the serialized binary stream when calling this RPC service from another language. Therefore, I suggest that you consider whether the serialization protocol is language-friendly when choosing a serialization method. For example, you can choose Protobuf or Thrift, which makes it easy to solve the problem of cross-language service invocation.
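To make the idea concrete, here is a minimal sketch of language-neutral serialization. JSON stands in for a cross-language protocol such as Protobuf or Thrift, and the request shape and function names are illustrative, not from any real RPC framework:

```python
import json

def encode_request(service: str, method: str, args: dict) -> bytes:
    """Serialize an RPC request into bytes any language can parse."""
    return json.dumps(
        {"service": service, "method": method, "args": args}
    ).encode("utf-8")

def decode_request(payload: bytes) -> dict:
    """A Go or PHP service could decode the same bytes with its own JSON library."""
    return json.loads(payload.decode("utf-8"))

# The bytes on the wire carry no Java-specific structure, unlike Java's
# native serialization, so any language can produce or consume them.
wire = encode_request("OrderService", "getOrder", {"orderId": 42})
request = decode_request(wire)
```

In practice you would choose Protobuf or Thrift over JSON for compact binary encoding and schema evolution, but the principle is the same: the wire format is defined independently of any one language's object model.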

Second, microservices developed in a new language cannot reuse the service governance strategies accumulated previously. For example, when an RPC client uses a service registry to subscribe to services, it usually caches the service-node data to avoid interacting with the registry on every RPC call. When the service nodes in the registry change, the RPC client is notified and updates its cached data.

In addition, in order to reduce the load on the registry, we generally consider using multi-level caching (memory caching and file caching) on the RPC client to ensure the availability of the node cache. Initially, these strategies were implemented using the Java language and encapsulated in the registry client, which was provided for use by the RPC client. If a new language is used, all these logic would need to be implemented in the new language.
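The two-level cache described above can be sketched as follows. This is a simplified illustration, not the actual registry-client implementation; all names are made up:

```python
import json
import os

class NodeCache:
    """Sketch of a two-level (memory + file) service-node cache.

    The file level lets a freshly restarted process bootstrap its node
    list even if the registry is temporarily unavailable.
    """

    def __init__(self, path: str):
        self.path = path   # file-level cache, survives process restarts
        self.mem = None    # memory-level cache, the fast path

    def update(self, nodes: list):
        """Called when the registry pushes a node-change notification."""
        self.mem = list(nodes)
        with open(self.path, "w") as f:  # persist alongside the memory copy
            json.dump(self.mem, f)

    def get(self) -> list:
        if self.mem is not None:          # first level: memory
            return self.mem
        if os.path.exists(self.path):     # second level: file fallback
            with open(self.path) as f:
                self.mem = json.load(f)
            return self.mem
        return []                         # cold start, nothing cached yet
```

Reimplementing even this small amount of logic (plus load balancing, circuit breaking, tracing, and so on) in every new language is exactly the duplication Service Mesh sets out to avoid.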

In addition to these, load balancing, circuit breaking, flow control, printing distributed tracing logs, and other service governance strategies would also need to be reimplemented. Reimplementing these strategies in another language undoubtedly brings a huge amount of work and is a major pain point in middleware development.

So, how can you abstract the details of service governance in a service-oriented architecture or how can you reuse service governance strategies between different languages?

One approach is to separate the details of service governance from the RPC client and create a separate proxy layer for deployment. This proxy layer can be implemented using a single language, and all traffic passes through this layer to leverage its service governance strategies. This is an implementation approach called “separation of concerns” and is also the core idea of Service Mesh.

How Service Mesh Works #

1. What is Service Mesh #

Service Mesh mainly deals with communication between services. Its main implementation is to deploy a proxy program on the same host as the application. In general, we refer to this proxy program as a “Sidecar”. The communication between services changes from direct connection between the client and server to the following form:

*(figure: client and server communicating through Sidecars deployed on each host)*

In this form, the RPC client sends the data packet to the Sidecar deployed on the same host for service discovery, load balancing, service routing, and traffic control. The data is then sent to the Sidecar of the specified service node, where access logging, distributed tracing logging, and rate limiting occur before the data is sent to the RPC server.

This approach can separate the business code from the service governance policies and make the service governance policies an independent basic module. In this way, not only can cross-language and service governance policy reuse be achieved, but also these Sidecars can be managed uniformly.
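The request path above can be sketched as an in-process simulation. All names, the random load-balancing policy, and the registry contents are illustrative; a real Sidecar proxies actual network traffic:

```python
import random

# Stand-ins for the service registry and the server-side access log.
REGISTRY = {"order-service": ["10.0.0.1:9080", "10.0.0.2:9080"]}
ACCESS_LOG = []

def client_sidecar(service: str, request: dict) -> dict:
    """Client-side Sidecar: service discovery + load balancing."""
    nodes = REGISTRY[service]          # service discovery
    node = random.choice(nodes)        # load balancing (random policy here)
    return server_sidecar(node, request)

def server_sidecar(node: str, request: dict) -> dict:
    """Server-side Sidecar: access/tracing logging before the real server."""
    ACCESS_LOG.append((node, request))
    return rpc_server(request)

def rpc_server(request: dict) -> dict:
    """The business code, untouched by any governance logic."""
    return {"status": "ok", "echo": request}

resp = client_sidecar("order-service", {"method": "getOrder"})
```

Note that `rpc_server` contains no governance code at all: discovery, balancing, and logging live entirely in the two Sidecar functions, which is the separation of concerns the text describes.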

Currently, the most mentioned Service Mesh solution in the industry is Istio, which works as follows:

*(figure: Istio architecture, with the data plane and the control plane)*

It divides components into the data plane and the control plane. The data plane is the Sidecar I mentioned earlier (Istio uses Envoy as the implementation of Sidecar). The control plane is mainly responsible for the execution of service governance policies. In Istio, it is mainly divided into Mixer, Pilot, and Istio-auth.

You don’t need to understand the role of each part, just know that they together constitute the service governance system.

However, in Istio, each request needs to go through the control plane, which means that each request needs to make a network call to Mixer. This greatly affects performance.

Therefore, most of the Service Mesh solutions open sourced by Chinese Internet giants generally only borrow the ideas of the data plane and the control plane from Istio, and then move the service governance policies to the Sidecar. The control plane is only responsible for policy distribution, so there is no need to go through the control plane for each request, which improves performance.
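The "control plane only distributes policy" model described above can be sketched like this. Class and field names are illustrative, not from Istio or any other framework:

```python
class Sidecar:
    """Data plane: holds a local copy of the governance policy."""

    def __init__(self):
        self.policy = {"timeout_ms": 1000, "max_qps": 100}  # defaults

    def on_policy_update(self, policy: dict):
        # Applied out-of-band when the control plane pushes a change.
        self.policy = dict(policy)

    def handle_request(self, request: dict) -> dict:
        # The per-request path reads ONLY local state; the control plane
        # is never contacted here. This is the performance win over a
        # per-request call to a component like Mixer.
        return {"request": request, "timeout_ms": self.policy["timeout_ms"]}

class ControlPlane:
    """Control plane: fans policy out to every registered Sidecar."""

    def __init__(self):
        self.sidecars = []

    def register(self, sidecar: Sidecar):
        self.sidecars.append(sidecar)

    def push_policy(self, policy: dict):
        for s in self.sidecars:  # one push updates the whole data plane
            s.on_policy_update(policy)
```

The key property is that a policy change costs one push per Sidecar, while each request costs zero control-plane round trips.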

2. How to Forward Traffic to the Sidecar #

In the implementation of Service Mesh, a major problem is how to introduce the Sidecar as a network proxy in a transparent manner, so that data packets are redirected to the Sidecar’s port regardless of whether the data flows in or out. There are generally two approaches:

The first approach is to use iptables to achieve transparent traffic forwarding, which is also the default method Istio uses to forward data packets. To explain the principle of traffic forwarding more clearly, let's briefly review what iptables is.

The iptables tool manages the Netfilter firewall framework in the Linux kernel: it runs in user space and configures Netfilter rules, including address translation. By default, iptables has five chains, which you can think of as five stages in the flow of a data packet: PREROUTING, INPUT, FORWARD, OUTPUT, and POSTROUTING. The general flow of packet transmission is as follows:

*(figure: packet flow through the five iptables chains)*

From the diagram, we can see that packets arriving from the network enter through the PREROUTING chain, while packets generated by the local machine flow through the OUTPUT chain. Therefore, we can add rules to these two chains to redirect the packets. Let me take Istio as an example to show you how to use iptables to achieve traffic forwarding.

In Istio, there is a script called “istio-iptables.sh”, which is executed when the Sidecar is initialized, mainly to set some iptables rules.

I have excerpted some key points to explain:

```shell
# Outbound traffic handling
iptables -t nat -N ISTIO_REDIRECT    # Add the ISTIO_REDIRECT chain to handle outbound traffic
iptables -t nat -A ISTIO_REDIRECT -p tcp -j REDIRECT --to-port "${PROXY_PORT}"    # Redirect traffic to the Sidecar's port

iptables -t nat -N ISTIO_OUTPUT      # Add the ISTIO_OUTPUT chain to handle outbound traffic
iptables -t nat -A OUTPUT -p tcp -j ISTIO_OUTPUT    # Send traffic from the OUTPUT chain to the ISTIO_OUTPUT chain

for uid in ${PROXY_UID}; do
    iptables -t nat -A ISTIO_OUTPUT -m owner --uid-owner "${uid}" -j RETURN    # The Sidecar's own traffic is not redirected
done

for gid in ${PROXY_GID}; do
    iptables -t nat -A ISTIO_OUTPUT -m owner --gid-owner "${gid}" -j RETURN    # The Sidecar's own traffic is not redirected
done

iptables -t nat -A ISTIO_OUTPUT -j ISTIO_REDIRECT    # Forward remaining ISTIO_OUTPUT traffic to ISTIO_REDIRECT

# Inbound traffic handling
iptables -t nat -N ISTIO_IN_REDIRECT    # Add the ISTIO_IN_REDIRECT chain to handle inbound traffic
iptables -t nat -A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-port "${PROXY_PORT}"    # Redirect inbound traffic to the Sidecar's port

iptables -t ${table} -N ISTIO_INBOUND    # Add the ISTIO_INBOUND chain to handle inbound traffic
iptables -t ${table} -A PREROUTING -p tcp -j ISTIO_INBOUND    # Send PREROUTING traffic to the ISTIO_INBOUND chain
iptables -t nat -A ISTIO_INBOUND -p tcp --dport "${port}" -j ISTIO_IN_REDIRECT    # Redirect traffic on the specified destination port to ISTIO_IN_REDIRECT
```
Assuming the service node is deployed on port 9080 and the Sidecar port is 15001, the flow of inbound traffic is as follows:

![img](../images/014a530acbcac3f8b57635627a22e924.jpg)

The flow chart of outbound traffic is as follows:

![img](../images/43ee298a3f01c0de5d3ee0c5c96ea455.jpg)

The advantage of using iptables is that it is completely transparent to the business: the business code may not even know that a Sidecar exists, which reduces the integration effort. However, it also has a flaw: it causes performance loss under high concurrency. Therefore, Chinese Internet companies have adopted another method: the lightweight client.

In this method, the RPC client learns the Sidecar's deployment port from configuration and sends its service calls to the Sidecar through a lightweight client. Before forwarding a request, the Sidecar executes service governance strategies, such as querying the service registry for node information (and caching it) and selecting a node using a load-balancing strategy.

After the request reaches the server-side Sidecar, it records the access log and distributed tracing log, and then forwards the request to the actual service node. Of course, when the service node starts, it delegates registration to its server-side Sidecar, so the Sidecar knows the port on which the actual service node is deployed. The entire request process is shown in the figure:

![img](../images/d344cb29d46dc480e67eabf57ddda622.jpg)
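The lightweight-client flow can be sketched as follows. The names, the port numbers, and the configuration shape are illustrative (15001 echoes the Sidecar port from the earlier example); a real implementation would speak over sockets:

```python
# Stand-in for the service registry: service name -> (host, sidecar_port) list.
REGISTRY = {}

class ServerSidecar:
    """Server-side Sidecar that registers on the service node's behalf."""

    def __init__(self, host: str, sidecar_port: int, service_port: int):
        self.host = host
        self.sidecar_port = sidecar_port
        self.service_port = service_port  # where the real service listens

    def register(self, service: str):
        # At startup the service node delegates registration to its Sidecar,
        # so the registry advertises the Sidecar's port, not the service's.
        REGISTRY.setdefault(service, []).append((self.host, self.sidecar_port))

class LightweightClient:
    """Client that only needs to know its local Sidecar's port."""

    def __init__(self, config: dict):
        self.local_sidecar_port = config["sidecar_port"]  # from configuration

    def call(self, service: str, request: dict) -> dict:
        # A real client would send this over a local socket; here we just
        # return what it would send and where.
        return {
            "to": ("127.0.0.1", self.local_sidecar_port),
            "service": service,
            "request": request,
        }
```

Notice the asymmetry with the iptables approach: here the client explicitly addresses the Sidecar, so no kernel-level traffic interception is needed, at the cost of a small client-side change.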

Of course, in addition to iptables and lightweight clients, there are also other solutions being explored, such as Cilium. This solution can realize request forwarding at the Socket layer, avoiding the performance loss of the iptables method. Among these solutions, I recommend using a lightweight client. Although there may be some redevelopment costs, it is the simplest to implement and can quickly implement Service Mesh in your project.

Regardless of the method adopted, you can deploy the Sidecar on the invocation chain of the client and server to proxy the traffic. This allows you to use the service governance strategies running in the Sidecar. As for these strategies, I have already explained them in the previous courses (you can review courses 23 to 26), so I won't go into detail here.

At the same time, I also recommend that you learn about some open source Service Mesh frameworks in the industry, so that you can have more choices when choosing a solution. Currently, there are several mature open source Service Mesh frameworks in the open source field. You can read their documentation for further understanding, as extended reading for this section.

Istio is the most famous framework in the industry. It introduces the concepts of the data plane and control plane and is a pioneer of Service Mesh. The drawback is the performance issue of Mixer mentioned earlier.

Linkerd is the first generation of Service Mesh, written in Scala. Its downside is the memory consumption.

SOFAMesh is a Service Mesh component open sourced by Ant Financial, which has extensive experience in large-scale implementation at Ant Financial.

Course Summary #

In this lesson, in order to solve the problem of reusing service governance strategies in cross-language scenarios, I introduced you to what Service Mesh is and how to implement it in practical projects. The key points you need to focus on are as follows:

1. Service Mesh consists of a data plane and a control plane. The data plane is mainly responsible for data transmission, while the control plane is used to control the implementation of service governance strategies. For performance reasons, service governance strategies are generally implemented in the data plane, and the control plane is responsible for distributing the service governance strategy data.

2. Currently, there are two main ways to implement Sidecar. One is using iptables to intercept traffic, and the other is using a lightweight client to forward traffic.

Currently, at large companies such as Weibo and Ant Financial, Service Mesh has been widely deployed in production projects, and I recommend that you continue to pay attention to this technology, which separates business logic from the communication infrastructure. If your business faces service governance challenges in a multi-language environment, if you need to quickly add service governance strategies to legacy services, or if you want to share your service governance experience with other teams, then Service Mesh is a good choice for you.