15 Service Layer Traffic Steering Technology Implementation #

The Service layer introduced by Kubernetes brings two capabilities to the cluster: the ClusterIP, which is given a stable name through the cluster DNS, such as foo.bar.svc.cluster.local; and a reverse proxy, which uses network data-plane technologies such as iptables, IPVS, or eBPF to load-balance traffic to the upstream Pods. Having covered the basic technologies of the Service layer, let's analyze some of the latest developments from a practical perspective and draw out strategies for applying them.

Misunderstanding of Ingress? #

From the community documentation, we know that the Ingress resource was created to bring HTTP(S) web traffic into the cluster. It is generally said that Ingress does not support steering L4 traffic, and that if you need other network protocols you should use the Service's other two forms: ServiceType=NodePort or ServiceType=LoadBalancer.

First, whether the Ingress resource object can support L4 is not really decided by the resource object itself. The actual traffic-steering capability is determined by the independently deployed Ingress-Nginx instance; in other words, Nginx decides. We know that Nginx itself supports L4, so Ingress can provide the support once the right parameters are added:

# ConfigMap read by the ingress-nginx controller (via its --tcp-services-configmap flag)
# to expose TCP port 27017 and forward it to the tcp-svc Service.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: default
data:
  "27017": "default/tcp-svc:27017"
---
# Flux HelmRelease installing the nginx-ingress chart with the same TCP mapping.
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: nginx-ingress
  namespace: default
spec:
  releaseName: nginx-ingress
  chart:
    repository: https://kubernetes-charts.storage.googleapis.com
    name: nginx-ingress
    version: 1.6.10
  values:
    tcp:
      "27017": "default/tcp-svc:27017"

Similarly, the UDP protocol can be supported in the same way, as sketched below. From a practical perspective, different Ingress controllers support this to different degrees because the capability ultimately depends on Nginx; refer to your controller's documentation for details.
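
As a minimal sketch, UDP exposure goes through a separate udp-services ConfigMap that the ingress-nginx controller reads via its --udp-services-configmap flag (the default/dns-svc Service below is a hypothetical example):

# Hypothetical example: expose UDP port 53 and forward it to the dns-svc Service.
apiVersion: v1
kind: ConfigMap
metadata:
  name: udp-services
  namespace: default
data:
  "53": "default/dns-svc:53"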

In addition, introductions to Ingress-Nginx usually state that traffic is load-balanced to the Pod group through the Service, and many architecture diagrams are drawn that way (figure: Ingress-Nginx routing traffic through the Service to the Pods).

In terms of the actual implementation, Ingress-Nginx by default does not go through the Service or kube-proxy to reach the Pods. Instead, it looks up the Endpoints behind the Service and sends requests directly to the corresponding Pod IP and port. This avoids the DNAT conversion at the Service layer, but it also gives up the Service layer's reverse-proxy behavior and the availability guarantees it provides for incoming traffic. To route through the Service layer's reverse proxy again, add the annotation parameter:

nginx.ingress.kubernetes.io/service-upstream: "true"

With this annotation, Nginx uses the upstream Service's ClusterIP as its upstream instead of the individual Pod IPs. There are many more enhancements of this kind; if you are interested, refer to the annotation reference for the full list. In addition, if you are a Golang programmer, you can extend Ingress-Nginx with new annotation parameters and keep adding features on top of Nginx; refer to the case study for more information. I won't go into details here.
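
A minimal sketch of the annotation in use, assuming a hypothetical Service named web-svc behind the host web.example.com:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: web
  annotations:
    # Route through the Service ClusterIP instead of resolving the Pod endpoints directly.
    nginx.ingress.kubernetes.io/service-upstream: "true"
spec:
  rules:
  - host: web.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: web-svc
          servicePort: 80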

Misunderstanding of Service? #

The community documentation states explicitly that Pods deployed into the cluster cannot be reached from outside directly, which is why the Service resource object was introduced to expose them. As more practical examples emerge, however, it turns out that Pods can also bind directly to host ports. See the following example:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  # The Pod shares the node's network namespace and listens on the host directly.
  hostNetwork: true
  # Required so the Pod can still resolve names through the cluster DNS.
  dnsPolicy: ClusterFirstWithHostNet
  containers:
    - name: nginx
      image: nginx
      ports:
        - containerPort: 80  # nginx serves on port 80, now bound on the host

For Pods running in hostNetwork mode, you must explicitly set the DNS policy to ClusterFirstWithHostNet so that they can still use the cluster DNS service. This example also reminds us that the Service IP is not the only way to steer traffic; Kubernetes features should be applied according to the actual scenario. Next, let's analyze the Service IP. It is a virtual IP address implemented by iptables rules, and traffic addressed to it is handled in what is collectively referred to as DNAT mode:

(figure: DNAT of the Service virtual IP)

Through the iptables rules generated by kube-proxy (kube-proxy can also use the IPVS kernel module to generate the proxy rules; the principle is similar), whenever a packet's destination is the Service IP it is DNATed (DNAT = Destination Network Address Translation): the destination IP is rewritten from the Service IP to a randomly selected backend Pod IP, which spreads the load across the backend Pods. When the DNAT happens, the translation is recorded in conntrack, the Linux connection-tracking table (which stores a 5-tuple per connection: protocol, srcIP, srcPort, dstIP, dstPort). When the reply comes back, the DNAT is reversed: the source IP of the reply is changed from the Pod IP back to the Service IP, so the client never needs to know how the packets were handled downstream.

(figure: the conntrack 5-tuple recorded for the DNAT connection)

Outbound traffic initiated from Pods also requires NAT. A node generally has both a private virtual IP and a public IP. For normal communication between the node and external IPs, outbound packets have their source IP changed from the node's private virtual IP to its public IP, and inbound replies are translated the other way around. When a Pod initiates a connection to an external IP, however, the source IP is the Pod IP, so kube-proxy adds additional iptables rules to perform SNAT (Source Network Address Translation), also known as IP MASQUERADE. These rules tell the kernel to use the node's external IP rather than the source Pod IP for outbound packets, and a conntrack entry is kept so that the SNAT can be undone for the replies.

Note the performance issue here: as the number of containers in the cluster grows, the conntrack table can grow dramatically. The Huawei container team hit this bottleneck while load-testing large-scale container workloads and proposed introducing IPVS to solve it. In my own practice, the iptables mode performs about the same as IPVS in small clusters, so IPVS is essentially a stopgap: it only takes over the DNAT conversion of inbound traffic, while SNAT conversion is still maintained by iptables. I believe iptables will be replaced by eBPF technology in the near future, eventually achieving the optimal traffic-load design.
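
As a minimal sketch, switching kube-proxy from the default iptables mode to IPVS is done through its configuration file (field names follow the kubeproxy.config.k8s.io/v1alpha1 API; verify them against the kube-proxy version you run):

# kube-proxy configuration selecting the IPVS proxy mode.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"  # round-robin scheduling across the backend Pods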

Advanced policies for Ingress #

Since Ingress was introduced, its use cases have kept expanding. Here I summarize some common ones, in the hope that you can apply them quickly when needed.

Ingress rule aggregation example

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: <ingress-name>
spec:
  rules:
  - host: <yourchoice1>.<cluster-id>.k8s.gigantic.io
    http:
      paths:
      - path: /
        backend:
          serviceName: <service1-name>
          servicePort: <service1-port>
  - host: <yourchoice2>.<cluster-id>.k8s.gigantic.io
    http:
      paths:
      - path: /
        backend:
          serviceName: <service2-name>
          servicePort: <service2-port>

Path routing example

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: <ingress-name>
spec:
  rules:
  - host: <yourchoice>.<cluster-id>.k8s.gigantic.io
    http:
      paths:
      - path: /foo
        backend:
          serviceName: <service1-name>
          servicePort: <service1-port>
      - path: /bar
        backend:
          serviceName: <service2-name>
          servicePort: <service2-port>

SSL passthrough example

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: <ingress-name>
  annotations:
    # Pass the TLS connection through to the backend untouched; Nginx does not terminate it.
    # Requires the controller to be started with --enable-ssl-passthrough.
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
spec:
  tls:
  - hosts:
    - <yourchoice>.<cluster-id>.k8s.gigantic.io
  rules:
  - host: <yourchoice>.<cluster-id>.k8s.gigantic.io
    http:
      paths:
      - path: /
        backend:
          serviceName: <service-name>
          servicePort: <service-port>

SSL Termination Example

apiVersion: v1
kind: Secret
type: kubernetes.io/tls
metadata:
  name: mytlssecret
data:
  tls.crt: <base64 encoded cert>
  tls.key: <base64 encoded key>
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: <ingress-name>
spec:
  tls:
  - hosts:
    - <yourchoice>.<cluster-id>.k8s.gigantic.io
    secretName: mytlssecret
  rules:
  - host: <yourchoice>.<cluster-id>.k8s.gigantic.io
    http:
      paths:
      - path: /
        backend:
          serviceName: <service-name>
          servicePort: <service-port>

CORS Cross-Origin Request Example

To enable Cross-Origin Resource Sharing (CORS) in Ingress rules, add the annotation:

nginx.ingress.kubernetes.io/enable-cors: "true"
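
A minimal sketch of the annotation attached to an Ingress, following the same placeholder convention as the other examples:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: <ingress-name>
  annotations:
    # Let Nginx answer cross-origin (CORS) requests for this Ingress.
    nginx.ingress.kubernetes.io/enable-cors: "true"
spec:
  rules:
  - host: <yourchoice>.<cluster-id>.k8s.gigantic.io
    http:
      paths:
      - path: /
        backend:
          serviceName: <service-name>
          servicePort: <service-port>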

Rewrite Routing Path Example

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: <ingress-name>
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: <yourchoice>.<cluster-id>.k8s.gigantic.io
    http:
      paths:
      - path: /foo
        backend:
          serviceName: <service-name>
          servicePort: <service1port>

Request Traffic Limitation Example

nginx.ingress.kubernetes.io/limit-connections: maximum concurrent connections per IP address

nginx.ingress.kubernetes.io/limit-rps: maximum connections per second from a given IP

Both of these annotations can be specified in one Ingress rule, with limit-rps taking precedence.
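
A minimal sketch with illustrative limits (the values 10 and 5 are arbitrary numbers chosen for the example):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: <ingress-name>
  annotations:
    nginx.ingress.kubernetes.io/limit-connections: "10"  # concurrent connections per client IP
    nginx.ingress.kubernetes.io/limit-rps: "5"           # connections accepted per second per client IP
spec:
  rules:
  - host: <yourchoice>.<cluster-id>.k8s.gigantic.io
    http:
      paths:
      - path: /
        backend:
          serviceName: <service-name>
          servicePort: <service-port>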

Backend SSL Support Example

By default, Nginx uses HTTP to reach the backend services. Adding the following annotation to an Ingress rule makes Nginx connect to the backends over HTTPS instead:

nginx.ingress.kubernetes.io/secure-backends: "true"
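
A minimal sketch, assuming the backend Service itself serves HTTPS on the referenced port:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: <ingress-name>
  annotations:
    # Nginx talks to the backend Pods over HTTPS instead of HTTP.
    nginx.ingress.kubernetes.io/secure-backends: "true"
spec:
  rules:
  - host: <yourchoice>.<cluster-id>.k8s.gigantic.io
    http:
      paths:
      - path: /
        backend:
          serviceName: <service-name>
          servicePort: <service-port>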

Whitelist Example

You can specify the allowed client IP source range by using the annotation:

nginx.ingress.kubernetes.io/whitelist-source-range

The value is a comma-separated list of CIDRs, for example 10.0.0.0/24,172.10.0.1.
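
A minimal sketch using the CIDR list from above:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: <ingress-name>
  annotations:
    # Only clients from these source ranges may reach the backends.
    nginx.ingress.kubernetes.io/whitelist-source-range: "10.0.0.0/24,172.10.0.1"
spec:
  rules:
  - host: <yourchoice>.<cluster-id>.k8s.gigantic.io
    http:
      paths:
      - path: /
        backend:
          serviceName: <service-name>
          servicePort: <service-port>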

Session Affinity and Cookie Affinity

Annotation:

nginx.ingress.kubernetes.io/affinity

Enables and sets the affinity type for all upstreams of the Ingress. With it, requests from the same client are always directed to the same upstream server.

If you use the cookie type, you can also specify the cookie name used for routing requests by using the annotation:

nginx.ingress.kubernetes.io/session-cookie-name

If it is not set, a cookie named "route" is created by default.

Annotation:

nginx.ingress.kubernetes.io/session-cookie-hash

Defines the algorithm used to hash the chosen upstream. The default value is MD5; possible values are MD5, SHA1, and Index. The Index option does not hash at all: it uses an in-memory index, which is faster and has lower overhead.

Note, however, that the index is not kept consistent with the list of upstream servers. When the configuration is reloaded and the upstream Pods have changed, the same index value is no longer guaranteed to point to the same Pod as before. Use the Index algorithm with caution and only when necessary.
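
A minimal sketch combining the three annotations, with the same placeholders as the other examples:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: <ingress-name>
  annotations:
    # Pin each client to one upstream Pod via a cookie.
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
    nginx.ingress.kubernetes.io/session-cookie-hash: "sha1"
spec:
  rules:
  - host: <yourchoice>.<cluster-id>.k8s.gigantic.io
    http:
      paths:
      - path: /
        backend:
          serviceName: <service-name>
          servicePort: <service-port>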

Summary #

The main job of the Service layer's traffic-steering technology is to route traffic accurately to the backend Pods of a Service. When we need elasticity and high availability, reliability can only be ensured by adding a redundant layer of reverse proxying in front of the service. There are three choices for this redundancy: Ingress, NodePort, and LoadBalancer, and at present none of the three fits every business scenario. In terms of how traffic routing is specified, Ingress is the preferred choice because its ingress controller can amplify its capabilities through annotations. I expect that, as things develop, the Service's role here will be taken over by Ingress, and we will no longer have to worry about NodePort.