13 Understanding the Objects Exposing External Services: Ingress and Service #

In Kubernetes, a Service can be understood as the basic object that exposes an application externally, and it is distinct from the Pod object. For example, when a user deploys an application through a Deployment object, the ReplicaSet object maintains the desired number of Pod instances for us. The container network used by the Pods is, by default, an overlay network built on top of the host network, so these Pod instances cannot be reached directly from the external network. To bridge into the container network effectively, Kubernetes creates another layer of virtual networking, the ClusterIP, which is represented by the Service object. From an implementation perspective, it uses iptables to drive the underlying netfilter: a virtual IP is set up, and the corresponding rule chains route north-south traffic accurately to the back-end Pod instances. As demand grew, the Ingress object was added later, allowing third-party Layer 7 proxies such as HAProxy and Nginx to connect external traffic to internal Service objects. The purpose of the Ingress object is to meet the need for high-performance application gateway access into the container cluster.

Thoughts on Service #

The networking defined by a Service is based on iptables arranging netfilter rules to support virtual IPs. The Service object follows a reverse-proxy model: it load balances north-south traffic and uses DNAT to route it to the specific back-end business Pods. To intercept incoming traffic and perform the NAT translation, Kubernetes hooks the built-in PREROUTING and OUTPUT chains and jumps them into its own custom chain, like this:

-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
...
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
...

The PREROUTING hook handles traffic arriving from outside the node as well as traffic entering from the Pod container network, while the OUTPUT hook handles traffic generated on the node itself, whether destined for external networks or for the Pod container network.
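
On a cluster node, these jump rules can be inspected directly with iptables; the commands below are only an inspection sketch, and the exact output will vary by cluster:

#iptables -t nat -S PREROUTING | grep KUBE-SERVICES
#iptables -t nat -S OUTPUT | grep KUBE-SERVICES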

Because a published Service ultimately needs to be reachable, Kubernetes creates a custom rule chain, KUBE-SERVICES, to support cluster-level service discovery, covering the ClusterIP and LoadBalancer types. Services exposed on node ports are finally dispatched through another custom rule chain, KUBE-NODEPORTS. A sample rule looks like this:

-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS

Each Service adds its own set of rules to these chains, and the KUBE-NODEPORTS jump must remain the last rule in KUBE-SERVICES. It is therefore not hard to see that once the number of Services reaches the tens of thousands, iptables cannot cope with rule chains of that scale. This is why kube-proxy later introduced an IPVS mode to replace the iptables mode.
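
If you want to try the IPVS mode, kube-proxy can be switched over through its configuration file. The fragment below is a minimal sketch; it assumes the required ip_vs kernel modules are already loaded on every node:

#kube-proxy-config.yaml (excerpt)
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"    # round-robin scheduling across back-end Pods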

ClusterIP Type #

The default type of Service. Depending on the scenario, it can be broken down into the following five categories (a sketch of the session-affinity and external-IP variants follows the list):

  • ClusterIP service
  • ClusterIP service with session affinity
  • ClusterIP with external IPs
  • ClusterIP service without any endpoints
  • Headless service
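
The examples below cover the plain ClusterIP and headless cases. For the session-affinity and external-IP variants, a minimal hypothetical sketch combining the two might look like this (the name redis-affinity and the address 192.0.2.10 are placeholders):

#redis-affinity.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-affinity
spec:
  sessionAffinity: ClientIP        # pin each client IP to the same back-end Pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800        # affinity timeout, 3 hours by default
  externalIPs:
  - 192.0.2.10                     # traffic arriving at a node with this destination IP is also accepted
  ports:
  - port: 6379
  selector:
    app: redis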

To make this concrete, let's explore the Service object through an example, starting with a simple Redis Deployment:

#redis.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 2
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis
        ports:
        - containerPort: 6379
          name: redis

First, create a regular Service:

#redis-clusterip.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  ports:
  - port: 6379
  selector:
    app: redis

Check the Service configuration:

#kubectl get service redis
NAME    TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
redis   ClusterIP   10.0.19.85   <none>        6379/TCP   3d4h

Next, create a Service named redis-none with clusterIP set to None:

#redis-none.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-none
spec:
  clusterIP: None
  ports:
  - port: 6379
    targetPort: 6379
  selector:
    app: redis

After the redis-none Service is created, the rule chains look like this:

-A KUBE-SERVICES -d 10.0.219.235/32 -p tcp -m comment --comment "default/redis-none: cluster IP" -m tcp --dport 6379 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -m comment --comment "default/redis-none: cluster IP" -m tcp --dport 6379 -j KUBE-SVC-QY4CQRKLFE76RKXU

-A KUBE-SVC-QY4CQRKLFE76RKXU -j KUBE-SEP-4QXTG2LAS77LGD2D
-A KUBE-SVC-QY4CQRKLFE76RKXU -j KUBE-SEP-YULDRJJKYNJJGS5L

-A KUBE-SEP-4QXTG2LAS77LGD2D -s 172.17.0.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-4QXTG2LAS77LGD2D -p tcp -m tcp -j DNAT --to-destination 172.17.0.3:6379

-A KUBE-SEP-YULDRJJKYNJJGS5L -s 172.17.0.4/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-YULDRJJKYNJJGS5L -p tcp -m tcp -j DNAT --to-destination 172.17.0.4:6379

As you can see, traffic for this Service is forwarded to the corresponding Pods directly through DNAT rules, without going through a separate load balancer. Next, define a headless Service:

#redis-clusterip-headless.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-headless
spec:
  clusterIP: None
  ports:
  - port: 6379
  selector:
    app: redis

Through the cluster's internal DNS, a query for the headless Service directly returns the IP addresses of the backing Pods, as shown below:

#nslookup redis-headless.default.svc.cluster.local 10.0.0.10
Server:        10.0.0.10
Address:    10.0.0.10#53

Name:    redis-headless.default.svc.cluster.local
Address: 10.244.1.69
Name:    redis-headless.default.svc.cluster.local
Address: 10.244.1.70

NodePort Type #

The NodePort type is also one of the most commonly used. Depending on the scenario, it can be broken down into the following five variants:

  • NodePort service
  • NodePort service with externalTrafficPolicy: Local
  • NodePort service without any endpoints
  • NodePort service with session affinity
  • NodePort service with externalTrafficPolicy: Local and session affinity

The general definition is as follows:

#redis-nodeport.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-nodeport
spec:
  type: NodePort
  ports:
  - nodePort: 30001
    port: 6379
    targetPort: 6379    
  selector:
    app: redis

The creation results can be viewed as follows:

#kubectl get service redis-nodeport
NAME             TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
redis-nodeport   NodePort   10.0.118.143   <none>        6379:30001/TCP   107s

#kubectl get endpoints redis-nodeport
NAME             ENDPOINTS          AGE
redis-nodeport   10.244.0.4:6379   110s

By exposing port 30001 on each node, the service running inside the container cluster can easily be reached from the external network.
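
Among the variants listed earlier, externalTrafficPolicy: Local deserves a note: it delivers traffic only to Pods running on the node that received it and preserves the client source IP. A minimal sketch (the name and node port are chosen only for illustration):

#redis-nodeport-local.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-nodeport-local
spec:
  type: NodePort
  externalTrafficPolicy: Local    # no second hop to other nodes; client source IP is preserved
  ports:
  - nodePort: 30002
    port: 6379
    targetPort: 6379
  selector:
    app: redis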

Thoughts on Ingress #

Ingress connects HTTP and HTTPS routes from outside the cluster to Services within the cluster, with traffic routing controlled by the rules defined on the Ingress resource. In reality, the actual traffic is carried by a third-party proxy server such as HAProxy. Looking back: before Ingress appeared, we usually deployed an access gateway outside the cluster and routed traffic into it. However, services in a Kubernetes cluster are dynamic, so it would be ideal to obtain the list of services and ports from the API Server dynamically and update the gateway in real time. This is exactly where Ingress comes in. Its main capabilities are providing externally reachable URLs for Services, load balancing traffic, and terminating SSL/TLS.

Let’s familiarize ourselves with an example of a minimal Ingress resource:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: test-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - http:
      paths:
      - path: /testpath
        pathType: Prefix
        backend:
          serviceName: test
          servicePort: 80

The Nginx rules are kept up to date by the ingress controller (for example, nginx-ingress-controller), which watches the API Server for Ingress and Service changes and regenerates the proxy configuration accordingly.
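
When running the community nginx ingress controller, the rendered configuration can be checked inside the controller Pod; the namespace and Pod name below are placeholders:

#kubectl -n ingress-nginx exec <nginx-ingress-controller-pod> -- cat /etc/nginx/nginx.conf | grep testpath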

Feature 1: Service Grouping #

A fanout configuration routes traffic from a single IP address to multiple Services based on the requested HTTP URI, which allows the number of load balancers to be kept to a minimum. For example:

foo.bar.com -> 178.91.123.132 -> / foo    service1:4200
                                 / bar    service2:8080
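
Written as a manifest in the same v1beta1 form as the earlier example (the host, service names, and ports are copied from the diagram), this fanout might look like:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: simple-fanout-example
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /foo
        backend:
          serviceName: service1
          servicePort: 4200
      - path: /bar
        backend:
          serviceName: service2
          servicePort: 8080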

Feature 2: Name-based Virtual Hosting #

Name-based virtual hosting supports routing HTTP traffic for multiple host names to the same IP address.

foo.bar.com --|                 |-> foo.bar.com service1:80
              | 178.91.123.132  |
bar.foo.com --|                 |-> bar.foo.com service2:80
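
The equivalent manifest simply lists one rule per host (again in the v1beta1 form used throughout this section):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: name-virtual-host-ingress
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - backend:
          serviceName: service1
          servicePort: 80
  - host: bar.foo.com
    http:
      paths:
      - backend:
          serviceName: service2
          servicePort: 80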

Feature 3: TLS Termination #

An Ingress can be secured by referencing a Secret that contains a TLS private key and certificate. Currently, Ingress supports only a single TLS port, 443, and assumes TLS termination.

apiVersion: v1
kind: Secret
metadata:
  name: testsecret-tls
  namespace: default
data:
  tls.crt: base64 encoded cert
  tls.key: base64 encoded key
type: kubernetes.io/tls
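
Such a Secret can also be generated from an existing certificate and key with kubectl; the file paths below are placeholders:

#kubectl create secret tls testsecret-tls --cert=path/to/tls.crt --key=path/to/tls.key -n default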

Referencing this Secret in the Ingress tells the Ingress controller to use TLS encryption for the channel between the client and the load balancer. Make sure the created TLS Secret comes from a certificate with the Common Name (CN) matching sslexample.foo.com. The Common Name is also known as the Fully Qualified Domain Name (FQDN).

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: tls-example-ingress
spec:
  tls:
  - hosts:
    - sslexample.foo.com
    secretName: testsecret-tls
  rules:
    - host: sslexample.foo.com
      http:
        paths:
        - path: /
          backend:
            serviceName: service1
            servicePort: 80

From the example above, we can see that although Ingress plays the role of an application gateway, its capabilities are constrained by the third-party proxy component behind it and are not as flexible as a purpose-built application gateway. In specific business scenarios, therefore, we still need to weigh the requirements before deciding whether to introduce Ingress.

Conclusion #

Service and Ingress are often confused with Pod service discovery when discussing how to expose services outside a cluster. Through the cases analyzed above, we can fully understand how Services are implemented. From practical experience, a Service acts purely as an entry point, with NodePort simply exposing a port on the host. The NAT translation implemented with iptables only becomes a performance problem when there are tens of thousands of Services and the rule chains grow accordingly; switching to the IPVS proxy mode alleviates this issue. Even then, the iptables module cannot be removed entirely, because it still handles the NAT connectivity between Services and Pod containers on the network path. With the rise of eBPF, it is expected that removing iptables altogether will soon be possible.