Kubernetes Networking Basics
Regarding networking, Kubernetes imposes only three simple but powerful rules:
1. All nodes in the cluster must be able to talk to each other.
2. All pods on the network can communicate with each other without NAT.
3. Every pod gets its own IP address.
This means that we have a big flat network for all pods.
Therefore, we have two networks: one for nodes and one for pods. Now, Kubernetes itself doesn't implement the pod network. Instead, it presents a plugin interface called the CNI (Container Network Interface), and third parties provide the plugins that actually implement the pod network. Some of those providers are Flannel, Calico, and Weave.
The pod network is big and flat, and it is usually implemented as an overlay network. It stretches across all nodes, and every pod has an IP on it.
Plugins can provide different behaviors, but the main idea is the same: it is one big, flat network, and each node in the cluster is allocated a subset of the address space to be used by the pods living on that node.
Then, as pods are spun up, they get scheduled to a particular node, and the IP address they get comes from the range of addresses allocated to that node. This IP is not only visible and reachable from all nodes and all pods, it is also the address that the pod sees as its own. So, there is no network weirdness here where you've got one address on the inside of the pod and another one on the outside on the network. The pod sees itself at this address, and so does everything else on the network.
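To make this concrete, here is a hedged sketch of the relevant part of a Node object. The node name and CIDR values are hypothetical, but spec.podCIDR is the real field where Kubernetes records the slice of the pod network allocated to a node:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: node-1               # hypothetical node name
spec:
  podCIDR: 10.244.1.0/24     # slice of the pod network allocated to this node
```

A pod scheduled to node-1 would then get an address such as 10.244.1.7, and that single address is what the pod sees from the inside and what every other pod on the network uses to reach it.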
At the highest level, that is the very basics of Kubernetes networking.
Service Fundamentals
We have this big flat network where every Pod can see every other Pod, but Pods have an important characteristic: they're ephemeral. They come and go. So, at a given moment we have a bunch of Pods that are serving, for example, an internal search API for an application.
Let's assume we've got three Pods, each with its own IP, and of course we have other components of the application (other Pods) in the cluster that are using it.
However, if demand increases and you scale up, each time you do it you are adding new Pods with new IPs.
Then, when you scale back down, you're taking them away, and the same happens when nodes fail: we lose Pods along with their IPs. So, how do the other Pods that want to use the search API keep up with which Pods exist and are healthy?
Here is where the concept of a “Service” enters the scene. A Service is a stable abstraction over a set of Pods. So, what we do is create a Service object and logically put it in front of the Pods. Then, at a high level, we say, "Instead of trying to hit individual Pods that can come and go, hit the Service."
And obviously, the Service is clever enough to handle all the churn that is going on with the Pods behind it.
But how does it actually work? Well, every Service gets a name and an IP, and there are a couple of things to note about these two attributes.
First, they are stable. Kubernetes guarantees that once you create a Service, its name and IP will never change for its entire life. That solves the reliability problem we just saw with Pods.
Second, the name and IP get registered with the cluster's built-in DNS (or add-on DNS). Every Kubernetes cluster has a native DNS Service, and every Pod in the cluster knows how to use it.
So, every single Pod in our cluster can resolve the Service's name (in our example, the search API) and reach the Service. That takes care of the front-end side of the Service. A Service behaves like a load balancer or proxy, with both a front-end configuration and a back-end one. On the back end, it needs to know which Pods to send the traffic to. This is done with a label selector.
The example here is saying, "OK, balance traffic across all Pods in the cluster with the label app equal to search." Notice that we can list multiple labels.
But this is the basic configuration: the name and the IP on the front end, and maybe a port as well. So you hit the Service, and when you do, it balances the traffic across all of the Pods in the cluster with the label app equal to "search", as in the sketch below.
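Here is a minimal sketch of such a Service manifest. The name search-api, the label, and the port numbers are illustrative, and since no type is specified, it defaults to a ClusterIP Service (more on types later):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: search-api          # stable name, registered with the cluster DNS
spec:
  selector:
    app: search             # send traffic to Pods carrying this label
  ports:
    - port: 8080            # port the Service listens on (hypothetical)
      targetPort: 8080      # port the Pods are listening on
```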
But how does the Service know which Pods are alive at a given moment?
When you create a Service object with a label selector, Kubernetes also creates another object on the cluster called an Endpoints object, which is a list of the IPs and ports of the Pods that match the Service's label selector.
So, in the example above, the Endpoints object for the Service will have three entries, one for each of the three Pods. But the cool thing is that the Service object is always watching the API server to see if Pods that match its selector are being added or removed, and as they are, it automatically updates the list in the Endpoints object.
So, the Endpoints object itself always has the same name as the Service object it's associated with, and it is a list of Pods that the Service can send requests to.
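As a hedged illustration, the Endpoints object behind the Service sketched above might look something like this, with three hypothetical Pod IPs drawn from the pod network:

```yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: search-api          # always the same name as its Service
subsets:
  - addresses:
      - ip: 10.244.1.7      # hypothetical Pod IPs matching the label selector
      - ip: 10.244.2.3
      - ip: 10.244.3.9
    ports:
      - port: 8080
```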
Then, when a client makes a request to the front-end config of the Service, the Service just picks one IP from the Endpoints list, which is always up to date, and forwards the request there.
Service Types
Now, Services are a powerful concept, so let's look at a quick example from a sample application YAML file.
A YAML document of kind “Service” has a section called "spec:", and in the spec there is an attribute called “type”.
This is because there are different types of Services that behave slightly differently. The main ones are ClusterIP, NodePort, and LoadBalancer. All these types have essentially the same purpose, which is to provide a stable point of abstraction for the set of Pods that make up an application service.
Each of these types addresses a different kind of use case. ClusterIP is the default, and it is the most basic. As the name suggests, it gives a Service its own IP address that is only reachable from within the cluster.
A Service of type NodePort also gets a ClusterIP, so you can still access it that way, but it additionally enables access from outside of the cluster by means of a cluster-wide port.
So, each Service of type NodePort gets its own cluster-wide port, and it is accessible from outside of the cluster by taking the IP address of any of the nodes and appending that cluster-wide port, or NodePort value, to the end.
Picture a four-node cluster where each node has an IP on the node network. We create a NodePort Service, which is assigned a port from a default range. When you append that port to the IP of any node in the cluster, you reach the Service, and then, obviously, the Pods that are serving the request.
The default range of ports that a NodePort value can come from is 30000 to 32767. You can, though, specify a particular port for a NodePort Service if you need one for any reason; you just do that in your Service manifest file with the nodePort field, as shown below.
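This is a minimal, hedged example; the Service name, label, and port values are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: search-api
spec:
  type: NodePort
  selector:
    app: search
  ports:
    - port: 8080            # ClusterIP port, still reachable inside the cluster
      targetPort: 8080      # port the Pods are listening on
      nodePort: 30050       # must fall within the configured range (default 30000-32767)
```

With this in place, hitting any node's IP on port 30050 from outside the cluster reaches the Service and, through it, the Pods.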
Any port that you specify here should come from the configured range, which, again, is 30000 to 32767 by default. The protocol can be TCP or UDP, and TCP is the default.
In summary, with the NodePort type you access a Service via the combination of a node IP and a cluster-wide port called a NodePort. This works from inside and outside the cluster.
The third type is LoadBalancer, and this is for when you are deploying in the cloud, because it provisions a cloud-based load balancer and then integrates it with a Service on your cluster.
Obviously, to provision a load balancer on a cloud platform, the platform must support load balancer creation and configuration via a public API, and Kubernetes must know how to work with that platform as well.
You are creating a public-facing load balancer that proxies traffic to your Service, and this is usually implemented using a NodePort Service under the hood. But here you don't have to create the NodePort Service yourself; Kubernetes does it.
So, for clarification, some cloud platforms configure the back end of their load balancer to forward traffic to nodes in the cluster by way of the NodePort Service. That means they create a NodePort Service and then balance incoming traffic across all nodes in the cluster on the NodePort port.
But you will occasionally see implementation differences for LoadBalancer Services: Google Cloud, for example, doesn't need to create a NodePort Service, while AWS and Azure do. In an on-premises development environment, you can use something like Nginx to play this role. In any case, these are implementation details that have no impact on the overall functionality of the LoadBalancer Service.
The important point is that you end up with a cloud-native, public-facing load balancer that is configured to connect to your Kubernetes Service on the back end, as sketched below.
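As a hedged sketch, a LoadBalancer Service manifest differs from the earlier examples mainly in its type; the name, label, and ports remain illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: search-api
spec:
  type: LoadBalancer        # asks the cloud platform to provision an external LB
  selector:
    app: search
  ports:
    - port: 80              # port exposed on the cloud load balancer
      targetPort: 8080      # port the Pods are listening on
```

Once the cloud platform finishes provisioning, the load balancer's public address shows up in the Service's status, and traffic sent to it flows through to the Pods.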
Service Network
When you create your Service, it gets a unique, long-lived IP address. But you may have noticed that this IP is not on any network you recognize. It is not on the node network, and it is not on the pod network. It is actually on a different, third network that we call the “Service Network”.
In fact, it is not really an actual network. No matter how much digging you do with your normal networking tools, you are not going to find any interfaces on it.
There is no route to it either. So, how does traffic get to it then?
Every node in the cluster has a process running called kube-proxy. It typically runs on the cluster as a DaemonSet, which means you get one kube-proxy Pod on every node.
One of its main jobs is to write a bunch of IPVS or iptables rules on each node that basically say: any requests you see addressed to the service network, rewrite the headers and send them to the appropriate Pods on the pod network.
A flow goes like this: we have some nodes, each with its own IP, and we also have the pod network. We have a Service, and its IP is neither on the pod network nor on the node network.
Now, let's say an application running in one pod needs to access another application running in a pod that happens to be on a different node. The pod sends traffic to the Service: DNS is queried, and it returns the Service IP.
So, the pod then sends packets to its virtual Ethernet interface, which has no idea about the service network. As we would expect, normal networking behavior applies, and the interface sends the packets to its default gateway.
This happens to be a Linux bridge called cbr0. As we know from Docker, this is the equivalent of docker0, but Kubernetes names it cbr0, for "custom bridge zero". It is just a dumb bridge, so it sends the packets upstream again, this time to the node's eth0 interface.
Now, when the packets get there, they are processed by the kernel on the host. And when this happens, the kernel says, “hang on a second, I've got some IPVS or iptables rules for stuff with this address”. It applies a rule that pretty much says, “any time you see a packet for this address, rewrite the destination address to this one on the pod network, and send it on”.
And that is it. Now, we don't want to get into too much detail, but there are a few ways to actually implement all of this, and since Kubernetes 1.2, running kube-proxy in iptables mode has been the default.
And it's fine, at least until you get to serious scale. As you may know, at its core iptables is a packet-filtering technology for firewalling, and in Kubernetes we're forcing it to behave as a load balancer. The problem is that this doesn't scale well.
So, there's a newer mode called IPVS mode. As the name suggests, it uses the native IPVS implementation in the Linux kernel, which has always been intended to be more of a load balancer, so it scales better than iptables at this task. Plus, it supports multiple load-balancing algorithms.
In IPVS mode, kube-proxy uses round robin as the default algorithm, but under the hood it can also do least connection, source hashing, destination hashing, shortest expected delay, and more. Kube-proxy in IPVS mode has been stable since Kubernetes 1.11, although iptables mode remains the default.
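If you want to opt in to IPVS mode, one place it is typically set is the kube-proxy configuration. Here is a hedged sketch of the relevant fields; exact deployment details vary by cluster, and "rr" stands for round robin:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"              # switch from the default iptables mode to IPVS
ipvs:
  scheduler: "rr"         # round robin; alternatives include lc, sh, dh, and sed
```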