Explain Kubernetes network model in networking concept index #41419

Draft · wants to merge 6 commits into base: main
198 changes: 140 additions & 58 deletions content/en/docs/concepts/services-networking/_index.md
@@ -1,79 +1,97 @@
---
title: "Services, Load Balancing, and Networking"
weight: 60
simple_list: true
description: >
Concepts and resources behind networking in Kubernetes.
---

## The Kubernetes network model
The [Kubernetes network model](#kubernetes-network-model) enables container networking within a pod and between pods
on the same or different {{< glossary_tooltip text="nodes" term_id="node" >}}.

The Kubernetes network model is built out of several pieces:
Kubernetes networking addresses four concerns:
- Containers within a Pod [use networking to communicate](/docs/concepts/services-networking/dns-pod-service/) via loopback.
- Cluster networking provides communication between different Pods.
- The [Service](/docs/concepts/services-networking/service/) API lets you
[expose an application running in Pods](/docs/tutorials/services/connect-applications-service/)
to be reachable from outside your cluster; a minimal example manifest appears after this list.
- [Gateway API](/docs/concepts/services-networking/gateway/) is an {{<glossary_tooltip text="add-on" term_id="addons">}}
that provides an expressive, extensible, and role-oriented family of API kinds for modeling service networking.
- [Ingress](/docs/concepts/services-networking/ingress/) provides extra functionality
specifically for exposing HTTP applications, websites and APIs.

* Each [pod](/docs/concepts/workloads/pods/) in a cluster gets its
own unique cluster-wide IP address.
[Gateway](https://gateway-api.sigs.k8s.io/) and
[Ingress](/docs/concepts/services-networking/ingress/) provide
extra functionality specifically for exposing your applications, websites and APIs, usually to clients outside
the cluster. Ingress and Gateway often use a load balancer to make that work reliably and at scale.
- You can also use Services to
[publish services only for consumption inside your cluster](/docs/concepts/services-networking/service-traffic-policy/).
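
For illustration, a minimal sketch of a Service manifest might look like the following (the name `my-app` and the port numbers are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app          # hypothetical name, for illustration only
spec:
  selector:
    app: my-app         # selects Pods labeled app=my-app as backends
  ports:
    - protocol: TCP
      port: 80          # port the Service exposes inside the cluster
      targetPort: 8080  # port the backing Pods listen on
```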

* A pod has its own private network namespace which is shared by
all of the containers within the pod. Processes running in
different containers in the same pod can communicate with each
other over `localhost`.
The [Connecting Applications with Services](/docs/tutorials/services/connect-applications-service/) tutorial lets you learn
about Services and Kubernetes networking with a hands-on example.

* The _pod network_ (also called a cluster network) handles communication
between pods. It ensures that (barring intentional network segmentation):
Read on to learn more about the [Kubernetes network model](#kubernetes-network-model).

* All pods can communicate with all other pods, whether they are
on the same [node](/docs/concepts/architecture/nodes/) or on
different nodes. Pods can communicate with each other
directly, without the use of proxies or address translation (NAT).
## Kubernetes network model

On Windows, this rule does not apply to host-network pods.
Figure 1 depicts a cluster with a control plane and a small number of nodes (virtual machines or physical) attached to a network, each
running pods that contain one or more containers. In addition, each pod has its own IP address, called a _pod IP_.

* Agents on a node (such as system daemons, or kubelet) can
communicate with all pods on that node.
{{< figure src="/docs/images/k8s-net-model-arch.svg" alt="Diagram of Kubernetes networking" class="diagram-large" caption="Figure 1. High-level example of a Kubernetes cluster, illustrating container networking." >}}

* The [Service](/docs/concepts/services-networking/service/) API
lets you provide a stable (long lived) IP address or hostname for a service implemented
by one or more backend pods, where the individual pods making up
the service can change over time.
The other K8s network components shown in the figure consist of the following:
Contributor: I don't think we abbreviate Kubernetes to K8s in the docs?

Contributor Author: Yes we do, and one reason for that is to defend K8s as a trademark. We mostly write Kubernetes out longhand though.


* Kubernetes automatically manages
[EndpointSlice](/docs/concepts/services-networking/endpoint-slices/)
objects to provide information about the pods currently backing a Service.
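
As a hedged sketch, an automatically managed EndpointSlice might look like this (the names and addresses are illustrative):

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: my-app-abc12                     # illustrative; Kubernetes generates a suffix
  labels:
    kubernetes.io/service-name: my-app   # ties this slice to its Service
addressType: IPv4
ports:
  - name: http
    protocol: TCP
    port: 8080
endpoints:
  - addresses:
      - "10.244.1.4"   # pod IP of a backend currently serving the Service
    conditions:
      ready: true
```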
* _Local pod networking_ - an optional component that enables pod-to-pod communication on the same node. You might recognize
this as a virtual layer 2 bridge (which is just one possible implementation).
Comment on lines +44 to +45
Contributor: Bridges are not part of the Kubernetes network model. It's true that a majority of Kubernetes plugins use some sort of bridge interface on each node (though not always an L2 bridge), but this is completely invisible at the level of "the Kubernetes network model". Unless you are debugging your cluster or developing a network plugin, the Kubernetes network model is just that all pods can communicate (at L4) with all other pods, and that's it. You neither need to know, nor to care, exactly how the network plugin implements that.

(And if you are debugging your cluster or developing a network plugin, then the diagram here is still not useful, because in that case you need to know specifically how your own network plugin works, not how some theoretical network plugin works.)

Contributor Author: Actually, Pods can communicate with other Pods at layer 3. Pods can observe packet-layer communications if they try hard enough. On Linux, you might need to add a capability for that to work.

Anyway, if I need to omit the mention of a bridge for this to merge, I can.

Contributor:

> Actually, Pods can communicate with other Pods at layer 3

Some plugins might allow arbitrary layer 3 communication, but Kubernetes only guarantees that you can communicate with pods at L4 via TCP and UDP (and SCTP if the plugin supports it). There are no conformance requirements that pods be able to send or receive SCTP, IP multicast, IP broadcast, IPsec, ICMP pings, or any other arbitrary L3 traffic. (And there are good reasons for plugins to not allow arbitrary traffic between pods.)

> Pods can observe packet-layer communications if they try hard enough.

Pods can observe the packets coming in and out of their own eth0, given CAP_NET_ADMIN. They have no ability to observe what happens on the other side of their eth0, regardless of capabilities. (Well, if they're privileged then they can install some eBPF and do whateverTF they want, but, you know.)

> Anyway, if I need to omit the mention of a bridge for this to merge, I can.

As above, if you're explaining the diagram then go ahead and mention the bridge, but it should be clear that this is just how pod networking works in this example, not how it always works.


* A service proxy implementation monitors the set of Service and
EndpointSlice objects, and programs the data plane to route
service traffic to its backends, by using operating system or
cloud provider APIs to intercept or rewrite packets.
* [_Network plugins_](#network-plugins) - set up IP addressing for pods and their containers, and allow pods to communicate
even when the source pod and destination pod are running on different nodes. Different network plugins achieve this in
different ways; examples include tunneling and IP routing.

* The [Gateway](/docs/concepts/services-networking/gateway/) API
(or its predecessor, [Ingress](/docs/concepts/services-networking/ingress/))
allows you to make Services accessible to clients that are outside the cluster.
Processes within a pod, such as the processes within Pod 1, can communicate automatically. No special
support from Kubernetes or the container runtime is needed, because these processes all see a common local
network within the container sandbox.

* A simpler, but less-configurable, mechanism for cluster
ingress is available via the Service API's
[`type: LoadBalancer`](/docs/concepts/services-networking/service/#loadbalancer),
when using a supported {{< glossary_tooltip term_id="cloud-provider">}}.
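
As a sketch, exposing a Service this way only requires setting its type (the names here are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-public   # hypothetical name
spec:
  type: LoadBalancer    # asks a supported cloud provider to provision an external load balancer
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80          # port exposed by the load balancer
      targetPort: 8080  # port the backing Pods listen on
```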
You can also have connectivity between containers running in two or more different pods on the same node; for example,
Pod 7 communicating with Pod 1, with both Pods (and their containers) running on Node 1. The network plugin(s)
that you deploy are responsible for setting up routes, or other mechanisms, to make sure that
these packets arrive at the right destination.

* [NetworkPolicy](/docs/concepts/services-networking/network-policies) is a built-in
Kubernetes API that allows you to control traffic between pods, or between pods and
the outside world.
In the cross-node case, you have container communications between pods on nodes connected
via the cluster network. In the example above, Pod 7 on Node 1 can talk to Pod 21 on Node 2.

In older container systems, there was no automatic connectivity
between containers on different hosts, and so it was often necessary
to explicitly create links between containers, or to map container
ports to host ports to make them reachable by containers on other
hosts. This is not needed in Kubernetes; Kubernetes's model is that
pods can be treated much like VMs or physical hosts from the
perspectives of port allocation, naming, service discovery, load
balancing, application configuration, and migration.
{{< note >}}
The network model permits all pods to talk to all other pods in the cluster. However, you might implement policies in your cluster to limit which pods can talk to which other pods.
{{< /note >}}

The network model describes how pods and their associated pod IPs can integrate with the larger network to support
container networking.

[comment]: <> (All diagrams.net figures are available at: https://drive.google.com/drive/folders/1MPOeuJ3wTzptutZX_6GKpLK8ljnojKE8?usp=sharing)

[comment]: <> (good talk on K8 network models at https://www.cncf.io/wp-content/uploads/2020/08/CNCF_Webinar_-Kubernetes_network_models.pdf)

Kubernetes IP addresses exist at the Pod scope. For example, on Linux, containers
within a Pod share their network namespaces - including their IP address, and any
network address from a lower layer, such as a MAC address.
This means that containers within a Pod can all reach each other's ports on
`localhost`. This also means that containers within a Pod must coordinate port
usage (the same way that different processes on a physical server need to coordinate
port use). This model, as used in Kubernetes, is called the _IP-per-pod_ model.

How this is implemented is a detail of the particular container runtime in use.
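
A minimal sketch of this sharing: two containers in one Pod, where one reaches the other over `localhost` (the image names and ports are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: two-containers   # hypothetical name
spec:
  containers:
    - name: web
      image: nginx       # serves on port 80 inside the shared network namespace
    - name: sidecar
      image: busybox
      # Both containers share the Pod's network namespace, so the sidecar
      # can reach the web container via localhost.
      command: ["sh", "-c", "while true; do wget -qO- http://localhost:80; sleep 10; done"]
```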

It is possible to request and configure ports on the node itself (called _host ports_)
that forward to a port on your Pod.
The Pod itself is not aware of the existence or non-existence of host ports.
Contributor: So, a lot of the existing text is specifically trying to explain Kubernetes networking to people who are assuming a Docker-like networking model, but it never explicitly says this. Contrariwise, we don't have any good explanation of how Kubernetes networking is different from typical VM networking, which is probably more relevant to more newcomers these days. It would be great to have small sections explicitly comparing Kubernetes networking to (a) traditional host networking, (b) Docker networking, (c) VM (eg OpenStack) networking. (I'm not sure if we need to talk about non-Kubernetes cloud networking since that generally tries to look like traditional host networking?)

(Also, the original text "called host ports" feels much more idiomatic than "named host ports" to me here.)

Contributor Author:

> It would be great to have small sections explicitly comparing Kubernetes networking to (a) traditional host networking, (b) Docker networking, (c) VM (eg OpenStack) networking. (I'm not sure if we need to talk about non-Kubernetes cloud networking since that generally tries to look like traditional host networking?)

I agree, but again I'm looking to find the minimal diff from what we have to what we're willing to merge. The perfect is the enemy of the published.
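
To illustrate the host ports described above, here is a hedged sketch (the port numbers are arbitrary):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostport-example   # hypothetical name
spec:
  containers:
    - name: web
      image: nginx
      ports:
        - containerPort: 80   # port the container listens on
          hostPort: 8080      # node port 8080 forwards to container port 80
```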


## Networking integrations and customizations

Only a few parts of this model are implemented by Kubernetes itself.
For the other parts, Kubernetes defines the APIs, but the
corresponding functionality is provided by external components, some
of which are optional:

* Pod network namespace setup is handled by system-level software implementing the
[Container Runtime Interface](/docs/concepts/architecture/cri/).

* The pod network itself is managed by a
[pod network implementation](/docs/concepts/cluster-administration/addons/#networking-and-network-policy).
On Linux, most container runtimes use the
@@ -86,20 +104,84 @@ of which are optional:
network implementations instead use their own service proxy that
is more tightly integrated with the rest of the implementation.

* NetworkPolicy is generally also implemented by the pod network
implementation. (Some simpler pod network implementations don't
implement NetworkPolicy, or an administrator may choose to
configure the pod network without NetworkPolicy support. In these
cases, the API will still be present, but it will have no effect.)
* Network policy (and the optional NetworkPolicy API) is commonly also implemented
by the pod network implementation.
(Some simpler pod network implementations don't implement NetworkPolicy, or an
administrator may choose to configure the pod network without NetworkPolicy support. In these
cases, the NetworkPolicy API will still be present in your cluster, but it will have no effect.)
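
For example, a minimal NetworkPolicy restricting ingress to a set of pods might look like this sketch (the labels and name are hypothetical); in a cluster whose pod network lacks NetworkPolicy support, the object is accepted but does nothing:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-frontend   # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: backend            # the pods this policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend   # only pods labeled app=frontend may connect
```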

* On Linux, Pod network namespace setup is handled by system-level software implementing the
[Container Runtime Interface](/docs/concepts/architecture/cri/).

* There are many [implementations of the Gateway API](https://gateway-api.sigs.k8s.io/implementations/),
some of which are specific to particular cloud environments, some more
focused on "bare metal" environments, and others more generic.

* The old [Ingress](/docs/concepts/services-networking/ingress/) API also has many
implementations, including many third party integrations.
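
As a hedged sketch, a minimal Ingress that routes HTTP traffic by host might look like the following (the hostname and Service name are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress        # hypothetical name
spec:
  rules:
    - host: app.example.com   # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app  # Service receiving the routed traffic
                port:
                  number: 80
```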

## Network plugins

[Network plugins](/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/) set up IP
addressing for Pods and their containers, and allow pods to communicate even when the source Pod and
destination Pod are running on different nodes. Different network plugins achieve this in different ways
with examples including tunneling or IP routing.

{{< note >}}
Network plugins are also known as _CNI_ or _CNI plugins_.
{{< /note >}}
Contributor: Boo!

"Some people refer to network plugins as CNI plugins or just CNIs, but this is inaccurate, since CNI is just one of several APIs involved in Kubernetes networking."

Contributor Author: What other kind of network plugins can I use with Kubernetes?

Contributor: All network plugins use CNI, but they don't just use CNI.

People understand you're talking about network plugins when you say "CNI plugins" because CNI isn't used for anything except network plugins. But these days, most of what a network plugin does doesn't involve CNI.

But anyway, doesn't need to be fixed in this PR.


### Requirements {#networking-requirements}

Every [Pod](/docs/concepts/workloads/pods/) in your cluster gets its own unique cluster-wide IP address called a _pod IP_.

If you have deployed an [IPv4/IPv6 dual stack](/docs/concepts/services-networking/dual-stack/) cluster,
then you - or your network plugin(s) - must allocate pod IPs for both IPv4 and IPv6 for each pod. Allocation is
performed per [_address family_](https://www.iana.org/assignments/address-family-numbers/address-family-numbers.xhtml):
each pod gets one address from the IPv4 family and one from the IPv6 family.
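
As an illustration, in a dual-stack cluster each Pod's status reports one IP per address family (the addresses below are examples):

```yaml
# Excerpt of a Pod's status in a dual-stack cluster (illustrative addresses)
status:
  podIP: 10.244.1.4          # the primary address (first configured address family)
  podIPs:
    - ip: 10.244.1.4         # IPv4 pod IP
    - ip: fd00:10:244:1::4   # IPv6 pod IP
```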

Kubernetes imposes the following requirements on any networking implementation (barring any intentional network
segmentation policies):

* Containers in the same pod can communicate with each other.
* Pods can communicate with all other Pods on the same or separate [nodes](/docs/concepts/architecture/nodes/)
without network address translation (NAT).
* Agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node.

## Host network

Kubernetes also supports pods running in the host network. Pods attached to the host network of a node can still
communicate with all pods on all nodes; again, without NAT.
Pods running in the host network do not require a working network plugin. For example, many network plugin
implementations operate as Pods, and the Pods that run the plugin are in host network mode so that they can start
before the cluster network is ready.
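
A hedged sketch of a host-network Pod (the name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-agent                       # hypothetical name
spec:
  hostNetwork: true                      # the Pod uses the node's network namespace directly
  containers:
    - name: agent
      image: registry.example/agent:v1   # hypothetical image
```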

Traffic between nodes might go via the host network (potentially using _encapsulation_); different cluster network
designs make different choices here.

The kubelet needs to establish bidirectional communication with the API server (within the control plane),
so there must be an IP address in the host network for the kubelet to use.

## {{% heading "whatsnext" %}}

The [Connecting Applications with Services](/docs/tutorials/services/connect-applications-service/)
tutorial lets you learn about Services and Kubernetes networking with a hands-on example.
### Network plugins {#whats-next-network-plugins}


* CNI [Specification](https://www.cni.dev/docs/spec/)

* CNI [Documentation](https://www.cni.dev/docs/)

* [Reference plugins](https://www.cni.dev/plugins/current/#reference-plugins)

* [Introduction to CNI](https://youtu.be/YjjrQiJOyME) (video)

* [CNI deep dive](https://youtu.be/zChkx-AB5Xc) (video)

### Cluster networking

For an administrative perspective on networking for your cluster, read
[Cluster Networking](/docs/concepts/cluster-administration/networking/).

### More pages in this section

[Cluster Networking](/docs/concepts/cluster-administration/networking/) explains how to set
up networking for your cluster, and also provides an overview of the technologies involved.
Read the other pages in this section of the Kubernetes documentation.
4 changes: 4 additions & 0 deletions content/en/docs/images/k8net-PodSameHost04.drawio.svg
4 changes: 4 additions & 0 deletions content/en/docs/images/k8s-api-to-kubelet-example.drawio.svg
4 changes: 4 additions & 0 deletions content/en/docs/images/k8s-localhost-02.drawio.svg
1,486 changes: 1,486 additions & 0 deletions content/en/docs/images/k8s-net-model-arch.svg
4 changes: 4 additions & 0 deletions content/en/docs/images/k8s-net-model-arch2a.drawio.svg
4 changes: 4 additions & 0 deletions content/en/docs/images/k8s-net-model-intro2.drawio.svg
4 changes: 4 additions & 0 deletions content/en/docs/images/k8s-net-phys-net.drawio.svg
4 changes: 4 additions & 0 deletions content/en/docs/images/k8s-net-vert-overlay-big2.drawio.svg
4 changes: 4 additions & 0 deletions content/en/docs/images/k8s-net-virtual-overlay.drawio.svg
4 changes: 2 additions & 2 deletions content/en/docs/reference/glossary/kube-proxy.md
@@ -23,5 +23,5 @@ maintains network rules on nodes. These network rules allow network
communication to your Pods from network sessions inside or outside of
your cluster.

kube-proxy uses the operating system packet filtering layer if there is one
and it's available. Otherwise, kube-proxy forwards the traffic itself.
To actually forward traffic, kube-proxy uses operating system packet filtering
layers such as nftables or iptables.
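
As a sketch, the packet filtering backend can be selected in the kube-proxy configuration; the field and values below reflect common Linux setups and should be treated as an assumption, not a complete configuration:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "nftables"   # on Linux: "iptables", "ipvs", or "nftables" select the packet filtering layer
```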
20 changes: 20 additions & 0 deletions content/en/docs/reference/glossary/network-namespace.md
@@ -0,0 +1,20 @@
---
title: Network namespace
id: network-namespace
date: 2024-12-12
short_description: >
Linux mechanism to provide custom networking to a subset of processes.

aka:
tags:
- networking
---
A form of isolation used on Linux, where different processes (such as in containers) see a
different set of network interfaces and configuration than the host system.

<!-- more -->

The host system is typically represented by a root network namespace, which is often what
network plugins use to set up connectivity between nodes (and between Pods on those nodes).

A network namespace is not the same as a Kubernetes {{< glossary_tooltip term_id="namespace" text="namespace">}}.