Explain Kubernetes network model in networking concept index #41419
✅ Pull request preview available for checking. Built without sensitive environment variables.
Here are some notes I collected from reviewers.
Little fixes:
- "pods containing one more containers" should be "pods containing one or more containers".
- The diagram could use a little bit of cleanup (icon alignment, and enlarging the icon text to make it easier to read).
Conceptual/clarity fixes:
- The title of the page includes Load Balancing, but load balancing is not clearly explained in the writing on the page.
- The host network section, and perhaps the page in general, might use some further elaboration on how this writing is focused on the Kubernetes networking model concepts, and then a short paragraph to put layer 7 configurations into context.
- I received some feedback that the language was too strong (it could be softened with "may" and "sometimes", as there are edge cases), and could use some context around layer 7 configuration and how that relates to the network model in practice.
- While I understood it as is, I can see how a new learner might not understand the relationships between container configuration (exposing ports, defining container ports, binding listeners in software) and the possibilities of the network.
- There are several assumptions made in the description of the network model, and we could clarify those assumptions.
Thanks @jpegleg-k8s. Reviewers - is this good enough to merge in and iterate? |
L2 bridge | ||
: a (virtual) [layer 2](https://en.wikipedia.org/wiki/Data_link_layer) bridge enabling inter-pod connectivity on the same host. |
Note that the phrase you use below is "layer 2 bridge", not "L2 bridge"... (But also, we shouldn't be talking about layers at all here.)
If you're going to talk about bridges and encapsulation, it would be good to also mention "plugins that give pods IPs directly on the node network" (which we don't have a snappy one-word way to refer to that I can think of).
Have a look at #39890 and existing reviews (maybe you did already). I'd like to talk as little as possible about encapsulation; maybe it's OK to yank any mention?
In #39890, the diagrams were introduced with phrases like "in this example", whereas that wording was lost in this PR. I think that's the problem. These diagrams do not show "the Kubernetes network model". They show one particular concrete implementation of the abstract Kubernetes network model.
If you want to fully explain the diagram to the reader, you may need to talk about L2 bridges and such. But your explanation is just about the diagram (or about what this particular unnamed plugin is doing), not about "Kubernetes networking" in general. Many network plugins use an L3 bridge instead of an L2 bridge, and some don't use a bridge at all and have an architecture which looks nothing at all like these diagrams. (e.g., if you use the `amazon-vpc-cni-k8s` or `azure-container-networking` plugins, then each pod would be directly connected to the same network as the node itself is.)
Kubernetes imposes the following fundamental requirements on any networking | ||
implementation (barring any intentional network segmentation policies): | ||
kube-proxy | ||
: Part of Kubernetes, `kube-proxy` is optional component that you run on each Node. |
: Part of Kubernetes, `kube-proxy` is optional component that you run on each Node. | |
: Part of Kubernetes, `kube-proxy` is an optional component that you run on each Node. |
(Stylistically, I feel that that "Node" should be "node", since we're talking about the host itself, not the `v1.Node` object.)
implementation (barring any intentional network segmentation policies): | ||
kube-proxy | ||
: Part of Kubernetes, `kube-proxy` is optional component that you run on each Node. | ||
The kube-proxy ensures that clients can connect to [Services](/docs/concepts/services-networking/service/), |
I don't think we ever say "the kube-proxy".
We often say "the kubelet"; see https://kubernetes.io/docs/contribute/style/style-guide/#use-code-style-for-kubernetes-command-tool-and-component-names
We do often say "the kubelet", but we don't often say "the kube-proxy", just like we don't often say "the kube-apiserver". (We say "the API server" instead. And the corresponding phrase for kube-proxy would be "the Service proxy", but it looks like we basically never say that in the current docs...)
And it seems that we do sometimes currently say "the kube-proxy" in the docs anyway. 🤷♂️
: Part of Kubernetes, `kube-proxy` is optional component that you run on each Node. | ||
The kube-proxy ensures that clients can connect to [Services](/docs/concepts/services-networking/service/), | ||
including to any backend Pods that make up the Service. Clients might be other Pods, or they could be connecting from outside the cluster. | ||
Some network plugins provide their own alternative to kube-proxy, which means you don't need to install it when you use that particular plugin. |
You don't have to install kube-proxy when using some plugins that do use kube-proxy either (since they install it themselves). I'm not sure it makes sense to talk about installation in this doc?
OK, but: I'm trying to find the smallest set of changes to this PR so that we can go from there-is-no-diagram to there-is-at-least something.
(Does this feedback block a merge?)
Some network plugins provide their own alternative to kube-proxy, which means you don't need to install it when you use that particular plugin. | |
Some network plugins provide their own alternative to kube-proxy. |
"IP-per-pod" model. | ||
{{< figure src="/docs/images/k8s-net-model-arch.svg" alt="Diagram of Kubernetes networking" class="diagram-large" caption="Figure 1. High-level example of a Kubernetes cluster, illustrating container networking." >}} | ||
|
||
The other K8s network components shown in figure consist of the following: |
I don't think we abbreviate Kubernetes to K8s in the docs?
Yes we do, and one reason for that is to defend K8s as a trademark. We mostly write Kubernetes out longhand though.
* _Local pod networking_ - optional component that enables pod-to-pod communications in the same node. You might recognize | ||
this as a virtual layer 2 bridge (which is just one possible implementation). |
Bridges are not part of the Kubernetes network model. It's true that a majority of Kubernetes plugins use some sort of bridge interface on each node (though not always an L2 bridge), but this is completely invisible at the level of "the Kubernetes network model". Unless you are debugging your cluster or developing a network plugin, then the Kubernetes network model is just that all pods can communicate (at L4) with all other pods, and that's it. You neither need to know, nor to care, exactly how the network plugin implements that.
(And if you are debugging your cluster or developing a network plugin then the diagram here is still not useful because in that case you need to know specifically how your own network plugin works, not how some theoretical network plugin works.)
Actually, Pods can communicate with other Pods at layer 3. Pods can observe packet-layer communications if they try hard enough. On Linux, you might need to add a capability for that to work.
Anyway, if I need to omit the mention of a bridge for this to merge, I can.
> Actually, Pods can communicate with other Pods at layer 3
Some plugins might allow arbitrary layer 3 communication, but Kubernetes only guarantees that you can communicate with pods at L4 via TCP and UDP (and SCTP if the plugin supports it). There are no conformance requirements that pods be able to send or receive SCTP, IP multicast, IP broadcast, IPsec, ICMP pings, or any other arbitrary L3 traffic. (And there are good reasons for plugins to not allow arbitrary traffic between pods.)
> Pods can observe packet-layer communications if they try hard enough.
Pods can observe the packets coming in and out of their own `eth0`, given `CAP_NET_ADMIN`. They have no ability to observe what happens on the other side of their `eth0`, regardless of capabilities. (Well, if they're `privileged` then they can install some eBPF and do whateverTF they want, but, you know.)
> Anyway, if I need to omit the mention of a bridge for this to merge, I can.
As above, if you're explaining the diagram then go ahead and mention the bridge, but it should be clear that this is just how pod networking works in this example, not how it always works.
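To make the L4 guarantee discussed above concrete, here is a minimal sketch of a TCP reachability probe in Python. It only exercises what the thread says the network model promises (a TCP connection to a pod IP and port). The function name `tcp_reachable` and the address in the usage note are hypothetical and do not come from the PR.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    This checks layer 4 connectivity only; it says nothing about ICMP,
    multicast, or other traffic that the network model does not guarantee.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

Run from inside one pod, a call such as `tcp_reachable("10.244.1.7", 8080)` (an assumed pod IP and port) would test pod-to-pod connectivity without relying on ICMP pings, which the comment above notes are not a conformance requirement.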
You can also have connectivity between containers running on two or more different pods on the same node; for example | ||
Pod 7 communicating with Pod 1, with both Pods (and their containers) running on Node 1. The network plugin(s) | ||
that you deploy are responsible for the routes or other means to make sure that | ||
these packets arrive at the right destination. | ||
|
||
In the cross-node case, you have container communications between pods on nodes connected | ||
via the cluster network. In the example above, Pod 7 on Node 1 can talk to Pod 21 on Node 2. |
This all just boils down to "all pods can talk to all pods" which you already said
It is possible to request and configure ports on the node itself (named _host ports_), | ||
that forward to a port on your Pod; however, this is a very niche operation. | ||
How that forwarding is implemented is also a detail of the container runtime. | ||
The Pod itself is not aware of the existence or non-existence of host ports. |
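As an illustration of the host port text quoted above, the following sketch builds a Pod manifest that requests a host port. `containerPort` and `hostPort` are fields of the core v1 container port definition; the Pod name, image, and port numbers are made up for this example and are not part of the PR.

```python
import json


def pod_with_host_port(name: str, image: str, container_port: int, host_port: int) -> dict:
    """Build a minimal v1 Pod manifest whose container requests a host port."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "ports": [
                        {
                            # Port the process listens on inside the Pod.
                            "containerPort": container_port,
                            # Port requested on the node that forwards to the Pod.
                            "hostPort": host_port,
                            "protocol": "TCP",
                        }
                    ],
                }
            ]
        },
    }


# Hypothetical values, chosen only for illustration.
manifest = pod_with_host_port("web", "nginx", 80, 8080)
print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed output could be piped to `kubectl apply -f -`. Nothing in the container's own configuration changes, which matches the quoted statement that the Pod is unaware of host ports.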
So, a lot of the existing text is specifically trying to explain Kubernetes networking to people who are assuming a Docker-like networking model, but it never explicitly says this. Contrariwise, we don't have any good explanation of how Kubernetes networking is different from typical VM networking, which is probably more relevant to more newcomers these days. It would be great to have small sections explicitly comparing Kubernetes networking to (a) traditional host networking, (b) Docker networking, (c) VM (eg OpenStack) networking. (I'm not sure if we need to talk about non-Kubernetes cloud networking since that generally tries to look like traditional host networking?)
(Also, the original text "called host ports" feels much more idiomatic than "named host ports" to me here.)
> It would be great to have small sections explicitly comparing Kubernetes networking to (a) traditional host networking, (b) Docker networking, (c) VM (eg OpenStack) networking. (I'm not sure if we need to talk about non-Kubernetes cloud networking since that generally tries to look like traditional host networking?)
I agree, but again I'm looking to find the minimal diff from what we have to what we're willing to merge. The perfect is the enemy of the published.
Kubernetes networking addresses four concerns: | ||
- Containers within a Pod [use networking to communicate](/docs/concepts/services-networking/dns-pod-service/) via loopback. | ||
- Cluster networking provides communication between different Pods. | ||
- The [Service](/docs/concepts/services-networking/service/) API lets you | ||
[expose an application running in Pods](/docs/tutorials/services/connect-applications-service/) | ||
to be reachable from outside your cluster. | ||
- [Ingress](/docs/concepts/services-networking/ingress/) and [Gateway](https://gateway-api.sigs.k8s.io/) provide | ||
extra functionality specifically for exposing your applications, websites and APIs, usually to clients outside | ||
the cluster. Ingress and Gateway often use a load balancer to make that work reliably and at scale. | ||
- You can also use Services to | ||
[publish services only for consumption inside your cluster](/docs/concepts/services-networking/service-traffic-policy/). |
Ugh. So all of this text apparently already existed later on in the doc, but it's completely terrible. Especially, the `[use networking to communicate]` and `[publish services only for consumption inside your cluster]` links point to documents that have absolutely nothing to do with those phrases. (I'm guessing this must be a result of slow bit rot over the years as things got moved around between different documents.)
Even ignoring the bad links, this is not at all how I would summarize the goals of Kubernetes networking. I would say:
- The pod network allows pods to communicate with each other and with nodes, and allows pods to send traffic outside the cluster.
- Services provide persistent names and IPs for pods or groups of pods, with load balancing between them.
- Ingress and Gateway provide access to Services from outside the cluster.
- NetworkPolicy provides access control within the pod network.
(It's true that containers in a pod can communicate with each other via loopback, but I feel like that's more of an implementation detail of how pods work than it is a fact of "Kubernetes networking"... as you say below "Kubernetes and the container runtime provide no special support as these processes all see a common local network within the container sandbox.")
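The fourth bullet in the summary above (NetworkPolicy as access control within the pod network) can be sketched as a manifest. This is a hedged illustration, not text proposed for the page: the helper name, the policy name, and the `app`/`role` labels are hypothetical, and enforcement depends on the network plugin supporting NetworkPolicy.

```python
import json


def allow_ingress_from(name: str, target_labels: dict, client_labels: dict, port: int) -> dict:
    """Build a NetworkPolicy that admits TCP traffic to the target pods
    only from pods carrying the given client labels."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name},
        "spec": {
            # Pods this policy applies to.
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Allowed sources: pods in the same namespace with these labels.
                    "from": [{"podSelector": {"matchLabels": client_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical labels and port, for illustration only.
policy = allow_ingress_from("db-from-api", {"app": "db"}, {"role": "api"}, 5432)
print(json.dumps(policy, indent=2))
```

The pod network itself still lets every pod reach every other pod; a policy like this narrows that default for the selected pods.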
@danwinship if you've the capacity, feel free to make your own PR here. You can use #39890 as a starting point or begin afresh.
I really do want to progress the work though. This effort started over a year ago and we have nothing merged.
Well, this is just moving text around, so I guess it's not making things worse...
(I'll try to find some time to dig into improving these docs...)
My understanding is that we are never merging a PR because it has stayed there for too long. We always appreciate smaller PRs that fix a particular problem or improve a specific topic.
If we want to add some diagrams to this page, good, let's focus on getting the diagram correct, readable and meaningful. If we want to revise the Service concept overview, fine, let's try that in a self-contained PR. If we want to adjust the flow in a page, okay, let's do it in a single PR.
With smaller PRs, we move forward step by step. There is always progress along the way.
Adding 1600+ lines in a single PR with modifications to 10 files?
No. I am strongly against it.
> My understanding is that we are never merging a PR because it has stayed there for too long. We always appreciate smaller PRs that fix a particular problem or improve a specific topic.
Yes
> If we want to add some diagrams to this page, good, let's focus on getting the diagram correct, readable and meaningful. If we want to revise the Service concept overview, fine, let's try that in a self-contained PR. If we want to adjust the flow in a page, okay, let's do it in a single PR. With smaller PRs, we move forward step by step. There is always progress along the way.
If you can write the PR description for the thing you'd like to be reviewing, that can help: we can use that to guide contributors to write it.
> Adding 1600+ lines in a single PR with modifications to 10 files?
We often do merge PRs that include more than one image, and I don't see grounds to change that.
|
||
{{< note >}} | ||
Network plugins are also known as _CNI_ or _CNI plugins_. | ||
{{< /note >}} |
Boo!
"Some people refer to network plugins as CNI plugins or just CNIs, but this is inaccurate, since CNI is just one of several APIs involved in Kubernetes networking."
What other kind of network plugins can I use with Kubernetes?
All network plugins use CNI, but they don't just use CNI.
People understand you're talking about network plugins when you say "CNI plugins" because CNI isn't used for anything except network plugins. But these days, most of what a network plugin does doesn't involve CNI.
But anyway, doesn't need to be fixed in this PR.
blah, the above comments are a rough draft and all out of order. I meant to click "cancel review" and start over but apparently I hit "submit review" instead? |
@danwinship did you have any more feedback / comments? |
What about https://deploy-preview-41419--kubernetes-io-main-staging.netlify.app/docs/concepts/services-networking/#kubernetes-network-model is not generic? If the change I need to make is to remove the text around “L2 bridge”, that feedback is OK. However, I do need feedback that helps me understand what change to make. I think this added diagram is generic already; if it isn't, I need to know how to change it so it is. |
You can't have a single diagram that is generic, unless you make it so abstract that it doesn't explain much. The existing diagram looks something like this:
but some network plugins do this instead:
(That is, each pod is connected directly to the "node network", and traffic from pod 1 to pod 2 never passes through Node 1's host network namespace.) And it's still the case that "Every Pod ... gets its own unique cluster-wide IP address ... [and] Pods can communicate with all other Pods on the same or separate nodes without network address translation (NAT) ... [and] Agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node", so it's still a valid implementation of the Kubernetes network model. I guess the generic version would be to have a physical network connecting the nodes, and a "pod network" connecting the pods (and the nodes), with no indication of how the pod network and the physical network relate to one another. Or, you can just say "the implementation shown in this diagram is similar to the one used by many plugins, but other implementations are possible".
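The "without network address translation" requirement quoted above is observable from inside pods regardless of which topology the plugin uses. A minimal sketch, assuming a cooperating server that reports the source IP it saw: the client compares that IP with the local address of its own socket. The function names `serve_once` and `source_address_preserved` are hypothetical, and only the IP is compared, since that is what the requirement speaks to.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection and send back the peer IP the server observed."""
    conn, peer = listener.accept()
    with conn:
        conn.sendall(peer[0].encode())


def source_address_preserved(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if the server saw the same source IP the client used.

    A mismatch suggests the connection was translated (NAT) on the way.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        local_ip = sock.getsockname()[0]
        data = b""
        # Read until the server closes the connection.
        while chunk := sock.recv(1024):
            data += chunk
    return data.decode() == local_ip
```

In a cluster, `serve_once` would run in one pod and `source_address_preserved` in another, pointed at the first pod's IP. Both the bridged layout and the directly connected layout described above should yield `True` for such a check if they satisfy the model.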
Thanks. I think it makes sense to omit the bridge. I can show that the node network and pod network are linked but that the way this happens is up to the network plugin. |
b72d96e to 7f4b0ac
7f4b0ac to 784cab0
784cab0 to fd79d30
@sftim , should this pull request close or are you still working on the changes? |
This is still in progress, despite appearances - for example, I had a Zoom call on Monday that touched on how to move this forward. |
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
@sftim: I know you're awfully short on time, but just a quarterly check-in if this is still WIP? |
fd79d30 to 88e1218
I might get time this month to revisit the work here. |
@sftim Gentle reminder on this one. It seems to require a rebase. |
Mmm. I need to work out what intent we have around explaining Kubernetes networking; it's hard to get the PR right because I don't think we know the message that we - Kubernetes - actually want to convey. |
Thanks for the update! Would it help to have a conversation thread started around this somewhere in one of the Slack channels? |
I think there was a conversation. I've left this PR like this to track it as work-that-was-in-progress and because it's an area where user feedback suggests the docs are not very helpful. But I doubt I'll do work on this before October 2024. |
If you - @divya-mohan0209 - have capacity to foster that conversation, please do; it'd be very welcome. |
No problem, I'm happy to help out wherever I can! Which Slack channel would be a good place to start the conversation @sftim ? |
I think SIG Docs, and pop a link to that message into each of:
|
I might yet get (ie make) time to move this forward. |
filed #47903 with my attempt at rewriting the existing text without adding any new sections |
This needs revising post #47903 I'd like to keep this open so we don't lose track of the work done so far. |
I'll work on this. |
88e1218 to 7116fb7
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
7116fb7 to 58815e6
Split off and adapted from PR #39890
Once ready to merge, should help with #49278
@chrismetz09 did nearly all the work here; I'm proposing we adopt the bits we can merge right away.
/sig docs
/sig network
Most of the images being added are not yet being used. We can, IMO, merge them and then iterate.