Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IP control loop #185

Merged
merged 19 commits into from
Apr 13, 2022
Merged

Conversation

maiqueb
Copy link
Collaborator

@maiqueb maiqueb commented Feb 4, 2022

Introduce an IP reconciliation loop for whereabouts.

The controller monitor pod deletion events, and for every deleted pod it:

  • reads the pod's network status annotations
  • read the network attachment for every network referenced in the status
  • extract the associated IP range of the attachment
  • read the corresponding IP pool
  • delete all existing attachments on that pool for the pod being deleted

The controller is unit tested, and e2e tests are introduced to check for stale IP addresses for both pods and stateful sets.

It is updating the allocations in the datastore via the whereabouts API, which uses leader election to allow concurrent updates to the IP pools.

It only reads the Pods / IPPools / Net-Attach-Defs from the informer cache, in order to minimize the load of the Kubernetes API.

Depends-on: #202
Fixes: #182

@maiqueb
Copy link
Collaborator Author

maiqueb commented Feb 14, 2022

Rebased latest master & fixed vendoring in this forced push.

@coveralls
Copy link

coveralls commented Feb 14, 2022

Pull Request Test Coverage Report for Build 2155322022

  • 424 of 511 (82.97%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+13.4%) to 53.88%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/reconciler/controlloop/entity_generators.go 91 93 97.85%
pkg/api/whereabouts.cni.cncf.io/v1alpha1/register.go 16 22 72.73%
pkg/config/config.go 31 51 60.78%
pkg/reconciler/controlloop/dummy_controller.go 79 102 77.45%
pkg/reconciler/controlloop/pod.go 207 243 85.19%
Totals Coverage Status
Change from base Build 2070941103: 13.4%
Covered Lines: 861
Relevant Lines: 1598

💛 - Coveralls

@crandles
Copy link
Collaborator

controller-runtime could be leveraged for this, as far as tooling for creating a controller with informers

@maiqueb maiqueb force-pushed the ip-control-loop branch 5 times, most recently from cab96fd to aafe782 Compare March 1, 2022 13:56
@maiqueb
Copy link
Collaborator Author

maiqueb commented Mar 1, 2022

controller-runtime could be leveraged for this, as far as tooling for creating a controller with informers

I've started a discussion item in #205 for this. I would definitely appreciate if you voiced your opinion there @crandles :)

@dougbtv
Copy link
Member

dougbtv commented Mar 1, 2022

fwiw, I went to go build it and run it locally and stopped when I ran into this index-out-of-bounds error when running ./hack/test-go.sh, https://pasteall.org/BkLq#L17,79 (don't feel like you have to debug it, but, I slowed down when I got here)

nevermind (it was an environmental problem)

@maiqueb maiqueb force-pushed the ip-control-loop branch 2 times, most recently from 89e4c10 to a9a8e02 Compare March 4, 2022 17:33
maiqueb added 9 commits March 25, 2022 09:37
Signed-off-by: Miguel Duarte Barroso <[email protected]>
Signed-off-by: Miguel Duarte Barroso <[email protected]>
The reconciler controller will have access to the whereabouts
configuration via a mount point. As such, we need a way to specify its
path.

Signed-off-by: Miguel Duarte Barroso <[email protected]>
The IP reconcile loop also requires the IP ranges in a normalized
format; as such, we export it into a function, which will be used in a
follow-up commit.

Signed-off-by: Miguel Duarte Barroso <[email protected]>
Currently whereabouts is only able to parse network configurations in
the strict [0] format - i.e. **do not accept** a plugin list - [1].

The `ip-control-loop` must recover the full plugin configuration, which
may be in the network configuration format.

This commit allows whereabouts to now understand both formats.

Furthermore, the current CNI release - v1.0.Z - removed the support for
[0], meaning that only the configuration list format is now supported
[2].

[0] - https://github.com/containernetworking/cni/blob/v0.8.1/SPEC.md#network-configuration
[1] - https://github.com/containernetworking/cni/blob/v0.8.1/SPEC.md#network-configuration-lists
[2] - https://github.com/containernetworking/cni/blob/master/SPEC.md#released-versions

Signed-off-by: Miguel Duarte Barroso <[email protected]>
Listen to pod deletion, and for every deleted pod, assure their IPs
are gone.

The rough algorithm goes like this:
  - for every network-status in the pod's annotations:
    - read associated net-attach-def from the k8s API
    - extract the range from the net-attach-def
    - find the corresponding IP pool
    - look for allocations belonging to the deleted pod
    - delete them using `IPManagement(..., types.Deallocate, ...)`

All the API reads go through the informer cache, which is kept updated
whenever the objects are updated on the API.

The dockerfiles are also updated, to ship this new binary.

Signed-off-by: Miguel Duarte Barroso <[email protected]>
This would leave the `ip-control-loop` as the reconciliation tool.

Signed-off-by: Miguel Duarte Barroso <[email protected]>
This commit adds a unit where it is checked that the pod deletion leads
to the cleanup of a stale IP address.

This commit features the automatic provisioning of the controller informer cache
with the data present on the fake clientset tracker (the "fake" datastore).

This way, users can just create the client with provisioned data, and
that'll trickle down to the informer cache of the pod controller.

Because the `network-attachment-definitions` resources feature dashes,
the heuristic function that guesses - yes, guesses. very deterministic
... - the name of the resource can't be used - [0]. As such, it was
needed to create an alternate `newFakeNetAttachDefClient` where it is
possible to specify the correct resource name.

[0] - https://github.com/k8snetworkplumbingwg/network-attachment-definition-client/blob/2fd7267afcc4d48dfe6a8cd756b5a08bd04c2c97/vendor/k8s.io/client-go/testing/fixture.go#L331

Signed-off-by: Miguel Duarte Barroso <[email protected]>
The helper files are tagged with the `test` build tag, to prevent them
from being shipped on the production code binary.

Signed-off-by: Miguel Duarte Barroso <[email protected]>
@dougbtv
Copy link
Member

dougbtv commented Apr 1, 2022

Hey Miguel -- I attempted to use this interactively and overall -- I had great luck, and I'm really pleased.

One question I have is... I (by accident) had a pod running on my system which was not using whereabouts, but... the control loop appears to be picking it up?

E.g. Here's a snippet from the logs...

2022-04-01T18:51:40Z [verbose] deleted pod [default/samplepod]
2022-04-01T18:51:40Z [verbose] skipped net-attach-def for default network
2022-04-01T18:51:40Z [debug] pod's network status: {Name:default/macvlan-conf Interface:net1 IPs:[192.168.1.200] Mac:02:dc:79:11:4f:a6 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil>}
2022-04-01T18:51:40Z [verbose] the NAD's config: {{ "cniVersion": "0.3.0", "type": "macvlan", "master": "eth0", "mode": "bridge", "ipam": { "type": "host-local", "subnet": "192.168.1.0/24", "rangeStart": "192.168.1.200", "rangeEnd": "192.168.1.216", "routes": [ { "dst": "0.0.0.0/0" } ], "gateway": "192.168.1.1" } }}
2022-04-01T18:51:40Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2022-04-01T18:51:40Z [verbose] result of garbage collecting pods: failed to create an IPAM configuration for the pod default/samplepod iface default/macvlan-conf: invalid CIDR : invalid CIDR address: 
2022-04-01T18:51:40Z [verbose] re-queuing IP address reconciliation request for pod default/samplepod; retry #: 0

Are we filtering down to the right pods with the controller? Or did I do something... weird (very well could be)

@maiqueb
Copy link
Collaborator Author

maiqueb commented Apr 4, 2022

Hey Miguel -- I attempted to use this interactively and overall -- I had great luck, and I'm really pleased.

One question I have is... I (by accident) had a pod running on my system which was not using whereabouts, but... the control loop appears to be picking it up?

E.g. Here's a snippet from the logs...

2022-04-01T18:51:40Z [verbose] deleted pod [default/samplepod]
2022-04-01T18:51:40Z [verbose] skipped net-attach-def for default network
2022-04-01T18:51:40Z [debug] pod's network status: {Name:default/macvlan-conf Interface:net1 IPs:[192.168.1.200] Mac:02:dc:79:11:4f:a6 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil>}
2022-04-01T18:51:40Z [verbose] the NAD's config: {{ "cniVersion": "0.3.0", "type": "macvlan", "master": "eth0", "mode": "bridge", "ipam": { "type": "host-local", "subnet": "192.168.1.0/24", "rangeStart": "192.168.1.200", "rangeEnd": "192.168.1.216", "routes": [ { "dst": "0.0.0.0/0" } ], "gateway": "192.168.1.1" } }}
2022-04-01T18:51:40Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2022-04-01T18:51:40Z [verbose] result of garbage collecting pods: failed to create an IPAM configuration for the pod default/samplepod iface default/macvlan-conf: invalid CIDR : invalid CIDR address: 
2022-04-01T18:51:40Z [verbose] re-queuing IP address reconciliation request for pod default/samplepod; retry #: 0

Are we filtering down to the right pods with the controller? Or did I do something... weird (very well could be)

I'm honestly unsure on how ( if ?...) that can be done. Is there something on the pod that indicates it as a whereabouts pod ? Can I filter it somehow ? AFAIU, the best I can do is check if the pod's network IPAM configuration is of type whereabouts .
But doing that requires accessing the informer.

Do you have any suggestion on how to implement your proposal ? I agree it would be better to only send to the queue stuff we knew were whereabouts related.

@maiqueb maiqueb requested a review from dougbtv April 8, 2022 13:34
maiqueb added 9 commits April 12, 2022 15:29
Using a queue allows us to re-queue errors.

Signed-off-by: Miguel Duarte Barroso <[email protected]>
Adds two new events related to garbage collection of the whereabouts IP
addresses:
  - when an IP address is garbage collected
  - when a cleanup operation fails and is not re-queued

The former event looks like:
```
116s        Normal    IPAddressGarbageCollected   pod/macvlan1-worker1 \
            successful cleanup of IP address [192.168.2.1] from network \
            whereabouts-conf
```

The latter event looks like:
```
10s         Warning    IPAddressGarbageCollectionFailed    failed to garbage \
            collect addresses for pod default/macvlan1-worker1
```

Signed-off-by: Miguel Duarte Barroso <[email protected]>
Signed-off-by: Miguel Duarte Barroso <[email protected]>
These ugly tests do not cleanup after themselves; this way, the golang
based tests (which **do** cleanup after themselves) will not be impacted by
these left-overs.

Signed-off-by: Miguel Duarte Barroso <[email protected]>
Check the event thrown when a request is dropped from the queue, and
assure reconciling an allocation is impossible without having access to
the attachment configuration data.

Signed-off-by: Miguel Duarte Barroso <[email protected]>
Issue [0] reports an error when a pod associated to a `StatefulSet`
whose IPPool is already full is deleted. According to it, the new pod -
scheduled by the `StatefulSet` - cannot run because the IPPool is
already full, and the old pod's IP cannot be garbage collected because
we match by pod reference - and the "new" pod is stuck in `creating`
phase.

[0] - k8snetworkplumbingwg#182

Signed-off-by: Miguel Duarte Barroso <[email protected]>
The ip reconcile loop only requires the pod metadata and its network
status annotatations to garbage collect the stale IP addresses.

As such, we remove the status and spec parameters from the pod before
queueing it.

Signed-off-by: Miguel Duarte Barroso <[email protected]>
@dougbtv
Copy link
Member

dougbtv commented Apr 12, 2022

re:

but... the control loop appears to be picking it up?

...I think this was user error on my part after all 👍

Copy link
Member

@dougbtv dougbtv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you Miguel!

@dougbtv dougbtv merged commit 59f1052 into k8snetworkplumbingwg:master Apr 13, 2022
nicklesimba pushed a commit to nicklesimba/whereabouts that referenced this pull request Apr 20, 2022
* build: generate ip pool clientSet/informers/listers

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* vendor: update vendor stuff

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* build: vendor net-attach-def-client types

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* config: look for the whereabouts config file in multiple places

The reconciler controller will have access to the whereabouts
configuration via a mount point. As such, we need a way to specify its
path.

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* reconcile-loop: requires the IP ranges in normalized format

The IP reconcile loop also requires the IP ranges in a normalized
format; as such, we export it into a function, which will be used in a
follow-up commit.

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* config: allow IPAM config parsing from a NetConfList

Currently whereabouts is only able to parse network configurations in
the strict [0] format - i.e. **do not accept** a plugin list - [1].

The `ip-control-loop` must recover the full plugin configuration, which
may be in the network configuration format.

This commit allows whereabouts to now understand both formats.

Furthermore, the current CNI release - v1.0.Z - removed the support for
[0], meaning that only the configuration list format is now supported
[2].

[0] - https://github.com/containernetworking/cni/blob/v0.8.1/SPEC.md#network-configuration
[1] - https://github.com/containernetworking/cni/blob/v0.8.1/SPEC.md#network-configuration-lists
[2] - https://github.com/containernetworking/cni/blob/master/SPEC.md#released-versions

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* reconcile-loop: add a controller

Listen to pod deletion, and for every deleted pod, assure their IPs
are gone.

The rough algorithm goes like this:
  - for every network-status in the pod's annotations:
    - read associated net-attach-def from the k8s API
    - extract the range from the net-attach-def
    - find the corresponding IP pool
    - look for allocations belonging to the deleted pod
    - delete them using `IPManagement(..., types.Deallocate, ...)`

All the API reads go through the informer cache, which is kept updated
whenever the objects are updated on the API.

The dockerfiles are also updated, to ship this new binary.

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* e2e tests: remove manual cluster reconciliation

This would leave the `ip-control-loop` as the reconciliation tool.

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* unit tests: assure stale IPAllocation cleanup

This commit adds a unit where it is checked that the pod deletion leads
to the cleanup of a stale IP address.

This commit features the automatic provisioning of the controller informer cache
with the data present on the fake clientset tracker (the "fake" datastore).

This way, users can just create the client with provisioned data, and
that'll trickle down to the informer cache of the pod controller.

Because the `network-attachment-definitions` resources feature dashes,
the heuristic function that guesses - yes, guesses. very deterministic
... - the name of the resource can't be used - [0]. As such, it was
needed to create an alternate `newFakeNetAttachDefClient` where it is
possible to specify the correct resource name.

[0] - https://github.com/k8snetworkplumbingwg/network-attachment-definition-client/blob/2fd7267afcc4d48dfe6a8cd756b5a08bd04c2c97/vendor/k8s.io/client-go/testing/fixture.go#L331

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* unit tests: move helper funcs to other files

The helper files are tagged with the `test` build tag, to prevent them
from being shipped on the production code binary.

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* control loop, queueing: use a rate-limiting queue

Using a queue allows us to re-queue errors.

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* control loop: add IPAllocation cleanup related events

Adds two new events related to garbage collection of the whereabouts IP
addresses:
  - when an IP address is garbage collected
  - when a cleanup operation fails and is not re-queued

The former event looks like:
```
116s        Normal    IPAddressGarbageCollected   pod/macvlan1-worker1 \
            successful cleanup of IP address [192.168.2.1] from network \
            whereabouts-conf
```

The latter event looks like:
```
10s         Warning    IPAddressGarbageCollectionFailed    failed to garbage \
            collect addresses for pod default/macvlan1-worker1
```

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* e2e tests: check out statefulset scenarios

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* e2e tests: test different scale up/down order and instance deltas

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* ci: test e2e bash scripts last

These ugly tests do not cleanup after themselves; this way, the golang
based tests (which **do** cleanup after themselves) will not be impacted by
these left-overs.

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* ip control loop, unit tests: test negative scenarios

Check the event thrown when a request is dropped from the queue, and
assure reconciling an allocation is impossible without having access to
the attachment configuration data.

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* e2e tests: test fix for issue k8snetworkplumbingwg#182

Issue [0] reports an error when a pod associated to a `StatefulSet`
whose IPPool is already full is deleted. According to it, the new pod -
scheduled by the `StatefulSet` - cannot run because the IPPool is
already full, and the old pod's IP cannot be garbage collected because
we match by pod reference - and the "new" pod is stuck in `creating`
phase.

[0] - k8snetworkplumbingwg#182

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* ip-control-loop: strip pod before queueing it

The ip reconcile loop only requires the pod metadata and its network
status annotatations to garbage collect the stale IP addresses.

As such, we remove the status and spec parameters from the pod before
queueing it.

Signed-off-by: Miguel Duarte Barroso <[email protected]>

* reconcile-loop: focus on networks w/ whereabouts IPAM type

Signed-off-by: Miguel Duarte Barroso <[email protected]>

Replaced bash e2e test with Golang e2e test (k8snetworkplumbingwg#181)

Fixed code errors causing e2e tests to not compile

Signed-off-by: nicklesimba <[email protected]>

e2e tests: remove manual ip pool reconciliation

Signed-off-by: Miguel Duarte Barroso <[email protected]>

e2e tests: use the default namespace

Signed-off-by: Miguel Duarte Barroso <[email protected]>

e2e tests: read the IPAllocations from the correct namespace

Furthermore, harden the `isIPPoolAllocationsEmpty` function - currently, if the
IPPool does not exist, it will be created without allocations. When that
happens, the pool is indeed empty.

Signed-off-by: Miguel Duarte Barroso <[email protected]>

Changed variable name for readability (envVars *envVars changed to
testConfig *envVars)

Signed-off-by: nicklesimba <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ip-reconciler issue with StatefullSet cleanup
4 participants