-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rollout automation #17
Open
squaremo
wants to merge
18
commits into
main
Choose a base branch
from
design-rollouts
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
squaremo
force-pushed
the
design-rollouts
branch
from
February 21, 2022 17:41
4447c20
to
5e405aa
Compare
Closed
squaremo
force-pushed
the
design-rollouts
branch
3 times, most recently
from
March 9, 2022 13:41
5401c1f
to
c8f78e6
Compare
squaremo
force-pushed
the
design-rollouts
branch
from
April 12, 2022 13:04
ce47826
to
fe32846
Compare
This adds a "work in progress" RFC on changing the layering to make rollouts possible. It's an RFC, rather than a design, because it proposes changes to the design -- if I can get it into a good shape (including, possibly, a reference implementation), then I'll update the design docs as a follow-up. Signed-off-by: Michael Bridgen <[email protected]>
Signed-off-by: Michael Bridgen <[email protected]>
Signed-off-by: Michael Bridgen <[email protected]>
The partition type was the locus for the module->cluster assignment, with the idea being that a rollout controller can change which clusters belong to which partitions to effect a rollout. However, it's not necessary to have an intermediate object for this relation -- clusters can simply be owned by modules directly, without losing generality. Since partitions don't pay their way, I've removed them from the design.
Using owner references will interact with Kubernetes' garbage collection. Even if it's possible to work around that with e.g., finalisers, it seems like a bad idea to subvert GC. It also doesn't help with atomicity, for which other workarounds are needed. Instead, use the object that already represents the assignment of modules to a cluster -- the RemoteAssemblage -- and change its meaning to suit. More shuffling may be necessary to make BootstrapModules fit too. Signed-off-by: Michael Bridgen <[email protected]>
Signed-off-by: Michael Bridgen <[email protected]>
After realising that assemblages represent the assignment of modules to clusters -- thus no need for another object -- a further realisation is that building a Rollout layer on top of modules makes modules essentially passive objects. Another way to look at it is that Modules may as well be the locus for rollout logic, since otherwise they do not have a job to do. Signed-off-by: Michael Bridgen <[email protected]>
This is just to indicate where the user is expected to interact with the system. It doesn't affect the technical bits of the design, but might help orient readers. Signed-off-by: Michael Bridgen <[email protected]>
kubebuilder create api --kind ProxyAsssemblage ... One change needed for rollouts is to have an "assemblage" object to be the place all the assigned bootstrap modules (syncs which get applied remotely from the control plane) get recorded for each cluster. A good name for that would be `RemoteAssemblage` (e.g., assemblage which is applied remotely) -- but that name is already used for the one that proxies to a remote cluster, which would be better-named `ProxyAssemblage`. To create myself some bother, I'm adding the new type and the rename at the same time. Signed-off-by: Michael Bridgen <[email protected]>
As part of the rename of RemoteAssemblage to ProxyAssemblage, this commit copies the implementation of the remote assemblage controller to the proxy assemblage controller, ports the tests for the former to the latter, and likewise swaps the type used in module_controller.go. The implementation for (repurposed) remote assemblages will be new, and need new tests. Signed-off-by: Michael Bridgen <[email protected]>
At present, bootstrap modules get expanded directly into Flux primitives (one GitRepository, and a Kustomization per cluster). Incremental rollouts could be implemented within this scheme; however, it would not help third party integrations much since they'd have to implement all of it themselves. Instead, I'm repurposing the RemoteAssemblage type -- the behaviour of which was moved to the better-named ProxyAssemblage -- to represent a set of syncs to be applied remotely to a cluster. In this commit: - change the definition of the RemoteAssemblage type so it contains syncs to apply remotely, rather than syncs to proxy (those were moved to ProxyAssemblage) - move the "expand to Flux primitives" code from the bootstrap module controller to the remote assemblage controller - implement the construction of remote assemblages in the bootstrap module controller - fix up type names (remote v proxy) in RBAC anontations and comments - adapt test code to the above changes The aim is to eventually put the commonly useful bits -- expansion to Flux primitives, and binding evaluation -- in the assemblage code, and the rollout logic in the module code. An integration or extension can then replace the module part by building on the assemblage part. Signed-off-by: Michael Bridgen <[email protected]>
When you create a cluster with Cluster API, it will typically take some minutes between the Cluster object (and its companions) appearing and the provider creating a secret containing a kubeconfig for the newly-provisioned cluster. If you have a bootstrap module already, the Flux primitives targeting the new cluster will fail in that period, possibly leading to a long back-off. There's at least a couple of places this can be mitigated: - only create a RemoteAssemblage when a cluster is marked ready; - only create Flux primitives when the required secret exists; Here, I've implemented the latter, since it is at a lower level and therefore will apply more broadly. Signed-off-by: Michael Bridgen <[email protected]>
The assemblage controller has a func for calculating the ready state of the Kustomization objects it created. This is useful for the remote assemblage controller too, which is in a different layer (module rather than assemblage) -- so, recreate it in pkg/sync to be available to both layers. I changed the argument type; it seems like a calculation specific to the Flux Kustomize type. But if I need it for other things, perhaps it can be generalised, or just replaced with kstatus. Signed-off-by: Michael Bridgen <[email protected]>
- adapt the code for tracking assemblages and recording sync states and module summaries, from module_controller to bootstrapmodule_controller - use KustomizeReadyState from pkg/api/, consistently - try to get the RBAC annotations more accurate (again) Signed-off-by: Michael Bridgen <[email protected]>
It's expected (and useful) behaviour to recreate a controller-created object that is deleted out of band (e.g., by a pesky human). To make it work this way, it's usually only necessary for the controller to watch the objects in question, since a deletion will then trigger a reconciliation of the owner object. Signed-off-by: Michael Bridgen <[email protected]>
Previously, the remote assemblage controller was made to skip (and come back to) a cluster when its kubeconfig secret is missing. This accounts broadly for races in provisioning clusters, since ultimately its the secret that matters. However, with Cluster API providers its possible for a kubeconfig secret to exist before the cluster is ready to be used. This situation _also_ leads to long backoffs and apparent lack of responsiveness. To mitigate that, this commit adds a similar mechanism to the above, at another layer: clusters that are not marked as ready (in `.status.controlPlaneReady`) are skipped. This is at the module (and bootstrapmodule) level, because it's modules that react to clusters -- assemblages just refer to kubeconfig secrets. Signed-off-by: Michael Bridgen <[email protected]>
The ProxyAssemblage controller does not have the opportunity to be notified of changes in a downstream cluster (or at least, this would be tricky to arrange, especially at scale), so it won't in general see when its Assemblage counterpart changes status. A dumb but effective way to make sure updates are seen is to simply requeue each ProxyAssemblage to be re-examined after a small interval. This will be OK at moderate scale (I would guess O(100) clusters), since more worker threads can be configured to cope with polling. Larger scales may need a less brute force approach to be figured out. Signed-off-by: Michael Bridgen <[email protected]>
- fix a dependency loop in the demo with CRDs and their use - fix a typo in the enrol script - give some clues about manually cleaning up Signed-off-by: Michael Bridgen <[email protected]>
squaremo
force-pushed
the
design-rollouts
branch
from
April 13, 2022 09:22
fe32846
to
7755045
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This adds a "work in progress" RFC on changing the layering to make rollouts possible. It's an RFC, rather than a design, because it proposes changes to the design -- if I can get it into a good shape (including, possibly, a reference implementation), then I'll update the design docs as a follow-up.