From f28f4af153f02ad329ca3e9ac4190b6b4a518838 Mon Sep 17 00:00:00 2001
From: Katherine Stanley <11195226+katheris@users.noreply.github.com>
Date: Thu, 14 Nov 2024 12:56:10 +0000
Subject: [PATCH] External Certificate Manager

Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
---
 087-external-certificate-manager.md | 196 ++++++++++++++++++++++++++++
 1 file changed, 196 insertions(+)
 create mode 100644 087-external-certificate-manager.md

diff --git a/087-external-certificate-manager.md b/087-external-certificate-manager.md
new file mode 100644
index 00000000..4e803362
--- /dev/null
+++ b/087-external-certificate-manager.md
@@ -0,0 +1,196 @@
+# External Certificate Manager
+
+This proposal aims to allow Strimzi users to use an external certificate manager, specifically [cert-manager](https://cert-manager.io/), to manage certificates.
+
+## Current situation
+
+There are two different categories of certificates that Strimzi handles:
+* The term "cluster" refers to certificates that are issued for the Strimzi components:
+  * ZooKeeper nodes
+  * Kafka nodes
+  * Cluster, User and Topic operators
+  * Cruise Control
+  * Kafka Exporter
+* The term "clients" refers to certificates that are issued for user applications using the User Operator, or through another external mechanism chosen by the user.
+
+For both categories, to provide a secure, TLS-enabled setup by default when deploying Kafka clusters, Strimzi integrated its own CA operations into the Cluster Operator.
+The Cluster Operator accomplishes this by using openssl to generate self-signed root CA certificates and private keys which it then uses to directly sign end-entity (EE) certificates.
+This CA certificate has zero pathlen, which means it cannot sign any intermediate CA.
+A cluster CA and clients CA are generated.
+These CAs are only used for Kafka clusters and a unique instance of each CA is used for each Kafka cluster.
+
+In addition to Strimzi fully managing the certificates as described above, there are options for users to partially manage the certificates:
+* Users can [install and use their own CA certificate and private keys](https://strimzi.io/docs/operators/latest/deploying#installing-your-own-ca-certificates-str), instead of using the defaults generated by the Cluster Operator.
+  When using this option, both the CA certificate and private key must be provided, and Strimzi still issues the end-entity (EE) certificates that are presented by the components.
+* Users can [provide custom listener certificates](https://strimzi.io/docs/operators/latest/deploying#proc-installing-certs-per-listener-str) for TLS encryption.
+  This option only affects how user applications connect to Kafka.
+  It does not change how the Strimzi components connect to Kafka, or how the Kafka brokers connect to each other.
+
+None of the existing options allow the certificate management to be done completely separately from Strimzi.
+
+## Motivation
+
+Strimzi's primary purpose is to provide a way to run Apache Kafka clusters on Kubernetes.
+Although it is nice that it can manage certificates, it would be beneficial if the certificates could be managed by a dedicated certificate manager, such as [cert-manager](https://cert-manager.io/).
+This is a feature that is often requested, especially because many organizations have specific compliance requirements with regard to certificates, for example:
+* Requiring that CA private keys are not shared.
+* Requiring that self-signed certificates cannot be used.
+
+## Proposal
+
+Strimzi will be updated to allow users to specify that certificates should be issued by an external certificate manager, rather than issued by the Cluster Operator.
+This proposal will specifically describe how this would work for cert-manager; however, the user API for configuration will be written in a way that does not prevent other external certificate managers being added in the future.
+
+The proposal makes a few assumptions:
+* Strimzi will not be responsible for installing cert-manager, but we will document the versions of cert-manager that we have tested with.
+* Strimzi will not be responsible for creating `Issuer` or `ClusterIssuer` custom resources.
+* Strimzi will create `Certificate` custom resources and will allow the user to influence the contents of these resources by exposing options in the `Kafka` custom resource.
+* Strimzi will not directly interact with the lower level `CertificateRequest` and `CertificateSigningRequests` custom resources.
+* When Strimzi creates a `Certificate` custom resource, cert-manager will issue the certificate within a reasonable amount of time such that Strimzi can wait during the reconciliation.
+* Users will provide to the Strimzi Cluster Operator the CA certificates it must trust for the current issuer via a Kubernetes Secret.
+
+### API
+
+The existing `spec.clusterCa` and `spec.clientsCa` fields will be extended to add a new property `certificateIssuer`:
+
+```yaml
+spec:
+  clusterCa:
+    validityDays: # notBefore=now, notAfter=now + validityDays
+    generateCertificateAuthority:
+    generateSecretOwnerReference:
+    renewalDays: # days before notAfter when we should start renewal
+    certificateExpirationPolicy:
+    certificateIssuer:
+      type: # (1)
+      issuerRef: # (2)
+        name:
+        kind:
+        group: # cert-manager.io by default
+```
+
+1. If the `certificateIssuer` and `type` properties are not set it will default to `internal` and will use the existing behaviour, allowing backwards compatibility.
+   The option `cert-manager.io` will only be valid if `generateCertificateAuthority` is set to `false`.
+2. The property `certificateIssuer.issuerRef` will only be used by Strimzi if `certificateIssuer.type` is set to `cert-manager.io`.
+   The `name`, `kind`, and `group` properties will be copied over into the `Certificate` custom resource Strimzi creates.
+
+### User steps
+
+To make use of this new option the user will have to:
+
+1. Install cert-manager.
+2. Create an `Issuer` or `ClusterIssuer` custom resource.
+3. Create a `Secret` containing the CAs for Strimzi to trust.
+   Users can optionally use [trust-manager](https://cert-manager.io/docs/trust/trust-manager/) to create this Secret, but they are responsible for installing trust-manager, creating the `Bundle` CR and annotating the resulting Secret with the Strimzi cert annotation.
+4. Create a `Kafka` resource with `clusterCa.certificateIssuer` and/or `clientsCa.certificateIssuer` configured.
+
+Notes:
+* The `Secret` that contains the CAs will be the same `Secret` currently used, so either `-cluster-ca-cert` or `-clients-ca-cert`.
+
+### Handling trust rollout
+
+Similar to today, Strimzi will use the notion of a "generation" to determine whether to roll the cluster to pick up changes in either the `-cluster-ca-cert` or `-clients-ca-cert`.
+When the user creates the CA cert Secret they must add the `strimzi.io/ca-cert-generation` annotation to it.
+If the user updates the Secret to change the certificates included they must increment the annotation to inform Strimzi it has changed.
+Similar to today, Strimzi will put the annotation on the pods (Kafka, ZooKeeper, etc.) to be able to spot when the generation has been changed.
+
+### Issuing certificates
+
+If the user has enabled this feature, when Strimzi needs to issue a certificate, instead of using the existing internal mechanism it will create a `Certificate` custom resource.
+Strimzi will wait during the reconciliation loop for the `Certificate` status to indicate that the certificate has been issued before continuing.
+When issuing cluster certificates (e.g. for Kafka), once the certificate has been issued, Strimzi will annotate the cert-manager provided Secret with the `strimzi.io/server-cert-hash` annotation with the value being the hash of the certificate in the Secret.
+Similar to today, Strimzi will also add this hash annotation to the pods to track whether they are mounting the latest version of the Secret.
+
+### Tracking changes to cluster end-entity certificates
+
+Cert-manager will be responsible for renewing all end-entity certificates.
+When a certificate is renewed cert-manager will update the related Secret.
+
+For cluster certificates (e.g. for Kafka), Strimzi will track and handle these changes using the `strimzi.io/server-cert-hash` annotation.
+During the reconciliation loop, even if all cluster end-entity certificates have been issued, Strimzi will patch the certificate Secrets with the correct `strimzi.io/server-cert-hash` annotation.
+The value of this annotation can then be compared with the value on the pods to determine whether the pods need to be restarted to pick up a new Secret.
+
+For user certificates (issued by the User Operator), the user will be responsible for making sure their applications notice cert-manager renewing the certificates and are updated to use the new certificate.
+
+## Affected/not affected projects
+
+This affects the Cluster Operator and User Operator.
+
+## Compatibility
+
+This feature will be optional and disabled by default.
+
+### Migrating to this feature
+
+To start using this feature in an existing Kafka cluster the user must:
+1. Install cert-manager and create an `Issuer`.
+2. Pause reconciliation for their Kafka cluster.
+3. Update the `-cluster-ca-cert` and/or `-clients-ca-cert` Secrets to:
+   1. contain the CA(s) for the `Issuer` (keeping the old CA cert in the Secret still)
+   2. increment the `strimzi.io/ca-cert-generation` annotation
+4. Update the `Kafka` resource to configure the `clusterCa.certificateIssuer` and/or `clientsCa.certificateIssuer` sections.
+5. Resume reconciliation.
+
+When using this feature for the ClientsCa:
+* On the next Cluster Operator reconciliation Strimzi will roll the Kafka pods to trust the new CA cert.
+* On the next User Operator reconciliation Strimzi will create `Certificate` resources for all the existing `KafkaUser` custom resources.
+
+When using this feature for the ClusterCa:
+* On the next reconciliation Strimzi will first roll the pods once to trust the new CA cert.
+* Strimzi will create `Certificate` resources for all the components and wait for the certificates to be issued.
+* Strimzi will roll the pods to use the new certificates.
+
+Once all the pods have been rolled the user can update the `-cluster-ca-cert` and/or `-clients-ca-cert` Secrets to remove the old CA cert.
+
+### Stopping using this feature
+
+To revert to user managed CAs the user will:
+1. Pause reconciliation for their Kafka cluster.
+2. Update the `-cluster-ca-cert` and/or `-clients-ca-cert` Secrets to:
+   1. contain their public CA cert (keeping the old cert-manager one)
+   2. increment the `strimzi.io/ca-cert-generation` annotation
+3. Create the `-cluster-ca` and/or `-clients-ca` private key Secrets.
+4. Update the `Kafka` resource to change the `clusterCa.certificateIssuer` and/or `clientsCa.certificateIssuer` `type` to `internal`.
+5. Resume reconciliation.
+
+If this feature was in use for the ClientsCa:
+* On the next Cluster Operator reconciliation Strimzi will roll the Kafka pods to trust the new CA cert.
+* On the next User Operator reconciliation Strimzi will issue new certificates for all the existing `KafkaUser` custom resources.
+
+If this feature was in use for the ClusterCa:
+* On the next reconciliation Strimzi will first roll the pods once to trust the new CA cert.
+* Strimzi will issue new certificates for all the components.
+* Strimzi will roll the pods to use the new certificates.
+
+Once all the pods have been rolled the user can update the `-cluster-ca-cert` and/or `-clients-ca-cert` Secrets to remove the old CA cert.
+The user is responsible for removing the old `Certificate` resources and uninstalling cert-manager.
+
+Notes:
+* Today we do not document how to go from using user managed CAs to Strimzi managed CAs.
+  For this reason I have not included how to go from cert-manager CAs to Strimzi managed CAs.
+
+## Rejected alternatives
+
+### Letting Strimzi infer the CA cert to trust
+
+Certain issuers will include the CA cert to trust in the Secret for a specific certificate.
+Strimzi could use this cert instead of requiring the user to provide one.
+However, this is not recommended.
+On the cert-manager [website](https://cert-manager.io/docs/trust/) they explicitly state:
+"When configuring the client you should independently choose and fetch the CA certificates that you want to trust.
+Download the CA out of band and store it in a Secret or ConfigMap separate from the Secret containing the server's private key and certificate."
+To keep to this best practice and also allow Strimzi to have the same behaviour for all issuers I have chosen to require the user to provide the CA certs to trust up-front.
+
+### Using lower-level cert-manager CRs
+
+Strimzi could keep control of when to renew/replace certificates/keys and instead use the lower-level custom resources such as `CertificateRequest`.
+I chose not to do this since part of the motivation for this feature is to offload certificate management to a dedicated tool.
+
+### Strimzi only interacting with Secrets
+
+Strimzi could avoid interacting with cert-manager custom resources altogether and instead just deal with the resulting Secrets directly.
+This could work for the ClientsCa, however we already provide the option for users to configure listener certificates, so there is no need for an alternative option.
+For the ClusterCa the certificates needed are complex, since there are multiple different nodes and network connections.
+It would be very complex for the user to hand-craft the right certificates, and would also restrict their ability to scale up the cluster,
+since they would need to create the new certificates up front.
+For these reasons it makes sense for Strimzi to create the `Certificate` custom resources.
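+
+To illustrate the hand-crafting burden described above, the following is a minimal sketch of what a `Certificate` for a single Kafka node might look like.
+It assumes one `Certificate` per node, which this proposal does not specify.
+The resource name, namespace, Secret name, issuer name and DNS names are all hypothetical; only the field names come from the cert-manager `cert-manager.io/v1` `Certificate` API.
+
+```yaml
+# Illustrative sketch only: every name and DNS entry here is an assumption,
+# not a value defined by this proposal.
+apiVersion: cert-manager.io/v1
+kind: Certificate
+metadata:
+  name: my-cluster-kafka-0              # hypothetical: one Certificate per Kafka node
+  namespace: kafka                      # hypothetical namespace
+spec:
+  secretName: my-cluster-kafka-0-cert   # hypothetical Secret that cert-manager populates
+  issuerRef:                            # would be copied from clusterCa.certificateIssuer.issuerRef
+    name: my-issuer                     # hypothetical Issuer created by the user
+    kind: Issuer
+    group: cert-manager.io
+  commonName: my-cluster-kafka
+  dnsNames:                             # every name a peer or client may use to reach this node
+    - my-cluster-kafka-bootstrap
+    - my-cluster-kafka-bootstrap.kafka.svc
+    - my-cluster-kafka-brokers.kafka.svc
+    - my-cluster-kafka-0.my-cluster-kafka-brokers.kafka.svc
+  usages:
+    - server auth
+    - client auth
+```
+
+Under these assumptions, scaling the cluster up by one node would need another such resource with its own DNS names, which is the up-front work that this alternative would push onto the user.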