Use this runbook when you want to upgrade support workloads in the cluster. These include:
- Cert-manager
- Grafana
- Harbor
- Ingress Nginx
- K8up
- Loki
- Minio
- Prometheus
- Promtail
This document contains general instructions for how to upgrade support workloads, followed by specific instructions for each workload (linked above).
Prerequisite: a dplsh instance authorized against the cluster. See Using the DPL Shell.
- Identify the version you want to bump in the environment/configuration directory, e.g. for dplplat01: infrastructure/environments/dplplat01/configuration/versions.env. The file contains links to the relevant Artifact Hub pages for the individual projects. These can often be used to determine both the latest version and details about the chart, such as how a specific manifest is used. You can find the latest version of the support workload in the Version status sheet, which itself is updated via the procedure described in the Update Upgrade status runbook.
- Consult any relevant changelog to determine if the upgrade will require any extra work besides the upgrade itself. To determine which version to look up in the changelog, be aware of the difference between the chart version and the app version. We currently track the chart versions, not the actual version of the application inside the chart. To determine the change in appVersion between chart releases, you can diff the releases and keep track of the appVersion property in the chart's Chart.yaml. Using Grafana as an example, see https://github.com/grafana/helm-charts/compare/grafana-6.55.1...grafana-6.56.0 for the diff between two chart releases. The exact way to do this differs from chart to chart, and is documented in the specific procedures and tests below.
- Carry out any chart-specific preparations described in the chart's update procedure. This could be upgrading a Custom Resource Definition that the chart does not upgrade.
- Identify the relevant task in the main Taskfile for upgrading the workload, and run it with DIFF=1. For example, for cert-manager the task is called support:provision:cert-manager, so run DIFF=1 task support:provision:cert-manager.
- If the diff looks good, run the task without DIFF=1, e.g. task support:provision:cert-manager.
- Then proceed to perform the verification test for the relevant workload. See the following section for known verification tests.
- Finally, it is important to verify that Lagoon deployments still work. Some breaking changes will not manifest themselves until an environment is rebuilt, at which point it may fail. An example is the disabling of user snippets in the ingress-nginx controller v1.9.0. To verify deployments still work, log in to the Lagoon UI and select an environment to redeploy.
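The diff and upgrade steps above can be combined into a small wrapper that always shows the diff first and only applies it after an explicit confirmation. This is a minimal sketch, not part of the Taskfile: upgrade_workload is a hypothetical helper name, and it assumes the task binary is on the PATH inside dplsh and that every workload follows the support:provision:<workload> naming shown above.

```shell
#!/bin/sh
# Sketch: show the diff for a support workload, then ask before applying it.
# upgrade_workload is a hypothetical helper; the task names and the DIFF=1
# variable are the ones used in this runbook.
upgrade_workload() (
  workload="$1"

  # Step 1: show what would change, without applying anything.
  DIFF=1 task "support:provision:${workload}" || exit 1

  # Step 2: only continue on an explicit "y".
  printf 'Apply the upgrade of %s? [y/N] ' "$workload"
  read -r answer
  if [ "$answer" = "y" ]; then
    task "support:provision:${workload}"
  else
    echo "Aborted, nothing applied."
  fi
)
```

Calling upgrade_workload cert-manager would run the cert-manager diff and then wait for a y before running the actual upgrade. Any other answer leaves the cluster untouched.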
The cert-manager project versions its Helm chart together with the app itself, so simply use the chart version in the following checks.
Cert Manager keeps Release notes for the individual minor releases of the project. Consult these for every upgrade past a minor version.
As both are versioned in the same repository, simply use the following link for looking up the release notes for a specific patch release, replacing the example tag with the version you wish to upgrade to.
https://github.com/cert-manager/cert-manager/releases/tag/v1.11.2
To compare two versions, do the same using the following link:
https://github.com/cert-manager/cert-manager/compare/v1.11.1...v1.11.2
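To avoid editing the two links by hand, the version numbers can be filled in with a pair of small functions. The function names are hypothetical; the URL patterns are taken directly from the links above.

```shell
# Hypothetical helpers that fill versions into the two cert-manager links.
# Versions are passed without the leading "v", e.g. 1.11.2.

# Release notes for a single release.
cert_manager_release_url() {
  printf 'https://github.com/cert-manager/cert-manager/releases/tag/v%s\n' "$1"
}

# Comparison between the current version ($1) and the target version ($2).
cert_manager_compare_url() {
  printf 'https://github.com/cert-manager/cert-manager/compare/v%s...v%s\n' "$1" "$2"
}
```

For example, cert_manager_compare_url 1.11.1 1.11.2 prints the comparison link shown above.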
Commands
# Diff
DIFF=1 task support:provision:cert-manager
# Upgrade
task support:provision:cert-manager
Verify that cert-manager itself and webhook pods are all running and healthy.
task support:verify:cert-manager
Insert the chart version in the following link to see the release note.
https://github.com/grafana/helm-charts/releases/tag/grafana-6.52.9
The note will most likely be empty. Now diff the chart version with the current version, again replacing the versions with the ones relevant for your upgrade.
https://github.com/grafana/helm-charts/compare/grafana-6.43.3...grafana-6.52.9
As the repository contains a lot of charts, you will need to do a bit of digging. Look for at least charts/grafana/Chart.yaml, which can tell you the app version.
With the app version in hand, you can now look at the release notes for the Grafana app itself.
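Instead of reading the diff, the app version can also be pulled straight out of a Chart.yaml. The sketch below assumes appVersion is a top-level key on its own line, optionally quoted; chart_app_version is a hypothetical helper name.

```shell
# Print the appVersion from a Chart.yaml read on stdin.
# Assumes a top-level "appVersion: <value>" line, with or without quotes.
chart_app_version() {
  sed -n 's/^appVersion:[[:space:]]*//p' | tr -d "\"'" | head -n 1
}
```

One way to feed it is to fetch the file for a given chart tag, for example: curl -fsSL https://raw.githubusercontent.com/grafana/helm-charts/grafana-6.52.9/charts/grafana/Chart.yaml | chart_app_version. The raw.githubusercontent.com address follows the usual GitHub pattern for raw files at a tag and is an assumption here, not something this runbook otherwise relies on.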
Diff command
DIFF=1 task support:provision:grafana
Upgrade command
task support:provision:grafana
Verify that the Grafana pods are all running and healthy.
kubectl get pods --namespace grafana
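When a namespace holds many pods, it can be easier to list only the ones that are not healthy. The following sketch filters the output of kubectl get pods. It assumes the default column layout (NAME, READY, STATUS, ...) and treats a pod as healthy when it is Completed, or Running with all containers ready; unhealthy_pods is a hypothetical helper name.

```shell
# Reads `kubectl get pods --no-headers` output on stdin and prints the pods
# that are neither Completed nor Running with all containers ready.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")
    ok = ($3 == "Completed") || ($3 == "Running" && ready[1] == ready[2])
    if (!ok) print $1, $2, $3
  }'
}
```

Usage: kubectl get pods --namespace grafana --no-headers | unhealthy_pods. No output means every pod passed the check. The same filter should work for the other namespaces in this runbook.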
Access the Grafana UI and see if you can log in. If you do not have a user but have access to read secrets in the grafana namespace, you can retrieve the admin password with the following command:
# Password for admin
UI_NAME=grafana task ui-password
# Hostname for grafana
kubectl -n grafana get -o jsonpath="{.spec.rules[0].host}" ingress grafana ; echo
Harbor has different app and chart versions.
An overview of the chart versions can be retrieved from GitHub. The chart does not have a changelog.
Link for comparing two chart releases: https://github.com/goharbor/harbor-helm/compare/v1.10.1...v1.12.0
Having identified the relevant appVersions, consult the list of Harbor releases to see a description of the changes included in the release in question. If this approach fails you can also use the diff-command described below to determine which image-tags are going to change and thus determine the version delta.
Harbor is a quite active project, so it may make sense mostly to pay attention to minor/major releases and ignore the changes made in patch-releases.
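To decide quickly whether an upgrade crosses a minor or major boundary, the two version strings can be compared component by component. This is a sketch that assumes plain MAJOR.MINOR.PATCH versions with an optional leading v; bump_kind is a hypothetical helper name.

```shell
# Classify the step between two versions as major, minor, patch or none.
# Assumes MAJOR.MINOR.PATCH versions, with an optional leading "v".
bump_kind() (
  old="${1#v}"
  new="${2#v}"
  if [ "$(echo "$old" | cut -d. -f1)" != "$(echo "$new" | cut -d. -f1)" ]; then
    echo major
  elif [ "$(echo "$old" | cut -d. -f2)" != "$(echo "$new" | cut -d. -f2)" ]; then
    echo minor
  elif [ "$old" != "$new" ]; then
    echo patch
  else
    echo none
  fi
)
```

For example, bump_kind v1.10.1 v1.12.0 prints minor, which signals that the minor and major release notes are worth reading.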
Harbor documents the general upgrade procedure for non-kubernetes upgrades for minor versions on their website. This documentation is of little use to our Kubernetes setup, but it can be useful to consult the page for minor/major version upgrades to see if there are any special considerations to be made.
The Harbor chart repository has upgrade instructions as well. The instructions ask you to take a snapshot of the database and back up the TLS secret. Snapshotting the database is currently out of scope, but could be considered in the future. The TLS secret is handled by cert-manager, and as such does not need to be backed up.
With knowledge of the app version, you can now update versions.env as described in the General Procedure section, diff to see the changes that are going to be applied, and finally do the actual upgrade.
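Before editing versions.env it can be useful to note the version that is currently pinned. The sketch below assumes the file consists of plain KEY=value lines without quoting; both that layout and the key name in the usage example are assumptions, so check the actual file first. env_value is a hypothetical helper name.

```shell
# Print the value of one KEY=value entry from an env-style file.
# $1: path to the file, $2: key to look up.
# Assumes unquoted KEY=value lines; the first match wins.
env_value() {
  sed -n "s/^$2=//p" "$1" | head -n 1
}
```

Usage, with a hypothetical key name: env_value infrastructure/environments/dplplat01/configuration/versions.env VERSION_HARBOR.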
Diff command
DIFF=1 task support:provision:harbor
Upgrade command
task support:provision:harbor
First verify that the pods are coming up:
kubectl -n harbor get pods
When Harbor seems to be working, you can verify that the UI is working by accessing https://harbor.lagoon.dplplat01.dpl.reload.dk/. The password for the user admin can be retrieved with the following command:
UI_NAME=harbor task ui-password
If everything looks good, you can consider deploying a site. One way to do this is to identify an existing site of low importance and re-deploy it. A re-deploy will require Lagoon to both fetch and push images. Instructions for how to access the Lagoon UI are out of scope of this document, but can be found in the runbook for running a Lagoon task. In this case you are looking for the "Deploy" button on the site's "Deployments" tab.
When working with the ingress-nginx chart we have at least 3 versions to keep track of.
The chart version tracks the version of the chart itself. The chart's appVersion tracks a controller application which dynamically configures a bundled nginx.
The version of nginx used is determined by configuration files in the controller, amongst others ingress-nginx.yaml.
Link for diffing two chart versions: https://github.com/kubernetes/ingress-nginx/compare/helm-chart-4.6.0...helm-chart-4.6.1
The project keeps quite a good changelog for the chart.
Link for diffing two controller versions: https://github.com/kubernetes/ingress-nginx/compare/controller-v1.7.1...controller-v1.7.0
Consult the individual GitHub releases for descriptions of what has changed in the controller for a given release.
With knowledge of the app version, you can now update versions.env as described in the General Procedure section, diff to see the changes that are going to be applied, and finally do the actual upgrade.
Diff command
DIFF=1 task support:provision:ingress-nginx
Upgrade command
task support:provision:ingress-nginx
The ingress-controller is central to the operation of all publicly accessible parts of the platform. Its area of responsibility is, on the other hand, quite narrow, so it is easy to verify that it is working as expected.
First verify that the pods are coming up:
kubectl -n ingress-nginx get pods
Then verify that the ingress-controller is able to serve traffic. This can be done by accessing the UI of one of the apps that are deployed in the platform.
For example, access https://ui.lagoon.dplplat01.dpl.reload.dk/.
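The same check can be made from the command line by asking only for the HTTP status code. This is a sketch under the assumption that any 2xx or 3xx response means the ingress-controller routed the request; the function names are hypothetical, while the curl options used are standard.

```shell
# Succeeds when an HTTP status code is a 2xx or 3xx response.
is_ok_status() {
  case "$1" in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}

# Print only the status code for a URL. curl prints 000 when no response
# was received, for example on a timeout or a connection error.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}
```

Usage: is_ok_status "$(http_status https://ui.lagoon.dplplat01.dpl.reload.dk/)" && echo "ingress is serving traffic". A 5xx code or 000 would point at the controller or the backing service, and is worth investigating before moving on.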
We currently cannot upgrade to version 2.x of K8up, as Lagoon is not yet ready for it.
The Loki chart is versioned separately from Loki. The version of Loki installed by the chart is tracked by its appVersion. So when upgrading, you should always look at the diff between both the chart and app version.
The general upgrade procedure will give you the chart version. Access the following link to get the release note for the chart, remembering to insert your version:
https://github.com/grafana/loki/releases/tag/helm-loki-5.5.1
Notice that the Loki helm-chart is maintained in the same repository as Loki itself. You can find the diff between the chart versions by comparing two chart release tags.
https://github.com/grafana/loki/compare/helm-loki-5.5.0...helm-loki-5.5.1
As the repository contains changes to Loki itself as well, you should seek out the file production/helm/loki/Chart.yaml, which contains the appVersion that defines which version of Loki a given chart release installs.
Direct link to the file for a specific tag: https://github.com/grafana/loki/blob/helm-loki-3.3.1/production/helm/loki/Chart.yaml
With the app-version in hand, you can now look at the release notes for Loki to see what has changed between the two appVersions.
Last but not least, the Loki project maintains an upgrading guide that can be found here: https://grafana.com/docs/loki/latest/upgrading/
Diff command
DIFF=1 task support:provision:loki
Upgrade command
task support:provision:loki
List pods in the loki namespace to see if the upgrade has completed successfully.
kubectl --namespace loki get pods
Next, verify that Loki is still accessible from Grafana and collects logs by logging in to Grafana. Then verify the Loki datasource, and search out some logs for a site. See the validation steps for Grafana for instructions on how to access the Grafana UI.
We currently cannot upgrade MinIO without losing the Azure blob gateway. See:
- https://blog.min.io/deprecation-of-the-minio-gateway/
- minio/minio#14331
- bitnami/charts#10258 (comment)
The kube-prometheus-stack Helm chart is quite well maintained and is versioned and developed separately from the application itself.
A specific release of the chart can be accessed via the following link:
https://github.com/prometheus-community/helm-charts/releases/tag/kube-prometheus-stack-45.27.2
The chart is developed alongside a number of other community-driven Prometheus-related charts in https://github.com/prometheus-community/helm-charts.
This means that a comparison between two releases of the chart will also contain changes to a number of other charts. You will have to look for changes in the charts/kube-prometheus-stack/ directory.
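If you have a list of changed paths, for instance from git diff --name-only between two chart tags in a local clone of the repository, the unrelated charts can be filtered away. This is a sketch; only_chart is a hypothetical helper name and the local clone is an assumption.

```shell
# Reads changed file paths on stdin and prints only those that live under
# charts/<name>/, where <name> is passed as the first argument.
only_chart() {
  awk -v prefix="charts/$1/" 'index($0, prefix) == 1'
}
```

Usage, replacing OLD_TAG and NEW_TAG with the two chart release tags you are comparing: git diff --name-only OLD_TAG NEW_TAG | only_chart kube-prometheus-stack.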
The Readme for the chart contains a good Upgrading Chart section that describes things to be aware of when upgrading between specific minor and major versions. The same documentation can also be found on Artifact Hub.
Consult the section that matches the version you are upgrading from and to. Be aware that upgrades past a minor version often require a CRD update. The CRDs may have to be applied before you can do the diff and upgrade. Once the CRDs have been applied you are committed to the upgrade, as there is no simple way to downgrade the CRDs.
Diff command
DIFF=1 task support:provision:prometheus
Upgrade command
task support:provision:prometheus
List pods in the prometheus namespace to see if the upgrade has completed successfully. You should expect to see two types of workloads: first a single promstack-kube-prometheus-operator pod that runs Prometheus, and then a promstack-prometheus-node-exporter pod for each node in the cluster.
kubectl --namespace prometheus get pods -l "release=promstack"
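The expectation of one node-exporter pod per node can be checked by counting. The sketch below assumes the pod names start with promstack-prometheus-node-exporter, as described above, and that every node is expected to run one; node_exporters_match is a hypothetical helper name.

```shell
# $1: expected number of nodes.
# stdin: `kubectl get pods --no-headers` output for the prometheus namespace.
# Succeeds when the number of node-exporter pods equals the number of nodes.
node_exporters_match() (
  expected="$1"
  # grep -c prints 0 and exits non-zero when nothing matches, hence || true.
  actual=$(grep -c '^promstack-prometheus-node-exporter' || true)
  echo "node-exporter pods: ${actual}, nodes: ${expected}"
  [ "$actual" -eq "$expected" ]
)
```

Usage: kubectl --namespace prometheus get pods -l "release=promstack" --no-headers | node_exporters_match "$(kubectl get nodes --no-headers | wc -l)". A mismatch is not necessarily an error during a rollout, so give the pods a moment before re-running the check.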
As the Prometheus UI is not directly exposed, the easiest way to verify that Prometheus is running is to access the Grafana UI and verify that the dashboards that use Prometheus are working, or as a minimum that the Prometheus datasource passes validation. See the validation steps for Grafana for instructions on how to access the Grafana UI.
The Promtail chart is versioned separately from Promtail, which itself is a part of Loki. The version of Promtail installed by the chart is tracked by its appVersion. So when upgrading, you should always look at the diff between both the chart and app version.
The general upgrade procedure will give you the chart version. Access the following link to get the release note for the chart, remembering to insert your version:
https://github.com/grafana/helm-charts/releases/tag/promtail-6.6.0
The note will most likely be empty. Now diff the chart version with the current version, again replacing the versions with the ones relevant for your upgrade.
https://github.com/grafana/helm-charts/compare/promtail-6.6.0...promtail-6.6.1
As the repository contains a lot of charts, you will need to do a bit of digging. Look for at least charts/promtail/Chart.yaml, which can tell you the app version.
With the app-version in hand, you can now look at the release notes for Loki (which promtail is part of). Look for notes in the Promtail sections of the release notes.
Diff command
DIFF=1 task support:provision:promtail
Upgrade command
task support:provision:promtail
List pods in the promtail namespace to see if the upgrade has completed successfully.
kubectl --namespace promtail get pods
With the pods running, you can verify that the logs are being collected by seeking out logs via Grafana. See the validation steps for Grafana for details on how to access the Grafana UI.
You can also inspect the logs of the individual pods via
kubectl --namespace promtail logs -l "app.kubernetes.io/name=promtail"
And verify that there are no obvious error messages.
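Scanning the log output by eye gets tedious with many pods, so the error lines can be filtered out instead. The sketch assumes Promtail writes logfmt-style lines containing level=error; if the log format is configured differently the pattern has to be adjusted. error_lines is a hypothetical helper name.

```shell
# Print only the log lines that carry an error level.
# Assumes logfmt-style output with a "level=error" field.
error_lines() {
  # grep exits non-zero when nothing matches; that is the good case here.
  grep -i 'level=error' || true
}
```

Usage: kubectl --namespace promtail logs -l "app.kubernetes.io/name=promtail" --tail=200 | error_lines. The --tail option is included because kubectl only shows the last 10 lines per pod by default when a label selector is used.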