All documentation in these guides assumes you have already downloaded both the Azure `az` CLI tool and the `aks-engine` binary tool. Follow the quickstart guide before continuing if you're creating a Kubernetes cluster using AKS Engine for the first time.

This guide assumes you already have a running cluster deployed using the `aks-engine` CLI. For more details on how to do that, see deploy or generate.
The `aks-engine addpool` command adds a new node pool to an existing cluster. By specifying a new `agentPoolProfile` configuration as a JSON file, `aks-engine addpool` will add a node pool according to that configuration and merge it into the pre-existing, aks-engine-generated `apimodel.json`. When used in combination with a newer version of the `aks-engine` CLI than the one originally used to build the cluster, this lets you regularly add new node pools built with the latest bits.

The example below assumes you have a cluster deployed, and that the API model originally used to deploy that cluster is stored at `_output/<dnsPrefix>/apimodel.json`.
To add a new pool to the cluster you will run a command like:
$ aks-engine addpool --subscription-id <subscription_id> \
--resource-group mycluster --location <location> \
--api-model _output/mycluster/apimodel.json \
--node-pool ./pool.json
The above assumes that the new node pool specification is in the current working directory, and is called `pool.json`. Here's an example of what that file might look like:
{
  "name": "pooladded",
  "count": 5,
  "vmSize": "Standard_D4s_v3",
  "availabilityProfile": "VirtualMachineScaleSets",
  "kubernetesConfig": {
    "kubeletConfig": {
      "--cloud-provider": "",
      "--cloud-config": "",
      "--azure-container-registry-config": ""
    }
  }
}
The above is a JSON object that conforms to the `agentPoolProfile` specification, just like in the API model. The `agentPoolProfile` spec is documented [here](clusterdefinitions.md#agentpoolprofiles).
Some important considerations:
- The `"name"` value in a new pool must be unique; it may not be the same as an existing node pool's name.
- The `"availabilityProfile"` value in a new pool must match the value in the existing cluster node pools (a quick way to check the existing value is shown just after this list). That enforced homogeneity is an AKS Engine limitation: the LoadBalancer resources it provisions manage backend pool membership across all nodes in the cluster for svc ingress routing.
- The resultant, new Kubernetes nodes provisioned in your cluster are not entirely configured via the `agentPoolProfile` specification. They will also inherit certain properties from other configuration in the API model. Specifically, the version of Kubernetes may be modified in the API model JSON (not in the JSON file expressing the new pool), and the new pool will be built running that version of Kubernetes. This can support experimenting with new versions of Kubernetes on new nodes (perhaps tainted or excluded from the cluster LoadBalancer) before rolling out that new version cluster-wide.
- All new nodes in the added pool will be added to the backend pool of the Azure LoadBalancer that serves cluster svc ingress traffic. In practice this means that these new nodes can run pods that serve inbound svc traffic coming into the cluster.
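A quick way to confirm the `availabilityProfile` values of the existing pools before authoring the new pool's JSON is to query the API model with `jq`. A minimal sketch, assuming the example API model path used above:

$ jq -r '.properties.agentPoolProfiles[] | "\(.name): \(.availabilityProfile)"' _output/mycluster/apimodel.json

The new pool's `"availabilityProfile"` should use whatever value is printed for the existing pools.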
| Parameter | Required | Description |
|---|---|---|
| --subscription-id | yes | The subscription id the cluster is deployed in. |
| --resource-group | yes | The resource group the cluster is deployed in. |
| --location | yes | The location the resource group is in. |
| --api-model | yes | Relative path to the generated API model for the cluster. |
| --client-id | depends | The Service Principal Client ID. Required if the auth-method is set to `client_secret` or `client_certificate`. |
| --client-secret | depends | The Service Principal Client secret. Required if the auth-method is set to `client_secret`. |
| --certificate-path | depends | The path to the file which contains the client certificate. Required if the auth-method is set to `client_certificate`. |
| --node-pool | yes | Path to a JSON file expressing the `agentPoolProfile` spec of the new node pool. |
| --auth-method | no | The authentication method used. Default value is `client_secret`. Other supported values are: `cli`, `client_certificate`, and `device`. |
| --language | no | Language to return error messages in. Default value is "en-us". |
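For example, if you authenticate with a service principal instead of the `az` CLI login context, the same invocation might look like the following sketch (the client id and secret values are placeholders for your own service principal credentials):

$ aks-engine addpool --subscription-id <subscription_id> \
    --resource-group mycluster --location <location> \
    --api-model _output/mycluster/apimodel.json \
    --node-pool ./pool.json \
    --auth-method client_secret \
    --client-id <service_principal_app_id> \
    --client-secret <service_principal_password>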
Similar to `aks-engine update`, you may use the `addpool` command to try out a new node configuration in your cluster without affecting existing nodes or production workloads (although if your new configuration is risky in any way, you will want to taint those nodes so that no production workloads are scheduled until you can validate the new configuration; see the taint sketch following this list). The primary differences are:

- Use `addpool` when the configuration delta compared to an existing node pool is significant enough that it makes sense to organize that new configuration discretely in its own pool. Especially if the new pool will only serve a particular type of traffic (e.g., GPU or confidential compute), a dedicated pool should be used for easy, discrete scaling in response to the specific load requirements of the workloads it will run.
- Use `addpool` when you want to run operational tests immediately, and especially if you know the specific number of net new nodes to add and need them immediately. The primary operational difference between `addpool` and `update` is that `addpool` actually adds new operational capacity to your cluster immediately, whereas `update` merely changes the VMSS model, so that the next scale out operation renders a node with the new configuration.
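If the new pool's configuration is risky, one minimal way to keep production workloads off of it until validation is complete is to taint every node in the pool. A sketch, assuming the `agentpool` node label that AKS Engine applies to each node and the example pool name `pooladded` from earlier:

$ kubectl taint nodes -l agentpool=pooladded validating:NoSchedule

Remove the taint once you're satisfied with the new configuration (the same command with a trailing `-`, i.e. `validating:NoSchedule-`).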
If you're running a very large Kubernetes cluster, the one-node-at-a-time operation of `aks-engine upgrade` will take many hours, even days, depending on the size of the cluster. Each of those node deletions and node additions is subject to environmental failures, and so a deterministic upgrade can indeed take many days. Depending on your tolerance for temporary additional quota, you can upgrade your nodes more quickly, one pool at a time, and use your own validation criteria to inform the progression velocity through an entire cluster upgrade workflow. Let's demonstrate how that might work using a cluster with 3 node pools:
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master-26196714-0 Ready master 3m7s v1.18.8 10.255.255.5 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool1-26196714-vmss000000 Ready agent 3m7s v1.18.8 10.240.0.34 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool1-26196714-vmss000001 Ready agent 103s v1.18.8 10.240.0.65 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool1-26196714-vmss000002 Ready agent 3m7s v1.18.8 10.240.0.96 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool2-26196714-vmss000000 Ready agent 3m7s v1.18.8 10.240.1.181 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool2-26196714-vmss000001 Ready agent 3m v1.18.8 10.240.1.212 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool2-26196714-vmss000002 Ready agent 3m v1.18.8 10.240.1.243 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000000 Ready agent 3m7s v1.18.8 10.240.0.127 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000001 Ready agent 2m32s v1.18.8 10.240.0.158 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000002 Ready agent 3m7s v1.18.8 10.240.0.189 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000003 Ready agent 3m7s v1.18.8 10.240.0.220 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000004 Ready agent 3m7s v1.18.8 10.240.0.251 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000005 Ready agent 3m7s v1.18.8 10.240.1.26 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000006 Ready agent 3m7s v1.18.8 10.240.1.57 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000007 Ready agent 3m7s v1.18.8 10.240.1.88 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000008 Ready agent 3m7s v1.18.8 10.240.1.119 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000009 Ready agent 3m7s v1.18.8 10.240.1.150 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
Above we have a `pool1` with 3 nodes, a `pool2` with 3 nodes, and a `pool3` with 10 nodes. Rather than run a single, continuous upgrade operation across all nodes in the cluster, let's add new pools, validate the new version on them, and then scale those new pools up so the original nodes can be cordoned, drained, and deleted.
Before we do that, though, let's upgrade the control plane first! You should always upgrade the control plane before your nodes. See the full upgrade docs here.
After our control plane has been upgraded to v1.19.1, we can proceed with a rolling upgrade of our nodes by gradually adding and validating new node pools. We'll use the command line `jq` tool to create three new JSON files that we'll use to initiate 3 new `aks-engine addpool` operations, derived from the original `agentPoolProfile` specifications in the API model generated during cluster deployment:
$ jq -r '.properties.agentPoolProfiles[0] | .name = "newpool1"' < _output/kubernetes-westus2-1838/apimodel.json > newpool1.json
$ jq -r '.properties.agentPoolProfiles[1] | .name = "newpool2"' < _output/kubernetes-westus2-1838/apimodel.json > newpool2.json
$ jq -r '.properties.agentPoolProfiles[2] | .name = "newpool3"' < _output/kubernetes-westus2-1838/apimodel.json > newpool3.json
Because those were derived from the API model, the new pools are configured with a count of 3, 3, and 10, respectively. Let's change all of the node counts to 1, because we don't need full node pool capacity to validate the new Kubernetes version against:
$ jq -r '.count = 1' < newpool1.json > newpool1-1node.json && mv newpool1-1node.json newpool1.json
$ jq -r '.count = 1' < newpool2.json > newpool2-1node.json && mv newpool2-1node.json newpool2.json
$ jq -r '.count = 1' < newpool3.json > newpool3-1node.json && mv newpool3-1node.json newpool3.json
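A quick sanity check that the names and counts landed as intended (just a convenience; any JSON viewer will do):

$ jq -r '"\(.name): \(.count)"' newpool1.json newpool2.json newpool3.json
newpool1: 1
newpool2: 1
newpool3: 1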
Our final configuration change before running `aks-engine addpool` is updating the Kubernetes version in the API model:
$ jq -r '. | .properties.orchestratorProfile.orchestratorRelease = "1.19"' < _output/kubernetes-westus2-1838/apimodel.json > apimodel-1dot19.json
$ jq -r '. | .properties.orchestratorProfile.orchestratorVersion = "1.19.1"' < apimodel-1dot19.json > _output/kubernetes-westus2-1838/apimodel.json
$ grep orchestratorRelease -A 1 _output/kubernetes-westus2-1838/apimodel.json
"orchestratorRelease": "1.19",
"orchestratorVersion": "1.19.1",
We can now run addpool once per new pool to begin the process of validating v1.19.1 across our existing v1.18.8 cluster:
$ aks-engine addpool --subscription-id $TEST_AZURE_SUB_ID --api-model _output/kubernetes-westus2-1838/apimodel.json --node-pool newpool1.json --location westus2 --resource-group kubernetes-westus2-1838
WARN[0003] Any new nodes will have containerd version 1.3.7
INFO[0003] Starting ARM Deployment kubernetes-westus2-1838-1942811440 in resource group kubernetes-westus2-1838. This will take some time...
INFO[0158] Finished ARM Deployment (kubernetes-westus2-1838-1942811440). Succeeded
$ aks-engine addpool --subscription-id $TEST_AZURE_SUB_ID --api-model _output/kubernetes-westus2-1838/apimodel.json --node-pool newpool2.json --location westus2 --resource-group kubernetes-westus2-1838
WARN[0008] Any new nodes will have containerd version 1.3.7
INFO[0008] Starting ARM Deployment kubernetes-westus2-1838-25937475 in resource group kubernetes-westus2-1838. This will take some time...
INFO[0163] Finished ARM Deployment (kubernetes-westus2-1838-25937475). Succeeded
$ aks-engine addpool --subscription-id $TEST_AZURE_SUB_ID --api-model _output/kubernetes-westus2-1838/apimodel.json --node-pool newpool3.json --location westus2 --resource-group kubernetes-westus2-1838
WARN[0004] Any new nodes will have containerd version 1.3.7
INFO[0004] Starting ARM Deployment kubernetes-westus2-1838-1370618455 in resource group kubernetes-westus2-1838. This will take some time...
INFO[0174] Finished ARM Deployment (kubernetes-westus2-1838-1370618455). Succeeded
At this point we now have three new nodes running v1.19.1 on our cluster, one per new pool, which correlates with one new pool per pre-existing pool:
$ k get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master-26196714-0 Ready master 36m v1.18.8 10.255.255.5 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-newpool1-26196714-vmss000000 Ready agent 8m35s v1.19.1 10.240.2.18 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-newpool2-26196714-vmss000000 Ready agent 3m41s v1.19.1 10.240.2.49 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-newpool3-26196714-vmss000000 Ready agent 21s v1.19.1 10.240.2.80 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool1-26196714-vmss000000 Ready agent 36m v1.18.8 10.240.0.34 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool1-26196714-vmss000001 Ready agent 35m v1.18.8 10.240.0.65 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool1-26196714-vmss000002 Ready agent 36m v1.18.8 10.240.0.96 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool2-26196714-vmss000000 Ready agent 36m v1.18.8 10.240.1.181 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool2-26196714-vmss000001 Ready agent 36m v1.18.8 10.240.1.212 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool2-26196714-vmss000002 Ready agent 36m v1.18.8 10.240.1.243 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000000 Ready agent 36m v1.18.8 10.240.0.127 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000001 Ready agent 36m v1.18.8 10.240.0.158 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000002 Ready agent 36m v1.18.8 10.240.0.189 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000003 Ready agent 36m v1.18.8 10.240.0.220 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000004 Ready agent 36m v1.18.8 10.240.0.251 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000005 Ready agent 36m v1.18.8 10.240.1.26 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000006 Ready agent 36m v1.18.8 10.240.1.57 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000007 Ready agent 36m v1.18.8 10.240.1.88 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000008 Ready agent 36m v1.18.8 10.240.1.119 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000009 Ready agent 36m v1.18.8 10.240.1.150 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
At this point we would probably taint those three nodes, and then run validations against them (using the appropriate tolerations so that the validation workloads are scheduled onto the desired nodes):
$ kubectl taint nodes k8s-newpool1-26196714-vmss000000 validating:NoSchedule
node/k8s-newpool1-26196714-vmss000000 tainted
$ kubectl taint nodes k8s-newpool2-26196714-vmss000000 validating:NoSchedule
node/k8s-newpool2-26196714-vmss000000 tainted
$ kubectl taint nodes k8s-newpool3-26196714-vmss000000 validating:NoSchedule
node/k8s-newpool3-26196714-vmss000000 tainted
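The validation workloads themselves are up to you. As an illustrative sketch only (the pod name, image, and test command are hypothetical), a throwaway pod pinned to one of the new nodes, with a toleration for the `validating` taint applied above, could look like this:

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: validate-newpool1
spec:
  nodeSelector:
    kubernetes.io/hostname: k8s-newpool1-26196714-vmss000000
  tolerations:
  - key: validating
    operator: Exists
    effect: NoSchedule
  containers:
  - name: smoke-test
    image: busybox
    command: ["sh", "-c", "nslookup kubernetes.default && sleep 3600"]
EOF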
Let's say we've validated the "pool1" replacement, which we've called "newpool1". Let's scale that pool out to match the original "pool1":
$ aks-engine scale --api-model _output/kubernetes-westus2-1838/apimodel.json --location westus2 --resource-group kubernetes-westus2-1838 --apiserver kubernetes-westus2-1838.westus2.cloudapp.azure.com --node-pool newpool1 --new-node-count 3
INFO[0003] found VMSS k8s-newpool1-26196714-vmss in resource group kubernetes-westus2-1838 that correlates with node pool newpool1
WARN[0003] Any new nodes will have containerd version 1.3.7
INFO[0003] Removing singlePlacementGroup property from [variables('newpool1VMNamePrefix')]
INFO[0003] Nodes in pool 'newpool1' before scaling:
NODE STATUS VERSION OS KERNEL
k8s-newpool1-26196714-vmss000000 Ready v1.19.1 Ubuntu 18.04.5 LTS 5.4.0-1025-azure
INFO[0003] Starting ARM Deployment kubernetes-westus2-1838-360281667 in resource group kubernetes-westus2-1838. This will take some time...
INFO[0230] Finished ARM Deployment (kubernetes-westus2-1838-360281667). Succeeded
INFO[0230] Nodes in pool 'newpool1' after scaling:
NODE STATUS VERSION OS KERNEL
k8s-newpool1-26196714-vmss000000 Ready v1.19.1 Ubuntu 18.04.5 LTS 5.4.0-1025-azure
k8s-newpool1-26196714-vmss000001 Ready v1.19.1 Ubuntu 18.04.5 LTS 5.4.0-1025-azure
k8s-newpool1-26196714-vmss000002 NotReady v1.19.1 Ubuntu 18.04.5 LTS 5.4.0-1025-azure
Note: you may also use the VMSS API directly (either via the `az` CLI or the Azure portal web UI) to scale out the new pools. The advantage of using `aks-engine scale` to do so is that you will get immediate signal if, for any reason, the new VMs did not come online successfully as Kubernetes nodes.
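For reference, the direct VMSS equivalent of the scale operation above might look like the following sketch (you lose the Kubernetes-level node readiness check that `aks-engine scale` provides):

$ az vmss scale -n k8s-newpool1-26196714-vmss -g kubernetes-westus2-1838 --new-capacity 3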
Now that we have node capacity in the new pool equivalent to the original pool, we can cordon and drain the original nodes and rely upon the Kubernetes layer to re-schedule workloads to the new nodes. (Note: "capacity equivalence" may be a little more complicated if, as part of a process like this one, you change the VM SKU of the new pool compared to the original pool; YMMV. Also note: this requires that your workload scheduling configuration, as it pertains to the way your nodes are labeled, tainted, etc., makes sense, and that your production workload specifications adhere to that configuration schema.)
$ for node in "k8s-pool1-26196714-vmss000000 k8s-pool1-26196714-vmss000001 k8s-pool1-26196714-vmss000002"; do kubectl cordon $node; done
node/k8s-pool1-26196714-vmss000000 cordoned
node/k8s-pool1-26196714-vmss000001 cordoned
node/k8s-pool1-26196714-vmss000002 cordoned
$ for node in "k8s-pool1-26196714-vmss000000 k8s-pool1-26196714-vmss000001 k8s-pool1-26196714-vmss000002"; do kubectl drain --ignore-daemonsets $node; done
node/k8s-pool1-26196714-vmss000000 already cordoned
node/k8s-pool1-26196714-vmss000001 already cordoned
node/k8s-pool1-26196714-vmss000002 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/azure-cni-networkmonitor-z4tcw, kube-system/azure-ip-masq-agent-nmlnv, kube-system/blobfuse-flexvol-installer-zgjxg, kube-system/csi-secrets-store-jdmql, kube-system/csi-secrets-store-provider-azure-9d4j9, kube-system/kube-proxy-glrm6
node/k8s-pool1-26196714-vmss000000 drained
WARNING: ignoring DaemonSet-managed Pods: kube-system/azure-cni-networkmonitor-xhk8d, kube-system/azure-ip-masq-agent-lhj9p, kube-system/blobfuse-flexvol-installer-zdc4w, kube-system/csi-secrets-store-6zbx9, kube-system/csi-secrets-store-provider-azure-q2h6n, kube-system/kube-proxy-728sx
node/k8s-pool1-26196714-vmss000001 drained
WARNING: ignoring DaemonSet-managed Pods: kube-system/azure-cni-networkmonitor-mtx7c, kube-system/azure-ip-masq-agent-5p9lw, kube-system/blobfuse-flexvol-installer-cl9ls, kube-system/csi-secrets-store-provider-azure-vv8rb, kube-system/csi-secrets-store-xnjxn, kube-system/kube-proxy-rpfjt
node/k8s-pool1-26196714-vmss000002 drained
Note: the above example is rather brute-force. Depending on your operational reality, you may want to add some delay between draining each node; a gentler sketch follows below. (Cordoning all nodes at once actually makes sense, as you indeed want to stop any future scheduling onto those nodes at the same time, once you have the required standby capacity, which in our example is the new, validated v1.19.1 nodes.)
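A gentler variant of the drain loop pauses between nodes so you can confirm that evicted workloads have been rescheduled and are healthy before continuing (the sleep interval is illustrative; substitute your own health checks):

$ for node in k8s-pool1-26196714-vmss000000 k8s-pool1-26196714-vmss000001 k8s-pool1-26196714-vmss000002; do
    kubectl drain --ignore-daemonsets $node
    sleep 300  # or run your own workload health checks here before the next node
  done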
After all workloads have been drained, and moved over to the new nodes, you may delete the VMSS entirely:
$ az vmss delete -n k8s-pool1-26196714-vmss -g kubernetes-westus2-1838
$ echo $?
0
Now, the original "pool1" nodes are no longer participating in the cluster:
$ k get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master-26196714-0 Ready master 64m v1.18.8 10.255.255.5 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-newpool1-26196714-vmss000000 Ready agent 36m v1.19.1 10.240.2.18 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-newpool1-26196714-vmss000001 Ready agent 17m v1.19.1 10.240.2.111 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-newpool1-26196714-vmss000002 Ready agent 16m v1.19.1 10.240.2.142 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-newpool2-26196714-vmss000000 Ready agent 31m v1.19.1 10.240.2.49 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-newpool3-26196714-vmss000000 Ready agent 28m v1.19.1 10.240.2.80 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool2-26196714-vmss000000 Ready agent 64m v1.18.8 10.240.1.181 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool2-26196714-vmss000001 Ready agent 64m v1.18.8 10.240.1.212 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool2-26196714-vmss000002 Ready agent 64m v1.18.8 10.240.1.243 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000000 Ready agent 64m v1.18.8 10.240.0.127 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000001 Ready agent 63m v1.18.8 10.240.0.158 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000002 Ready agent 64m v1.18.8 10.240.0.189 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000003 Ready agent 64m v1.18.8 10.240.0.220 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000004 Ready agent 64m v1.18.8 10.240.0.251 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000005 Ready agent 64m v1.18.8 10.240.1.26 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000006 Ready agent 64m v1.18.8 10.240.1.57 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000007 Ready agent 64m v1.18.8 10.240.1.88 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000008 Ready agent 64m v1.18.8 10.240.1.119 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
k8s-pool3-26196714-vmss000009 Ready agent 64m v1.18.8 10.240.1.150 <none> Ubuntu 18.04.5 LTS 5.4.0-1025-azure docker://19.3.12
Final note: don't forget to remove the "pool1" `agentPoolProfile` JSON object from your API model!
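One way to do that, sketched here with `jq` (back up your API model first, since this overwrites it in place):

$ jq -r 'del(.properties.agentPoolProfiles[] | select(.name == "pool1"))' < _output/kubernetes-westus2-1838/apimodel.json > apimodel-trimmed.json && mv apimodel-trimmed.json _output/kubernetes-westus2-1838/apimodel.json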
If you're running the AKS Engine `cluster-autoscaler` addon, or running your own spec based on the upstream examples, you'll have a `cluster-autoscaler` Deployment resource installed on your cluster. The examples below will assume that the `cluster-autoscaler` componentry is installed in the `kube-system` namespace.

First, you'll need to know the VMSS name of your new node pool. Here's how to do that using the `az` CLI tool:
$ az vmss list -g kubernetes-westus2-1838 -o table
Name ResourceGroup Location Zones Capacity Overprovision UpgradePolicy
---------------------------- ------------------------ ---------- ------- ---------- --------------- ---------------
k8s-newpool-1838-vmss kubernetes-westus2-1838 westus2 1 2 1 False Manual
k8s-pool1-1838-vmss kubernetes-westus2-1838 westus2 1 2 1 False Manual
Now, edit the `cluster-autoscaler` deployment:
$ kubectl edit deployment -n kube-system cluster-autoscaler
The above will open the YAML spec in your default editor (e.g., `vim`). What we want to do is modify the `cluster-autoscaler` runtime command arguments so that your new VMSS node pool is enabled for cluster-autoscaler. Specifically, you want to look for one or more lines in the YAML file that look like this:
- --nodes=1:9:k8s-pool1-1838-vmss
And then add a new line below, using the identical indentation, with the new pool. So the changes should look like this:
- --nodes=1:9:k8s-pool1-1838-vmss
- --nodes=1:9:k8s-newpool1-1838-vmss
Again, refer to the cluster-autoscaler documentation for how to configure your VMSS node pools in Azure; the above example declares that cluster-autoscaler is enabled for the VMSS node pools `k8s-pool1-1838-vmss` and `k8s-newpool1-1838-vmss` running in your cluster, with a minimum node count of 1 and a maximum node count of 9 for both pools. After you save and exit from your editor, the `cluster-autoscaler` deployment should delete the existing pod and create a new one with the modified configuration.
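As a quick sanity check (optional), you can watch the rollout and confirm a fresh pod came up with the new arguments:

$ kubectl rollout status deployment -n kube-system cluster-autoscaler
$ kubectl get pods -n kube-system | grep cluster-autoscaler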
If you're running cluster-autoscaler via the AKS Engine addon, and you have explicitly configured the `cluster-autoscaler` addon with a mode of `Reconcile`, then you won't be able to simply edit the deployment spec on the cluster; instead you'll have to hop onto each control plane VM and manually edit the spec under `/etc/kubernetes/addons/cluster-autoscaler.yaml`. This is not a common situation, and one that would only occur because of an explicit configuration in the API model, such as:
"addons": [
{
"name": "cluster-autoscaler",
"enabled": true,
"mode": "Reconcile"
}
]
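In that case the workflow is roughly the following sketch (the admin username and control plane address are placeholders for your own environment; repeat on each control plane VM):

$ ssh <admin_user>@<control-plane-vm>
$ sudo vi /etc/kubernetes/addons/cluster-autoscaler.yaml   # add the new --nodes=<min>:<max>:<vmss-name> line here

The addon manager should then reconcile the running deployment to match the edited manifest.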