
automatically adjust pr envs resource request #560

Open

wants to merge 7 commits into main
Conversation

ITViking
Contributor

What does this PR do?

This adds a cronjob that runs every 6 hours and takes roughly one minute per PR environment to complete.
The job adjusts the redis, varnish, nginx, and cli deployments so their pods are created with resource requests suited to PR environments.

This is in effect now.

Should this be tested by the reviewer and how?

Any specific requests for how the PR should be reviewed?

What are the relevant tickets?

https://reload.atlassian.net/jira/software/c/projects/DDFDRIFT/boards/464?selectedIssue=DDFDRIFT-264

Comment on lines 46 to 48
kubectl patch deployment $deploy -n $ns \
--type=json \
-p="[{\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/resources\", \"value\": {\"requests\": {\"cpu\": \"15m\", \"memory\": \"$memory_request\"}, \"limits\": {\"cpu\": \"200m\", \"memory\": \"$memory_limit\"}}}]"
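For context, `$memory_request` and `$memory_limit` have to be chosen per deployment before this patch call runs. A hedged sketch of what that selection might look like; the helper name and all memory values below are hypothetical, not the script's actual figures:

```shell
# Hypothetical helper: pick memory values by deployment name.
# The tiers below are illustrative only; the real script's values may differ.
set_memory_for() {
  case "$1" in
    redis)   memory_request="64Mi";  memory_limit="128Mi" ;;
    varnish) memory_request="128Mi"; memory_limit="256Mi" ;;
    nginx)   memory_request="32Mi";  memory_limit="64Mi"  ;;
    cli)     memory_request="128Mi"; memory_limit="256Mi" ;;
  esac
}

set_memory_for redis
echo "$memory_request $memory_limit"   # → 64Mi 128Mi
```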
Contributor

Does the output here reveal if the deployment was changed or not? If so - you could skip the wait, if the resources had already been set.
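For reference, recent kubectl releases append "(no change)" to the patch output when the object already matches the desired state, which could drive exactly this skip. A sketch assuming that behavior; the kubectl wiring is left as a comment since it needs a live cluster, and the variable names are the script's:

```shell
# Assumption: `kubectl patch` prints e.g. "deployment.apps/nginx patched" when
# something changed, and "deployment.apps/nginx patched (no change)" when the
# object was already in the desired state (behavior of recent kubectl releases).
patch_changed() {
  case "$1" in
    *"(no change)"*) return 1 ;;  # already in desired state, no restart coming
    *)               return 0 ;;  # patch applied, pods will roll
  esac
}

# In the script (hypothetical wiring):
#   out=$(kubectl patch deployment "$deploy" -n "$ns" --type=json -p="$patch")
#   patch_changed "$out" && sleep 60   # only wait when a rollout is coming

patch_changed "deployment.apps/nginx patched"             && echo changed
patch_changed "deployment.apps/nginx patched (no change)" || echo unchanged
```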

Contributor Author

that is a good point, will have to look into that

Contributor Author

I left a comment, but there's more pressing matters to get at than this at the moment

@ath88
Contributor

ath88 commented Dec 15, 2024

I like your approach. Could you turn it into a helm chart instead, so the relationship between the resources is clearly marked for someone inspecting the cluster?

ITViking and others added 6 commits January 9, 2025 12:33
This will run every 6 hours and ensure that as more pr-envs are being
added, they are resource-regulated along the way. Previously they would
not be regulated until the next time the manual script was run.
@ITViking ITViking force-pushed the automate-resource-request-setting-for-pr-envs branch from 074f4db to f7deda6 Compare January 9, 2025 11:33
@ITViking ITViking requested a review from ath88 January 13, 2025 14:35
Contributor

@hypesystem hypesystem left a comment

Looks very reasonable to me! Have you tested that it works in some limited capacity? Or are you awaiting approval before testing?

I have one concern which I will see if I can help with a solution for.

spec:
containers:
- name: patch-resources
image: bitnami/kubectl:latest
Contributor

Pin to a Kubernetes version we know works with our setup.

kubectl may update, or we may update the cluster, creating an inconsistency unknowingly - so it's better to be able to control it.

fi
done
echo "sleeping for a minute to give the deployments time to get back up and not crash the database"
sleep 60
Contributor

Hmm, I think my take on this is that it is potentially creating a lot of work on the Kubernetes API server if we patch everything every minute?

Could we find a way to directly query if the deployments have their memory limits set correctly before trying to set them?

Looking into this, will report back if I find a good approach.

Contributor

Beginning of something:

kubectl get deploy -n $ns -o json | jq '.items[] | { name: .metadata.name, resources: .spec.template.spec.containers[0].resources }'

produces a list of JSON objects each with a name and the resource spec. Next up, picking those that do not match the desired memory limit and request.

Contributor

kubectl get deploy -n $ns -o json | jq '.items[] | [.metadata.name, .spec.template.spec.containers[0].resources.limits.memory, .spec.template.spec.containers[0].resources.requests.memory] | join(",")' --raw-output

This outputs a list of the format:

<deploymentname>,<memorylimit>,<memoryrequest>

To get a particular list of deployments, we can append this to the command (selecting nginx|redis|varnish|cli followed by memory indications):

 | grep -E '^(nginx|redis|varnish|cli),([0-9]+[A-Za-z]+|),([0-9]+[A-Za-z]+|)$'

If we save a line of the output as the variable spec, we can get the individual fields like this:

deployment_name=`echo $spec | cut -f1 -d,`
current_memory_limit=`echo $spec | cut -f2 -d,`
current_memory_request=`echo $spec | cut -f3 -d,`

Now we can decide to only patch if the limit or request differ from the current state. Much fewer requests.
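Putting those pieces together, a sketch of the comparison step, exercised on sample lines rather than a live cluster. The input format is the `<name>,<limit>,<request>` lines produced by the jq + grep pipeline above; the desired values here are placeholders for whatever the script actually targets:

```shell
# Decide whether a "<name>,<limit>,<request>" line needs patching.
# Desired values are placeholders, not the script's real targets.
desired_limit="512Mi"
desired_request="256Mi"

needs_patch() {
  local limit request
  limit=$(echo "$1" | cut -f2 -d,)
  request=$(echo "$1" | cut -f3 -d,)
  # True (exit 0) when either field differs from the desired state.
  [ "$limit" != "$desired_limit" ] || [ "$request" != "$desired_request" ]
}

needs_patch "nginx,128Mi,64Mi"  && echo "nginx: patch"
needs_patch "redis,512Mi,256Mi" || echo "redis: skip"
```

In the cronjob, this check would wrap the existing kubectl patch call, so a PATCH is only issued for deployments where needs_patch succeeds.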

When everything is as it should be it means we get the following requests every execution (6 hours):

  • 1 GET namespaces
  • 1 GET deployments per namespace
  • (and then up to 4 patch calls per namespace, but for most sites 0)

Whereas before, when we naively patched, we would get:

  • 1 GET namespaces
  • 4 PATCH calls per namespace

@ITViking
Contributor Author

This has been tested and has been live since before Christmas (December? Can't remember, but for a good while now).

@hypesystem
Contributor

@ITViking Then I think this is good to merge - we can look at potentially using my research at a later point if we think it is too resource intensive. The minute-long break between each namespace is probably helping us, though 😄
