
automatically adjust pr envs resource request #560

Open

wants to merge 7 commits into main
Conversation

ITViking
Contributor

What does this PR do?

This adds a cronjob that runs every 6 hours and takes roughly one minute per PR environment to complete.
The job adjusts the redis, varnish, nginx, and cli deployments so their pods are created with resource requests suited to PR environments.

This is in effect now.

Should this be tested by the reviewer and how?

Any specific requests for how the PR should be reviewed?

What are the relevant tickets?

https://reload.atlassian.net/jira/software/c/projects/DDFDRIFT/boards/464?selectedIssue=DDFDRIFT-264

Comment on lines 46 to 48
kubectl patch deployment $deploy -n $ns \
--type=json \
-p="[{\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/resources\", \"value\": {\"requests\": {\"cpu\": \"15m\", \"memory\": \"$memory_request\"}, \"limits\": {\"cpu\": \"200m\", \"memory\": \"$memory_limit\"}}}]"
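For context, `$memory_request` and `$memory_limit` have to be chosen per deployment before this patch call runs. A hedged sketch of what that selection might look like; the helper name and all memory values below are hypothetical, not the script's actual figures:

```shell
# Hypothetical helper: pick memory values by deployment name.
# The tiers below are illustrative only; the real script's values may differ.
set_memory_for() {
  case "$1" in
    redis)   memory_request="64Mi";  memory_limit="128Mi" ;;
    varnish) memory_request="128Mi"; memory_limit="256Mi" ;;
    nginx)   memory_request="32Mi";  memory_limit="64Mi"  ;;
    cli)     memory_request="128Mi"; memory_limit="256Mi" ;;
  esac
}

set_memory_for redis
echo "$memory_request $memory_limit"   # → 64Mi 128Mi
```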
Contributor

Does the output here reveal if the deployment was changed or not? If so - you could skip the wait, if the resources had already been set.
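For reference, recent kubectl releases append "(no change)" to the patch output when the object already matches the desired state, which could drive exactly this skip. A sketch assuming that behavior; the kubectl wiring is left as a comment since it needs a live cluster, and the variable names are the script's:

```shell
# Assumption: `kubectl patch` prints e.g. "deployment.apps/nginx patched" when
# something changed, and "deployment.apps/nginx patched (no change)" when the
# object was already in the desired state (behavior of recent kubectl releases).
patch_changed() {
  case "$1" in
    *"(no change)"*) return 1 ;;  # already in desired state, no restart coming
    *)               return 0 ;;  # patch applied, pods will roll
  esac
}

# In the script (hypothetical wiring):
#   out=$(kubectl patch deployment "$deploy" -n "$ns" --type=json -p="$patch")
#   patch_changed "$out" && sleep 60   # only wait when a rollout is coming

patch_changed "deployment.apps/nginx patched"             && echo changed
patch_changed "deployment.apps/nginx patched (no change)" || echo unchanged
```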

Contributor Author

that is a good point, will have to look into that

Contributor Author

I left a comment, but there's more pressing matters to get at than this at the moment

@ath88
Contributor

ath88 commented Dec 15, 2024

I like your approach. Could you turn it into a helm chart instead, so the relationship between the resources is clearly marked for someone inspecting the cluster?

ITViking and others added 6 commits January 9, 2025 12:33
This will run every 6 hours and ensure that as more pr-envs are being
added, they are resource-regulated along the way. Previously they would
not be regulated until the next time the manual script was run.
@ITViking ITViking force-pushed the automate-resource-request-setting-for-pr-envs branch from 074f4db to f7deda6 Compare January 9, 2025 11:33
@ITViking ITViking requested a review from ath88 January 13, 2025 14:35
Contributor

@hypesystem hypesystem left a comment

Looks very reasonable to me! Have you tested that it works in some limited capacity? Or are you awaiting approval before testing?

I have one concern which I will see if I can help with a solution for.

spec:
containers:
- name: patch-resources
image: bitnami/kubectl:latest
Contributor

Pin to a Kubernetes version we know works with our setup.

kubectl may update, or we may update the cluster, creating an inconsistency unknowingly - so it's better to be able to control it.

fi
done
echo "sleeping for a minute to give the deployments time to get back up and not crash the database"
sleep 60
Contributor

Hmm, I think my take on this is that it is potentially creating a lot of work on the Kubernetes API server if we patch everything every minute?

Could we find a way to directly query if the deployments have their memory limits set correctly before trying to set them?

Looking into this, will report back if I find a good approach.

Contributor

Beginning of something:

kubectl get deploy -n $ns -o json | jq '.items[] | { name: .metadata.name, resources: .spec.template.spec.containers[0].resources }'

produces a list of JSON objects each with a name and the resource spec. Next up, picking those that do not match the desired memory limit and request.

Contributor

kubectl get deploy -n $ns -o json | jq '.items[] | [.metadata.name, .spec.template.spec.containers[0].resources.limits.memory, .spec.template.spec.containers[0].resources.requests.memory] | join(",")' --raw-output

This outputs a list of the format:

<deploymentname>,<memorylimit>,<memoryrequest>

To get a particular list of deployments, we can append this to the command (selecting nginx|redis|varnish|cli followed by memory indications):

 | grep -E '^(nginx|redis|varnish|cli),([0-9]+[A-Za-z]+|),([0-9]+[A-Za-z]+|)$'

If we save a line of the output as the variable spec, we can get the individual fields like this:

deployment_name=`echo $spec | cut -f1 -d,`
current_memory_limit=`echo $spec | cut -f2 -d,`
current_memory_request=`echo $spec | cut -f3 -d,`

Now we can decide to only patch if the limit or request differ from the current state. Much fewer requests.
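Putting those pieces together, a sketch of the comparison step, exercised on sample lines rather than a live cluster. The input format is the `<name>,<limit>,<request>` lines produced by the jq + grep pipeline above; the desired values here are placeholders for whatever the script actually targets:

```shell
# Decide whether a "<name>,<limit>,<request>" line needs patching.
# Desired values are placeholders, not the script's real targets.
desired_limit="512Mi"
desired_request="256Mi"

needs_patch() {
  local limit request
  limit=$(echo "$1" | cut -f2 -d,)
  request=$(echo "$1" | cut -f3 -d,)
  # True (exit 0) when either field differs from the desired state.
  [ "$limit" != "$desired_limit" ] || [ "$request" != "$desired_request" ]
}

needs_patch "nginx,128Mi,64Mi"  && echo "nginx: patch"
needs_patch "redis,512Mi,256Mi" || echo "redis: skip"
```

In the cronjob, this check would wrap the existing kubectl patch call, so a PATCH is only issued for deployments where needs_patch succeeds.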

When everything is as it should be it means we get the following requests every execution (6 hours):

  • 1 GET namespaces
  • 1 GET deployments per namespace
  • (and then up to 4 patch calls per namespace, but for most sites 0)

Whereas before, when we naively patched, we would get:

  • 1 GET namespaces
  • 4 PATCH calls per namespace

@ITViking
Contributor Author

This has been tested and has been live since before Christmas (December? Can't remember, but for a good while now).

@hypesystem
Contributor

@ITViking Then I think this is good to merge - we can look at potentially using my research at a later point if we think it is too resource intensive. The minute-long break between each namespace is probably helping us, though 😄
