automatically adjust pr envs resource request #560
base: main
Conversation
kubectl patch deployment $deploy -n $ns \
  --type=json \
  -p="[{\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/resources\", \"value\": {\"requests\": {\"cpu\": \"15m\", \"memory\": \"$memory_request\"}, \"limits\": {\"cpu\": \"200m\", \"memory\": \"$memory_limit\"}}}]"
Does the output here reveal whether the deployment was changed or not? If so, you could skip the wait when the resources have already been set.
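As a rough, untested sketch of that idea: it assumes the kubectl version in use reports "(no change)" for no-op patches (this is an assumption to verify), and uses a hypothetical $patch_payload variable holding the JSON patch from the script:

# Capture the patch output and skip the wait when nothing actually changed.
# $patch_payload is a placeholder for the JSON patch used elsewhere in the script.
patch_output=$(kubectl patch deployment "$deploy" -n "$ns" --type=json -p="$patch_payload")
if echo "$patch_output" | grep -q "no change"; then
  echo "$deploy already has the desired resources, skipping the wait"
  continue  # assumes this runs inside the existing per-deployment loop
fi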
That is a good point, I will have to look into that.
I left a comment, but there are more pressing matters to get to than this at the moment.
I like your approach. Could you turn it into a Helm chart instead, so the relationship between the resources is clearly marked for someone inspecting the cluster?
.../environments/dplplat01/configuration/vertical-pod-autoscaler/patch-cms-bnf-deployments.yaml
This will run every 6 hours and ensure that as more pr-envs are added, they are resource-regulated along the way. Right now they would not be regulated until the next time the manual script is run.
074f4db to f7deda6
Looks very reasonable to me! Have you tested that it works in some limited capacity? Or are you awaiting approval before testing?
I have one concern, which I will see if I can help find a solution for.
spec:
  containers:
    - name: patch-resources
      image: bitnami/kubectl:latest
Pin to a Kubernetes version we know works with our setup.
kubectl may update, or we may update the cluster, and create an inconsistency unknowingly - so it's better to be able to control it.
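One way to find the version to pin against (a minimal sketch; the exact bitnami/kubectl tag format is an assumption and should be verified against the tags the image actually publishes):

# Print the cluster's server version so the job image can be pinned to match it
kubectl version --output=json | jq -r '.serverVersion.gitVersion'
# e.g. a "v1.28.4" result would suggest pinning to something like bitnami/kubectl:1.28.4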
fi
done
echo "sleeping for a minute to give the deployments time to get back up and not crash the database"
sleep 60
Hmm, my take on this is that it potentially creates a lot of work for the Kubernetes API server if we patch everything every minute?
Could we find a way to directly query if the deployments have their memory limits set correctly before trying to set them?
Looking into this, will report back if I find a good approach.
Beginning of something:
kubectl get deploy -n $ns -o json | jq '.items[] | { name: .metadata.name, resources: .spec.template.spec.containers[0].resources }'
produces a list of JSON objects each with a name and the resource spec. Next up, picking those that do not match the desired memory limit and request.
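A possible next step (a minimal sketch, assuming the desired values are available as $memory_request and $memory_limit from the existing script; the select filter itself is illustrative, not from the PR):

# Print only the deployments whose current memory request or limit differs from the desired values
kubectl get deploy -n $ns -o json \
  | jq --arg req "$memory_request" --arg lim "$memory_limit" -r \
      '.items[]
       | select(.spec.template.spec.containers[0].resources.requests.memory != $req
                or .spec.template.spec.containers[0].resources.limits.memory != $lim)
       | .metadata.name'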
kubectl get deploy -n $ns -o json | jq '.items[] | [.metadata.name, .spec.template.spec.containers[0].resources.limits.memory, .spec.template.spec.containers[0].resources.requests.memory] | join(",")' --raw-output
This outputs a list of the format:
<deploymentname>,<memorylimit>,<memoryrequest>
To get a particular list of deployments, we can append this to the command (selecting nginx|redis|varnish|cli followed by memory indications):
| grep -E '^(nginx|redis|varnish|cli),([0-9]+[A-Za-z]+|),([0-9]+[A-Za-z]+|)$'
If we save the output of the above in a variable spec, we can get the individual fields like this:
deployment_name=`echo $spec | cut -f1 -d,`
current_memory_limit=`echo $spec | cut -f2 -d,`
current_memory_request=`echo $spec | cut -f3 -d,`
Now we can decide to only patch if the current limit or request differs from the desired state. Far fewer requests.
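Put together, the per-namespace loop could look roughly like this (an untested sketch reusing the variables from the existing script; the grep pattern is simplified compared to the one above):

kubectl get deploy -n $ns -o json \
  | jq -r '.items[] | [.metadata.name, .spec.template.spec.containers[0].resources.limits.memory, .spec.template.spec.containers[0].resources.requests.memory] | join(",")' \
  | grep -E '^(nginx|redis|varnish|cli),' \
  | while IFS=',' read -r deployment_name current_memory_limit current_memory_request; do
      # Only patch when the current values differ from the desired ones
      if [ "$current_memory_limit" != "$memory_limit" ] || [ "$current_memory_request" != "$memory_request" ]; then
        kubectl patch deployment "$deployment_name" -n "$ns" \
          --type=json \
          -p="[{\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/resources\", \"value\": {\"requests\": {\"cpu\": \"15m\", \"memory\": \"$memory_request\"}, \"limits\": {\"cpu\": \"200m\", \"memory\": \"$memory_limit\"}}}]"
      fi
    done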
When everything is as it should be, this means we get the following requests on every execution (every 6 hours):
- 1 GET namespaces
- 1 GET deployments per namespace
- (and then up to 4 PATCH calls per namespace, but for most sites 0)
Whereas before, when we naively patched, we would get:
- 1 GET namespaces
- 4 PATCH calls per namespace
This has been tested and has been live since before Christmas (December? I can't remember, but for a good while now).
@ITViking Then I think this is good to merge - we can look at potentially using my research at a later point if we think it is too resource-intensive. The minute-long break between each namespace is probably helping us, though 😄
What does this PR do?
This adds a cronjob which will run every 6 hours. It will take approximately one minute per pr-env to complete.
The job will adjust the redis, varnish, nginx and cli deployments to create pods with a set resource request, fitting for pr environments.
This is in effect now
Should this be tested by the reviewer and how?
Any specific requests for how the PR should be reviewed?
What are the relevant tickets?
https://reload.atlassian.net/jira/software/c/projects/DDFDRIFT/boards/464?selectedIssue=DDFDRIFT-264