Automatically adjust PR envs resource requests #560

Open · wants to merge 7 commits into main
@@ -0,0 +1,3 @@
apiVersion: v2
name: patch-pr-env-resource-requests
version: 0.0.0
@@ -0,0 +1,89 @@
apiVersion: batch/v1
kind: CronJob
metadata:
  name: patch-resource-requests
  namespace: default
spec:
  schedule: "0 */6 * * *" # Every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: patch-resources
              image: bitnami/kubectl:latest
Contributor:

Pin to a Kubernetes version we know works with our setup.

kubectl may update, or we may update the cluster, and unknowingly create an inconsistency - so it's better to be able to control it.
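
A pinned image reference would look like this, for example (the tag is illustrative and should be chosen to match the cluster's Kubernetes minor version):

  image: bitnami/kubectl:1.29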

              command:
                - /bin/sh
                - -c
                - |
                  #!/bin/sh
                  for ns in $(kubectl get ns --no-headers -o custom-columns=":metadata.name" | grep -E "dpl-cms-pr|dpl-bnf-pr"); do
                    for deploy in nginx redis varnish cli; do
                      if kubectl get deployment $deploy -n $ns > /dev/null 2>&1; then
                        echo "Patching deployment $deploy in namespace $ns"
                        case $deploy in
                          nginx)
                            memory_request=160Mi
                            memory_limit=256Mi
                            ;;
                          redis)
                            memory_request=60Mi
                            memory_limit=256Mi
                            ;;
                          varnish)
                            memory_request=180Mi
                            memory_limit=256Mi
                            ;;
                          cli)
                            memory_request=25Mi
                            memory_limit=125Mi
                            ;;
                          *)
                            memory_request=150Mi
                            memory_limit=256Mi
                            ;;
                        esac
                        # TODO: we could look into a way of checking if the patching of a deployment went well, so we don't have to sleep
                        kubectl patch deployment $deploy -n $ns \
                          --type=json \
                          -p="[{\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/resources\", \"value\": {\"requests\": {\"cpu\": \"15m\", \"memory\": \"$memory_request\"}, \"limits\": {\"cpu\": \"200m\", \"memory\": \"$memory_limit\"}}}]"
                      else
                        echo "Deployment $deploy not found in namespace $ns"
                      fi
                    done
                    echo "sleeping for a minute to give the deployments time to get back up and not crash the database"
                    sleep 60
Contributor:

Hmm, my take on this is that it potentially creates a lot of work for the Kubernetes API server if we patch everything every minute?

Could we find a way to directly query if the deployments have their memory limits set correctly before trying to set them?

Looking into this, will report back if I find a good approach.

Contributor:

Beginning of something:

kubectl get deploy -n $ns -o json | jq '.items[] | { name: .metadata.name, resources: .spec.template.spec.containers[0].resources }'

produces a list of JSON objects each with a name and the resource spec. Next up, picking those that do not match the desired memory limit and request.

Contributor:

kubectl get deploy -n $ns -o json | jq '.items[] | [.metadata.name, .spec.template.spec.containers[0].resources.limits.memory, .spec.template.spec.containers[0].resources.requests.memory] | join(",")' --raw-output

This outputs a list of the format:

<deploymentname>,<memorylimit>,<memoryrequest>
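
For example, one such line might read (values here just mirror the nginx numbers used in the patch): nginx,256Mi,160Mi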

To get a particular list of deployments, we can append this to the command (selecting nginx|redis|varnish|cli followed by memory indications):

 | grep -E '^(nginx|redis|varnish|cli),([0-9]+[A-Za-z]+|),([0-9]+[A-Za-z]+|)$'

If we save each line of the above output as the variable spec, we can get the individual fields like this:

deployment_name=`echo $spec | cut -f1 -d,`
current_memory_limit=`echo $spec | cut -f2 -d,`
current_memory_request=`echo $spec | cut -f3 -d,`

Now we can decide to only patch if the desired limit or request differs from the current state - far fewer requests (see the sketch below).

When everything is as it should be, we get the following requests on each execution (every 6 hours):

  • 1 GET namespaces
  • 1 GET deployments per namespace
  • (and then up to 4 patch calls per namespace, but for most sites 0)

Whereas before, when we naively patched, we would get:

  • 1 GET namespaces
  • 4 PATCH calls per namespace
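
Putting the pieces above together, a check-before-patch loop for a single namespace could look roughly like this (a sketch only, not part of this PR: $ns is assumed to be set by the surrounding namespace loop, the desired_memory_* variables by something like the existing case statement, and the rollout wait is just one possible replacement for the fixed sleep):

  for spec in $(kubectl get deploy -n $ns -o json \
      | jq --raw-output '.items[] | [.metadata.name, .spec.template.spec.containers[0].resources.limits.memory, .spec.template.spec.containers[0].resources.requests.memory] | join(",")' \
      | grep -E '^(nginx|redis|varnish|cli),([0-9]+[A-Za-z]+|),([0-9]+[A-Za-z]+|)$'); do
    deployment_name=$(echo $spec | cut -f1 -d,)
    current_memory_limit=$(echo $spec | cut -f2 -d,)
    current_memory_request=$(echo $spec | cut -f3 -d,)
    # Look up desired_memory_limit / desired_memory_request for $deployment_name here,
    # e.g. via the existing case statement.
    if [ "$current_memory_limit" != "$desired_memory_limit" ] || [ "$current_memory_request" != "$desired_memory_request" ]; then
      echo "Patching deployment $deployment_name in namespace $ns"
      kubectl patch deployment "$deployment_name" -n "$ns" --type=json \
        -p="[{\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/resources\", \"value\": {\"requests\": {\"cpu\": \"15m\", \"memory\": \"$desired_memory_request\"}, \"limits\": {\"cpu\": \"200m\", \"memory\": \"$desired_memory_limit\"}}}]"
      # Waiting for the rollout to finish is one way to avoid the fixed 60 second sleep.
      kubectl rollout status deployment "$deployment_name" -n "$ns" --timeout=120s
    fi
  done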

                  done
          restartPolicy: OnFailure
          serviceAccountName: pr-env-patcher-cronjob-sa
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pr-env-patcher-cronjob-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: deployments-patch-list-get-patch-cluster-role
rules:
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["list"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: namespace-lister-binding
subjects:
  - kind: ServiceAccount
    name: pr-env-patcher-cronjob-sa
    namespace: default
roleRef:
  kind: ClusterRole
  name: deployments-patch-list-get-patch-cluster-role
  apiGroup: rbac.authorization.k8s.io
@@ -0,0 +1,3 @@
#!/usr/bin/env bash
helm upgrade patch-pr-env-resource-requests chart \
--install