feat: de-dupe KubeletTooManyPods, add cluster to descriptions #1011
base: master
count by (%(clusterLabel)s, node) (
  (kube_pod_status_phase{%(kubeStateMetricsSelector)s, phase="Running"} == 1)
  * on (%(clusterLabel)s, namespace, pod) group_left (node)
  group by (%(clusterLabel)s, namespace, pod, node) (
    kube_pod_info{%(kubeStateMetricsSelector)s}
  )
Here is the "More robust de-dupe of pod count in KubeletTooManyPods alert".
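For readers less familiar with the join, here is a minimal sketch in Python (not PromQL) of what the `group by` step appears to achieve. It assumes the duplicates come from `kube_pod_info` being reported more than once for the same pod, differing only in a scrape label such as `instance`. The sample data and the helper names `group_by` and `running_pods_per_node` are hypothetical and not part of the mixin.

```python
from collections import Counter

# Hypothetical samples: pod-1 is reported twice by kube_pod_info,
# differing only in the "instance" label.
kube_pod_info = [
    {"cluster": "kubernetes", "namespace": "kube-system", "pod": "pod-1", "node": "minikube", "instance": "a"},
    {"cluster": "kubernetes", "namespace": "kube-system", "pod": "pod-1", "node": "minikube", "instance": "b"},
    {"cluster": "kubernetes", "namespace": "kube-system", "pod": "pod-2", "node": "minikube", "instance": "a"},
]

# Stand-in for: kube_pod_status_phase{phase="Running"} == 1
running = [
    {"cluster": "kubernetes", "namespace": "kube-system", "pod": "pod-1"},
    {"cluster": "kubernetes", "namespace": "kube-system", "pod": "pod-2"},
]


def group_by(series, labels):
    """Mimic PromQL `group by (labels)`: one entry per distinct label set."""
    return sorted({tuple(s[label] for label in labels) for s in series})


def running_pods_per_node(info, phase):
    # De-dupe kube_pod_info down to (cluster, namespace, pod, node).
    deduped = group_by(info, ("cluster", "namespace", "pod", "node"))
    # Join on (cluster, namespace, pod) and pull in node, like group_left (node).
    # Simplification: a pod that maps to two different nodes is not modelled here.
    node_of = {(c, ns, p): n for c, ns, p, n in deduped}
    counts = Counter()
    for s in phase:
        key = (s["cluster"], s["namespace"], s["pod"])
        if key in node_of:
            counts[(s["cluster"], node_of[key])] += 1
    return dict(counts)


result = running_pods_per_node(kube_pod_info, running)
print(result)  # {('kubernetes', 'minikube'): 2}
```

Without the de-dupe step, pod-1 would match two `kube_pod_info` entries on the right-hand side of the join, so it could no longer be matched to exactly one node.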
btw this change is covered by an existing test:
Lines 406 to 435 in 03c13f9
- interval: 1m
  input_series:
  - series: 'kube_node_status_capacity{resource="pods",instance="172.17.0.5:8443",cluster="kubernetes",node="minikube",job="kube-state-metrics",namespace="kube-system"}'
    values: '3+0x15'
  - series: 'kube_pod_info{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",node="minikube",pod="pod-1",service="kube-state-metrics"}'
    values: '1+0x15'
  - series: 'kube_pod_status_phase{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",phase="Running",pod="pod-1",service="kube-state-metrics"}'
    values: '1+0x15'
  - series: 'kube_pod_info{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",node="minikube",pod="pod-2",service="kube-state-metrics"}'
    values: '1+0x15'
  - series: 'kube_pod_status_phase{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",phase="Running",pod="pod-2",service="kube-state-metrics"}'
    values: '1+0x15'
  - series: 'kube_pod_info{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",node="minikube",pod="pod-3",service="kube-state-metrics"}'
    values: '1+0x15'
  - series: 'kube_pod_status_phase{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",phase="Running",pod="pod-3",service="kube-state-metrics"}'
    values: '1+0x15'
  alert_rule_test:
  - eval_time: 10m
    alertname: KubeletTooManyPods
  - eval_time: 15m
    alertname: KubeletTooManyPods
    exp_alerts:
    - exp_labels:
        cluster: kubernetes
        node: minikube
        severity: info
      exp_annotations:
        summary: "Kubelet is running at capacity."
        description: "Kubelet 'minikube' is running at 100% of its Pod capacity."
        runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods
This PR has two sets of changes:
- Add the cluster label to any alert description where it was missing (currently inconsistent)
- More robust de-dupe of pod count in the KubeletTooManyPods alert

Fixes #997.