Enable Prometheus metrics exporter for ingress-nginx controller. #297
Nice, feel free to roll it out to the platform and let's get it merged!
Feel free to roll this out at your convenience, but please let me know on Zulip so I can re-enable alerts to Zulip :-)
This PR ended up in a somewhat unclear state because the discussion about it happened on Zulip. In short, we'd like to scale up Prometheus and Loki first, so that we are confident they can handle more incoming data before we roll this out. It is still on hold until we get over the hump of getting all the libraries live.
Are we in a good enough place to get this in now, @hypesystem and @achton?
My impression is "yes", but I think you two are better suited to answer that question.
@hypesystem So what needs doing is to give Loki and Prometheus more resources, more replicas, or both. Have I understood this correctly?
@ITViking It depends on how Loki's resource use and storage look at the moment. We were waiting until all sites were live so we could monitor it and get an idea. Look at the resource use over time for the Loki and Prometheus pods to see how well-provisioned they are. If they don't spike too far above their resource requests, we should be fine. The next step is to look at their storage, to make sure we can handle significantly more data. If we are also far from our limits there, it should be fine to merge this. Otherwise, we need bigger disks for Loki and/or Prometheus, which is a bit more difficult to do without throwing away data. A final question is whether there's a good driver for adding this extra data export, i.e. whether it will be worth the potential extra cost of moving data around.
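The check described above (does usage spike too far above the resource request?) could be sketched roughly as below. This is only an illustration: the function name `check_headroom`, the `tolerated_ratio` threshold of 1.2, and the sample numbers are all made up, and real values would have to come from the cluster's own monitoring.

```python
from dataclasses import dataclass


@dataclass
class HeadroomReport:
    peak_ratio: float     # highest observed usage divided by the request
    within_budget: bool   # True if the peak stays at or below the tolerated ratio


def check_headroom(usage_samples, request, tolerated_ratio=1.2):
    """Compare observed usage samples against a pod's resource request.

    `tolerated_ratio` is a hypothetical threshold: 1.2 means a peak of up
    to 20% above the request is still treated as acceptable.
    """
    if request <= 0:
        raise ValueError("request must be positive")
    if not usage_samples:
        raise ValueError("need at least one usage sample")
    peak_ratio = max(usage_samples) / request
    return HeadroomReport(peak_ratio, peak_ratio <= tolerated_ratio)


# Made-up memory samples (MiB) for a pod with a 512 MiB request.
loki_memory_mib = [410, 455, 498, 530]
report = check_headroom(loki_memory_mib, request=512)
print(report)
```

The same comparison could be applied to disk usage against volume size when looking at storage.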
What does this PR do?
To better track how the ingress-nginx controller is doing, we should export metrics from it to Prometheus/Grafana.
Any specific requests for how the PR should be reviewed?
This is not yet rolled out to the cluster, so eyeball it (and the docs) and do a Task-based rollout while monitoring the effects.
There are some dashboards available which could be imported for easy checking of metrics.
Available metrics are listed here.
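As a quick way to eyeball which controller metrics actually show up once the exporter is enabled, something like the following sketch could be used. It only assumes the standard Prometheus text exposition format; the helper name `metric_names`, the default prefix, and the hand-written sample text are illustrative assumptions, not output captured from the cluster.

```python
def metric_names(exposition_text, prefix="nginx_ingress_controller_"):
    """Return the sorted, de-duplicated metric names that start with `prefix`."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


# Hand-written sample in Prometheus text format (illustrative values only).
SAMPLE = """\
# HELP nginx_ingress_controller_requests The total number of client requests
# TYPE nginx_ingress_controller_requests counter
nginx_ingress_controller_requests{ingress="example",status="200"} 42
nginx_ingress_controller_nginx_process_connections{state="active"} 3
go_goroutines 97
"""

names = metric_names(SAMPLE)
print(names)
```

In practice the text would come from the controller's metrics endpoint rather than a literal string.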