Prepare for data-pipeline us-central1 migration #1002
I have verified the following dashboards appear to work as intended in sandbox with the new data-pipeline cluster and changes in this PR:
- config/federation/grafana/dashboards/Pipeline_Overview.json
- config/federation/grafana/dashboards/Pipeline_AlternativeSLIs.json
- config/federation/grafana/dashboards/Pipeline_Autoloader.json
- config/federation/grafana/dashboards/Stats_Pipeline_Monitoring.json
- config/federation/grafana/dashboards/Prometheus_SelfMonitoring.json
- config/federation/grafana/dashboards/Pipeline_Gardener.json
- config/federation/grafana/dashboards/Stats_Pipeline_AnnotationExportMonitoring.json
I cannot test the following dashboards:
- config/federation/grafana/dashboards/KeepMLabRunning.json -- pinned to oti
- config/federation/grafana/dashboards/Archive_Repacker_Annotations.json -- not running
Reviewable status: 0 of 1 approvals obtained
@stephen-soltesz: Everything seems mostly okay. I added a couple of very minor comments, and one possible blocker on the Stats Pipeline Monitoring dashboard.
Reviewed 28 of 29 files at r1, all commit messages.
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @stephen-soltesz)
config/federation/grafana/dashboards/Pipeline_AlternativeSLIs.json
line 0 at r1 (raw file):
Just noting that it is very mildly confusing that the file name of the dashboard is Pipeline_AlternativeSLIs, while the name of the dashboard in Grafana is Pipeline SLIs (minus "Alternative").
config/federation/grafana/dashboards/Stats_Pipeline_Monitoring.json
line 1652 at r1 (raw file):
"time": { "from": "2021-02-22T21:32:03.609Z", "to": "2021-02-23T00:26:27.766Z"
Is from:to supposed to be set like this by default? When I load the dashboard it shows no data, and the date range is configured for a range from last year. I suspect this is unintentional.
config/federation/grafana/provisioning/datasources/data-pipeline_mlab-sandbox.yml.template
line 16 at r1 (raw file):
orgId: 1
# <string> url
url: http://prometheus-data-pipeline.mlab-sandbox.measurementlab.net:9090
Just noting that I see this new domain in Cloud DNS in sandbox, but not in staging and mlab-oti. I imagine it will need to be manually created in staging and mlab-oti.
k8s/data-pipeline/mlab-oti.yml
line 0 at r1 (raw file):
Just to be sure, these empty YAML files are just placeholders in case we want to define per-project variables? I notice the data-processing ones are empty too, but the prometheus-federation ones are not.
I've addressed your comments. Thank you for taking a look at the complex change!
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @nkinkade)
config/federation/grafana/dashboards/Pipeline_AlternativeSLIs.json
line at r1 (raw file):
Previously, nkinkade wrote…
Just noting that it is very mildly confusing that the file name of the dashboard is Pipeline_AlternativeSLIs, while the name of the dashboard in Grafana is Pipeline SLIs (minus "Alternative").
I agree. Renamed.
config/federation/grafana/dashboards/Stats_Pipeline_Monitoring.json
line 1652 at r1 (raw file):
Previously, nkinkade wrote…
Is from:to supposed to be set like this by default? When I load the dashboard it shows no data, and the date range is configured for a range from last year. I suspect this is unintentional.
Fixed. The default range is now the last 24 hours.
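For anyone checking other dashboards for the same problem, here is a minimal sketch of a check for pinned time ranges. It assumes the dashboard JSON stores its default range under a top-level "time" key with "from" and "to" strings, and that relative ranges start with "now" (for example "now-24h"). The function name uses_relative_time is hypothetical and is not part of this repository.

```python
import json


def uses_relative_time(dashboard: dict) -> bool:
    """Return True if both ends of the default time range start with "now"."""
    time_range = dashboard.get("time", {})
    return all(
        str(time_range.get(key, "")).startswith("now")
        for key in ("from", "to")
    )


# The absolute range flagged in review.
pinned = json.loads(
    '{"time": {"from": "2021-02-22T21:32:03.609Z", "to": "2021-02-23T00:26:27.766Z"}}'
)
# A relative range covering the last 24 hours (assumed shape of the fix).
relative = json.loads('{"time": {"from": "now-24h", "to": "now"}}')

print(uses_relative_time(pinned))    # False
print(uses_relative_time(relative))  # True
```

Running a check like this over config/federation/grafana/dashboards/*.json before merging would flag any dashboard saved with a fixed window.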
config/federation/grafana/provisioning/datasources/data-pipeline_mlab-sandbox.yml.template
line 16 at r1 (raw file):
Previously, nkinkade wrote…
Just noting that I see this new domain in Cloud DNS in sandbox, but not in staging and mlab-oti. I imagine it will need to be manually created in staging and mlab-oti.
Yes, using an IP allocated by deploying this change. So: deploy this change, look up the IP, then create the DNS entry.
k8s/data-pipeline/mlab-oti.yml
line at r1 (raw file):
Previously, nkinkade wrote…
Just to be sure, these empty YAML files are just placeholders in case we want to define per-project variables? I notice the data-processing ones are empty too, but the prometheus-federation ones are not.
That must be so. I don't know why these ended up in the PR. The empty files are already present in the "data-processing" directory https://github.com/m-lab/prometheus-support/tree/main/k8s/data-processing
Git may treat renames of empty files differently than files with content, so it looks like a new file even though it should only be a rename.
Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: complete! 1 of 1 approvals obtained
This change updates the k8s configuration and documentation for the prometheus configuration used by the new data-pipeline cluster. This includes
I do expect that some alerts will fire once deployed since the history of metrics in staging will not begin until these changes are deployed.
Part of:
Before deployment, operator must manually add Cloud Build SA to the data-pipeline cluster rolebinding:
After deployment, operator must manually add a DNS record for the prometheus-data-pipeline.$PROJECT.measurementlab.net datasource using the allocated IP in the data-pipeline cluster.
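Once the record has been created, a quick check that it resolves to the allocated IP may be useful. The following is a rough sketch, not part of this change. It assumes the hostname follows the pattern of the sandbox datasource URL (prometheus-data-pipeline.mlab-sandbox.measurementlab.net). The helper names datasource_host and dns_matches are hypothetical, and 192.0.2.10 is a documentation placeholder, not a real allocation.

```python
import socket
from typing import Callable


def datasource_host(project: str) -> str:
    """Build the datasource hostname for a project (assumed naming pattern)."""
    return f"prometheus-data-pipeline.{project}.measurementlab.net"


def dns_matches(
    project: str,
    expected_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True if the datasource hostname resolves to expected_ip.

    The resolver is injectable so the check can be exercised offline.
    """
    try:
        return resolve(datasource_host(project)) == expected_ip
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when no record exists.
        return False


# Offline example with a stand-in resolver and a placeholder IP.
fake_resolver = lambda host: "192.0.2.10"
print(datasource_host("mlab-sandbox"))
print(dns_matches("mlab-sandbox", "192.0.2.10", resolve=fake_resolver))
```

In real use the resolve argument would be omitted, so the lookup goes through the system resolver, and expected_ip would be the address looked up in the data-pipeline cluster after deployment.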