Prepare for data-pipeline us-central1 migration #1002

Merged: stephen-soltesz merged 22 commits into main from sandbox-soltesz-tf-cluster-1 on Aug 25, 2023

Conversation


@stephen-soltesz stephen-soltesz commented Aug 24, 2023

This change updates the k8s configuration and documentation for the Prometheus configuration used by the new data-pipeline cluster. This includes:

  • Updating the version of kube-state-metrics to support the latest K8s APIs (some deprecated since 1.24)
  • Changing the initContainers to reference the root directory rather than a directory that does not exist on a new cluster
  • Updating many dashboards to reference data-pipeline rather than data-processing
  • Adding three new data source definitions for the new cluster. The old data sources will remain for now, to be removed after migration to the new cluster is complete.
  • Updating alerts to reference the new cluster name
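
For reviewers who want to sweep for leftover references to the old cluster name, a minimal sketch follows. The function name is hypothetical and not part of this repository. Matches under the old data source definitions are expected, since those remain until the migration is complete.

```shell
# Sketch only: find_old_cluster_refs is a hypothetical helper name.
# List files under directory $1 that still mention the old cluster name.
find_old_cluster_refs() {
  grep -rlF 'data-processing' "$1" | sort
}
```

For example, `find_old_cluster_refs config/federation` would print each file that still contains the string `data-processing`, one path per line.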

I do expect that some alerts will fire once deployed since the history of metrics in staging will not begin until these changes are deployed.

Part of:

Before deployment, the operator must manually add the Cloud Build SA to the data-pipeline cluster rolebinding:

kubectl create clusterrolebinding additional-cluster-admins \
   --clusterrole=cluster-admin  --user=<projectnumber>@cloudbuild.gserviceaccount.com
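
The `<projectnumber>` placeholder is the numeric project number, not the project ID. One way to fill it in is sketched below. The helper names are hypothetical, and the sketch assumes an authenticated `gcloud` and a `kubectl` context that already points at the data-pipeline cluster.

```shell
# Sketch only: helper names are hypothetical, not part of the repository.

# Print the Cloud Build service account for a numeric project number.
cloudbuild_sa() {
  printf '%s@cloudbuild.gserviceaccount.com\n' "$1"
}

# Print the numeric project number for a project ID.
# Requires an authenticated gcloud.
project_number() {
  gcloud projects describe "$1" --format='value(projectNumber)'
}

# Grant the Cloud Build service account cluster-admin on the current
# kubectl context (assumed to be the data-pipeline cluster).
bind_cloudbuild_admin() {
  kubectl create clusterrolebinding additional-cluster-admins \
    --clusterrole=cluster-admin \
    --user="$(cloudbuild_sa "$(project_number "$1")")"
}
```

With these defined, `bind_cloudbuild_admin mlab-sandbox` would run the same command as above with the looked-up project number.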

After deployment, the operator must manually add a DNS record for the prometheus-data-pipeline.$PROJECT.measurementlab.net datasource using the IP allocated in the data-pipeline cluster.
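
The DNS step could look roughly like the following sketch. It only prints the `gcloud` command so it can be reviewed before running. The helper names and the managed zone name are assumptions, and the IP must be looked up after deployment.

```shell
# Sketch only: helper names and the zone name are assumptions.

# Print the datasource hostname for a project, following the pattern in this PR.
datasource_fqdn() {
  printf 'prometheus-data-pipeline.%s.measurementlab.net\n' "$1"
}

# Print (do not run) the gcloud command that would create the A record.
# $1 = project ID, $2 = Cloud DNS managed zone (assumed), $3 = allocated IP.
dns_record_cmd() {
  printf 'gcloud dns record-sets create %s. --project=%s --zone=%s --type=A --ttl=300 --rrdatas=%s\n' \
    "$(datasource_fqdn "$1")" "$1" "$2" "$3"
}
```

For example, `dns_record_cmd mlab-oti example-zone 192.0.2.10` prints the command for mlab-oti, where `example-zone` and `192.0.2.10` are placeholders for the real zone and the allocated IP.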



@stephen-soltesz stephen-soltesz changed the title Sandbox soltesz tf cluster 1 Prepare for data-pipeline us-central1 migration Aug 24, 2023

@stephen-soltesz stephen-soltesz left a comment


I have verified the following dashboards appear to work as intended in sandbox with the new data-pipeline cluster and changes in this PR:

I cannot test the following dashboards:

  • config/federation/grafana/dashboards/KeepMLabRunning.json -- pinned to oti
  • config/federation/grafana/dashboards/Archive_Repacker_Annotations.json -- not running

Reviewable status: 0 of 1 approvals obtained


@nkinkade nkinkade left a comment


@stephen-soltesz: Everything seems mostly okay. I added a couple very minor comments, and one possible blocker on the Stats Pipeline Monitoring dashboard.

Reviewed 28 of 29 files at r1, all commit messages.
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @stephen-soltesz)


config/federation/grafana/dashboards/Pipeline_AlternativeSLIs.json line 0 at r1 (raw file):
Just noting that it is very mildly confusing that the file name of the dashboard is Pipeline_AlternativeSLIs, while the name of the dashboard in Grafana is Pipeline SLIs (minus "Alternative").


config/federation/grafana/dashboards/Stats_Pipeline_Monitoring.json line 1652 at r1 (raw file):

  "time": {
    "from": "2021-02-22T21:32:03.609Z",
    "to": "2021-02-23T00:26:27.766Z"

Is from:to supposed to be set like this by default? When I load the dashboard it shows no data, and the date range is configured for a range from last year. I suspect this is unintentional.


config/federation/grafana/provisioning/datasources/data-pipeline_mlab-sandbox.yml.template line 16 at r1 (raw file):

  orgId: 1
  # <string> url
  url: http://prometheus-data-pipeline.mlab-sandbox.measurementlab.net:9090

Just noting that I see this new domain in Cloud DNS in sandbox, but not in staging and mlab-oti. I imagine it will need to be manually created in staging and mlab-oti.


k8s/data-pipeline/mlab-oti.yml line 0 at r1 (raw file):
Just to be sure, these empty yaml files are just placeholders for the case that we want to define per-project variables? I notice the data-processing ones are empty too, but the prometheus-federation ones are not.


@stephen-soltesz stephen-soltesz left a comment


I've addressed your comments. Thank you for taking a look at this complex change!

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @nkinkade)


config/federation/grafana/dashboards/Pipeline_AlternativeSLIs.json line at r1 (raw file):

Previously, nkinkade wrote…

Just noting that it is very mildly confusing that the file name of the dashboard is Pipeline_AlternativeSLIs, while the name of the dashboard in Grafana is Pipeline SLIs (minus "Alternative").

I agree. Renamed.


config/federation/grafana/dashboards/Stats_Pipeline_Monitoring.json line 1652 at r1 (raw file):

Previously, nkinkade wrote…

Is from:to supposed to be set like this by default? When I load the dashboard it shows no data, and the date range is configured for a range from last year. I suspect this is unintentional.

Fixed to last 24hr.
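
In Grafana's dashboard JSON, a relative range such as `"from": "now-24h", "to": "now"` avoids this problem. To catch other dashboards saved with a pinned absolute range, a check along these lines might help. This is a sketch with a hypothetical function name. It matches on text rather than parsing JSON, so a `"from"` key elsewhere in a dashboard that holds an ISO timestamp would also be flagged.

```shell
# Sketch only: find_pinned_time_ranges is a hypothetical helper name.
# List dashboard JSON files under directory $1 whose "from" value is an
# absolute ISO 8601 timestamp rather than a relative one like "now-24h".
find_pinned_time_ranges() {
  grep -rlE --include='*.json' \
    '"from":[[:space:]]*"[0-9]{4}-[0-9]{2}-[0-9]{2}T' "$1"
}
```

For example, `find_pinned_time_ranges config/federation/grafana/dashboards` prints the path of each dashboard with an absolute start time and prints nothing if all ranges are relative.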


config/federation/grafana/provisioning/datasources/data-pipeline_mlab-sandbox.yml.template line 16 at r1 (raw file):

Previously, nkinkade wrote…

Just noting that I see this new domain in Cloud DNS in sandbox, but not in staging and mlab-oti. I imagine it will need to be manually created in staging and mlab-oti.

Yes, using an IP allocated by deploying this change. So: deploy this change, look up the IP, create the DNS entry.


k8s/data-pipeline/mlab-oti.yml line at r1 (raw file):

Previously, nkinkade wrote…

Just to be sure, these empty yaml files are just placeholders for the case that we want to define per-project variables? I notice the data-processing ones are empty too, but the prometheus-federation ones are not.

That must be so. I don't know why these ended up in the PR. The empty files are already present in the "data-processing" directory https://github.com/m-lab/prometheus-support/tree/main/k8s/data-processing

Git may treat renames of empty files differently than files with content, so it looks like a new file even though it should only be a rename.


@nkinkade nkinkade left a comment


Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained

@stephen-soltesz stephen-soltesz merged commit 7be5405 into main Aug 25, 2023
@stephen-soltesz stephen-soltesz deleted the sandbox-soltesz-tf-cluster-1 branch August 25, 2023 17:24