Migrate data-processing clusters to us-central1 #1092

Open
13 of 17 tasks
stephen-soltesz opened this issue Jun 9, 2022 · 12 comments
stephen-soltesz (Contributor) commented Jun 9, 2022

The data-processing clusters in mlab-sandbox and mlab-staging are in us-east1, while the archive-measurement-lab bucket is in us-central1. These clusters should be redeployed to us-central1, and their output buckets recreated in us-central1. Since we want the GKE clusters to be managed by Terraform, we will recreate the production cluster as well.

  • create new data-processing cluster in us-central1 for sandbox & staging
  • create new etl-$PROJECT replacement bucket in us-central1
  • create new etl-$PROJECT-us-central1 buckets & update the etl & gardener configurations to use them
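As a sketch, the replacement buckets could be created with gsutil; the project below is one of the source values, and the bucket name follows the etl-$PROJECT-us-central1 pattern from the checklist:

```shell
# Sketch: create a new regional ETL bucket in us-central1. The bucket name
# follows the etl-$PROJECT-us-central1 pattern from the checklist above.
PROJECT=mlab-sandbox
BUCKET="etl-${PROJECT}-us-central1"

# Create a single-region bucket in us-central1 under the target project.
gsutil mb -p "${PROJECT}" -l us-central1 "gs://${BUCKET}"
```

The same pattern would repeat per project (mlab-sandbox, mlab-staging, mlab-oti).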

Production deployment

  • Tag terraform-support repo to create data-pipeline cluster in production
  • Import service account into TF by hand.
    terraform import module.data-pipeline.google_service_account.stats_pipeline  \
        projects/mlab-oti/serviceAccounts/[email protected]
    
  • Add role binding to new GKE cluster:
    kubectl create clusterrolebinding additional-cluster-admins  --clusterrole=cluster-admin  \
        --user=<id>@cloudbuild.gserviceaccount.com
    
  • Update Cloud Build (CB) substitutions for the six data pipeline service repos.
  • Tag all six data pipeline service repos to deploy to data-pipeline cluster
  • Create DNS record for prometheus-data-pipeline.mlab-oti.measurementlab.net using Cluster LB address
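A hedged sketch of the DNS step, assuming the record is managed with Cloud DNS; the zone name and LB address below are placeholders, not the actual values:

```shell
# Hypothetical sketch: point prometheus-data-pipeline.mlab-oti.measurementlab.net
# at the new cluster's load balancer. ZONE and LB_IP are placeholders.
ZONE="measurementlab-net"    # assumed Cloud DNS managed-zone name
LB_IP="203.0.113.10"         # placeholder: the cluster LB address
RECORD="prometheus-data-pipeline.mlab-oti.measurementlab.net."

gcloud dns record-sets create "${RECORD}" \
    --zone="${ZONE}" --type=A --ttl=300 --rrdatas="${LB_IP}" \
    --project=mlab-oti
```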

Clean up tasks after deployments:

  • Remove services from sandbox & staging data-processing cluster
  • Remove services from production data-processing cluster
  • Remove prometheus-data-processing.$PROJECT.* DNS records
  • Remove old data sources from prometheus-support & Grafana
  • Remove etl-$PROJECT intermediate buckets
  • Remove data-processing clusters

Consider

  • recreating etl-$PROJECT bucket in us-central & update etl parser to use the short name again
  • recreating the archive-$PROJECT buckets to be single-region (not multi-region) in us-central
The autolabel bot added the review/triage label (Team should review and assign priority) on Jun 9, 2022.
stephen-soltesz (Author) commented:

Due to the v2 data-pipeline cluster location in some projects, data must be transferred between regions in the sandbox and staging projects. This can be eliminated by placing these clusters in the us-central1 region.

Project       Source bucket             Bucket region  data-processing region
mlab-oti      archive-measurement-lab   us-central1    us-central1
mlab-staging  archive-measurement-lab   us-central1    us-east1
mlab-sandbox  archive-measurement-lab   us-central1    us-east1

Bucket            Created                   Location type  Region
etl-mlab-sandbox  Jun 13, 2017, 3:22:04 PM  Region         us-east1
etl-mlab-staging  Jul 31, 2020, 4:03:17 PM  Region         us-east1
etl-mlab-oti      Aug 6, 2020, 7:48:10 PM   Region         us-central1

Since this only requires updates to the sandbox and staging projects, the disruption will be minimal.

Changing the data-processing cluster locations will be easy. Changing the output target buckets may not be.
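One way to confirm the region mismatch, as a sketch: query each bucket's location constraint with gsutil (bucket names taken from the table above):

```shell
# Sketch: print the location of each etl bucket to confirm the region mismatch.
for BUCKET in etl-mlab-sandbox etl-mlab-staging etl-mlab-oti; do
  echo -n "${BUCKET}: "
  gsutil ls -L -b "gs://${BUCKET}" | grep "Location constraint"
done
```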

stephen-soltesz (Author) commented Aug 11, 2022

The data-processing cluster includes multiple node pools for service-specific workloads:

Pool                 Status  Version           Nodes             Machine type
default-pool         Ok      1.21.12-gke.2200  1 (0-1 per zone)  n1-standard-4
downloader-pool      Ok      1.21.12-gke.2200  3 (1 per zone)    n1-standard-2
parser-pool          Ok      1.21.12-gke.2200  8 (2-3 per zone)  n1-standard-16
prometheus-pool      Ok      1.21.12-gke.2200  3 (1 per zone)    n1-standard-4
stats-pipeline-pool  Ok      1.21.12-gke.2200  3 (1 per zone)    n2-standard-8

The commands originally used to create these node pools vary, and are likely dated or incomplete.
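For reference, a hedged sketch of recreating one such pool with gcloud; the cluster name, region, and autoscaling bounds are assumptions based on the table above, not the original commands:

```shell
# Hypothetical sketch: recreate the parser-pool. Cluster name, region, and
# autoscaling bounds are assumptions derived from the node-pool table above.
CLUSTER=data-processing
POOL=parser-pool

gcloud container node-pools create "${POOL}" \
    --cluster="${CLUSTER}" \
    --region=us-east1 \
    --machine-type=n1-standard-16 \
    --enable-autoscaling --min-nodes=2 --max-nodes=3 \
    --project=mlab-sandbox
```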

stephen-soltesz (Author) commented Aug 11, 2022

Repositories with services on the data-processing cluster (one per node pool):

  • etl
  • etl-gardener
  • prometheus-support
  • stats-pipeline
  • downloader
  • autoloader

stephen-soltesz (Author) commented:
This should be completed using Terraform, not manual, ad hoc recreation.
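As a hedged sketch, the replacement cluster could be declared in Terraform roughly like this; the resource names, project, region, and node counts are placeholders, not the actual module configuration:

```hcl
# Hypothetical sketch of a Terraform declaration for the replacement cluster.
# Names, project, and sizes are placeholders, not the real module configuration.
resource "google_container_cluster" "data_pipeline" {
  name     = "data-pipeline"
  project  = "mlab-sandbox"
  location = "us-central1"

  # Manage node pools separately; remove the auto-created default pool.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "parser" {
  name     = "parser-pool"
  cluster  = google_container_cluster.data_pipeline.name
  location = "us-central1"

  autoscaling {
    min_node_count = 2
    max_node_count = 3
  }

  node_config {
    machine_type = "n1-standard-16"
  }
}
```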

stephen-soltesz (Author) commented Aug 21, 2023

Evidently, while gcloud supports bulk-export for some resource types, GKE clusters are not yet among them.

Documentation on the Terraform gke module

stephen-soltesz (Author) commented Aug 21, 2023

The GKE resources are called something else in this context: ContainerCluster and ContainerNodePool.

Running this command requires additional permissions beyond the basic roles alone: https://cloud.google.com/asset-inventory/docs/access-control#required_permissions
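A sketch of granting the missing permission, assuming the Cloud Asset Viewer role covers it (the member email below is a placeholder):

```shell
# Hypothetical sketch: grant the Cloud Asset Viewer role required by
# resource-config bulk-export. The member email is a placeholder.
PROJECT=mlab-sandbox
MEMBER="user:someone@example.com"

gcloud projects add-iam-policy-binding "${PROJECT}" \
    --member="${MEMBER}" \
    --role="roles/cloudasset.viewer"
```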

gcloud beta resource-config bulk-export \
   --resource-types=ContainerCluster,ContainerNodePool \
   --project=mlab-sandbox --resource-format=terraform \
   --path=output

Additional types are ComputeNetwork and ComputeSubnetwork for declaring the VPC networks over which the cluster communicates.

gcloud beta resource-config list-resource-types
gcloud beta resource-config bulk-export  \
    --resource-types=ComputeNetwork,ComputeSubnetwork \
    --project=mlab-sandbox --resource-format=terraform --path=output

@stephen-soltesz stephen-soltesz self-assigned this Aug 21, 2023
stephen-soltesz (Author) commented:
Current data processing cluster workloads are using deprecated APIs.

[Screenshot: workloads using deprecated Kubernetes APIs, Aug 22, 2023]

stephen-soltesz (Author) commented:
The deprecated APIs appear to come from kube-state-metrics (v2.2.4) in the prometheus-support configuration. Attempting to update to v2.9.2.

stephen-soltesz (Author) commented:
The archive-* buckets are "Multi-region" buckets:

  • archive-mlab-sandbox
  • archive-mlab-staging

It is unclear whether this has a significant cost impact when a bucket is not explicitly in the cluster's region.

stephen-soltesz (Author) commented:
Grafana must be restarted in each project to pick up the new datasources for the data-pipeline cluster.
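Assuming Grafana runs as a Kubernetes deployment (the deployment and namespace names below are guesses), a restart sketch:

```shell
# Hypothetical sketch: restart the Grafana deployment so it reloads the
# provisioned datasources. Deployment and namespace names are assumptions.
NAMESPACE=default
kubectl rollout restart "deployment/grafana" -n "${NAMESPACE}"
kubectl rollout status "deployment/grafana" -n "${NAMESPACE}"
```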

stephen-soltesz (Author) commented:
The egress traffic from measurement-lab to sandbox/staging appears to have decreased significantly over the weekend, after stopping the data-processing cluster in us-east1 last week.

[Screenshot: egress traffic graph, Aug 28, 2023]

stephen-soltesz (Author) commented Aug 28, 2023

And gardener & autoloader appear to be working as intended (WAI) in staging over the weekend as well.

[Screenshots: gardener and autoloader dashboards, Aug 28, 2023]
