Retry transient GCS errors #581

Open
relud opened this issue Apr 4, 2023 · 3 comments

relud commented Apr 4, 2023

https://github.com/mozilla/bedrock/actions/runs/4598182706/jobs/8121878736
https://github.com/mozilla/glean/actions/runs/4609412250/jobs/8146505104?pr=2441

gsutil rsync is failing to download some objects, reporting 404 exceptions for them:

Error: Command ['gsutil', '-q', '-m', 'rsync', '-r', 'gs://probe-scraper-prod-artifacts/glean/', '/tmp/tmpy4y4leon/output/glean'] returned non-zero exit status 1:
NotFoundException: 404 gs://probe-scraper-prod-artifacts/glean/reference-browser/general does not exist.
NotFoundException: 404 gs://probe-scraper-prod-artifacts/glean/reference-browser/pings does not exist.
NotFoundException: 404 gs://probe-scraper-prod-artifacts/glean/reference-browser/tags does not exist.
CommandException: 3 files/objects could not be copied/removed.

The error is transient: the objects do exist. My first guess was that they temporarily disappear during upload or something like that. Edit: more precisely, they have been updated since gsutil listed them, and gsutil requests the specific version it saw at the time of listing.

We could retry the full gsutil sync on failure, or we could reimplement the gsutil sync in Python and retry 404s. The latter option is probably more robust, and the code should be relatively short.
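
For illustration, the first option (retrying the full sync) might look roughly like the sketch below. It is not the project's actual code: the function name `sync_with_retry`, the attempt count, and the delay are assumptions, and it only retries when stderr contains the `NotFoundException: 404` text seen in the failing runs above. The expectation that a rerun succeeds rests on gsutil listing the objects again on each invocation.

```python
import subprocess
import time


def sync_with_retry(cmd, attempts=3, delay=5.0, run=subprocess.run, sleep=time.sleep):
    """Run cmd, rerunning the whole command if it fails with a 404.

    `run` and `sleep` are injectable only so the sketch can be exercised
    without calling gsutil or actually waiting.
    """
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        # Retry only the failure mode from this issue; raise anything else at once.
        transient = "NotFoundException: 404" in (result.stderr or "")
        if not transient or attempt == attempts:
            raise subprocess.CalledProcessError(
                result.returncode, cmd, result.stdout, result.stderr
            )
        sleep(delay)


# Hypothetical usage; the destination directory is a placeholder:
# sync_with_retry(["gsutil", "-q", "-m", "rsync", "-r",
#                  "gs://probe-scraper-prod-artifacts/glean/", "/tmp/output/glean"])
```

Rerunning the whole command is simple, but it repeats the listing and any comparisons already done, which is part of why the per-object approach may be more robust.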

Dexterp37 commented

@relud should we consider adding some logging first to help understand the issue, e.g. what's described in GoogleCloudPlatform/gsutil#906?

relud commented Apr 11, 2023

we could add the -DD flag:

OPTIONS
  -D          Shows HTTP requests/headers and additional debug info needed
              when posting support requests, including exception stack traces.

              CAUTION: The output from using this flag includes authentication
              credentials. Before including this flag in your command, be sure
              you understand how the command's output is used, and, if
              necessary, remove or redact sensitive information.

  -DD         Same as -D, plus HTTP upstream payload.

but I wouldn't recommend it, as those headers will include auth tokens.

relud commented Apr 11, 2023

That said, I can confirm from running the command locally with -DD that gsutil does request a specific "generation" of objects, so if the file was rewritten between listing the object and downloading the content, I would expect it to 404.
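
Given that finding, a Python reimplementation could retry a 404 by asking for the object by name again rather than for the generation seen at listing time. The following is only a rough sketch under assumptions: it presumes the `google-cloud-storage` client, where `Bucket.blob(name)` returns a handle with no generation set and `Blob.download_to_filename` raises `google.api_core.exceptions.NotFound` on a 404. The helper name `download_latest` and its retry parameters are hypothetical.

```python
import time

try:
    from google.api_core.exceptions import NotFound
except ImportError:
    class NotFound(Exception):
        """Stand-in so the sketch can be loaded without google-cloud-storage."""


def download_latest(bucket, name, dest, attempts=3, delay=1.0, sleep=time.sleep):
    """Download object `name` from `bucket` to `dest`, retrying on 404."""
    for attempt in range(1, attempts + 1):
        # Build a fresh handle on every attempt. It has no generation set,
        # so the request should target whichever generation is live now.
        blob = bucket.blob(name)
        try:
            blob.download_to_filename(dest)
            return
        except NotFound:
            if attempt == attempts:
                raise
            sleep(delay)
```

A full sync would still need to list the prefix and create local directories; this only covers the per-object retry.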
