restore_intelligent missing some files? #23

Open
ErinWeisbart opened this issue Apr 29, 2022 · 4 comments

@ErinWeisbart
Member

ErinWeisbart commented Apr 29, 2022

I'm transferring from one bucket to another. After un-archiving with restore_intelligent.py and transferring with aws s3 sync, the S3 console's Calculate total size reports 17289 objects in the source bucket and 17287 objects in the destination bucket.
I confirmed the 17289 objects in the source bucket with aws s3 ls --human-readable --summarize.

If I run restore_intelligent on the source bucket, it reports 17287 total files found pre-filtering (I didn't wait for it to run through the actual restoration again). So why isn't it finding all 17289 files that are actually there?

If I run aws s3 sync again, it fails for two files with: An error occurred (InvalidObjectState) when calling the UploadPartCopy operation: Operation is not valid for the source object's access tier.

If I call restore_intelligent directly on each of those files, one at a time, it returns 1 total files found pre-filtering and REQUESTED 1 for each. If I try again shortly thereafter, it shows IN_PROGRESS 1. These do seem to be the missing files: they don't exist in the destination bucket, and I can't download them from the source bucket (An error occurred (InvalidObjectState) when calling the GetObject operation: The operation is not valid for the object's access tier), even though I can download adjacent images from the source bucket.

So if these are actually the files it's missing, why does it find them when I call them directly?

(e.g., Stain2_Batch2_Confocal)

@ErinWeisbart
Member Author

ErinWeisbart commented Apr 29, 2022

I've replicated the behavior on an additional folder (e.g., Stain2_Batch2_MitoCompare, with a 1-file difference).

@ErinWeisbart
Member Author

Similarly, I'm finding examples where restore_intelligent lists a different number of files even though aws s3 sync doesn't throw errors on anything.
For example, for QCImages the console and aws s3 ls show 2939 objects in the source and 2937 in the destination after sync, while restore_intelligent on the source returns 2930 total files found pre-filtering with RESTORED 2930.
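One way to check whether a count discrepancy like this comes from zero-byte "folder" placeholder objects is to count them separately. This is a minimal, hypothetical sketch (count_objects is not part of restore_intelligent.py). It assumes you already have the listing as (key, size in bytes) pairs, for example collected with boto3's list_objects_v2 paginator or from aws s3 ls --recursive output, and it assumes any key ending in "/" with size 0 is a folder placeholder.

```python
from typing import Dict, Iterable, Tuple


def count_objects(listing: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Split an S3 listing into real files and zero-byte folder placeholders.

    Hypothetical helper: `listing` is assumed to be (key, size_in_bytes) pairs.
    """
    files = 0
    folder_markers = 0
    for key, size in listing:
        # Assumption: a key ending in "/" with size 0 is a folder placeholder
        # (e.g. one created via the S3 console) and holds no file data.
        if key.endswith("/") and size == 0:
            folder_markers += 1
        else:
            files += 1
    return {"files": files, "folder_markers": folder_markers, "total": files + folder_markers}


if __name__ == "__main__":
    # Hypothetical listing; the file names are illustrative only.
    listing = [("QCImages/", 0), ("QCImages/a.png", 1200), ("QCImages/b.png", 1300)]
    print(count_objects(listing))
```

If "total" matched the console count while "files" matched what restore_intelligent reports, that would suggest the gap is placeholders rather than missing data.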

@bethac07
Contributor

My strong suspicion is that sometimes folders "count" and sometimes they don't. In my experience, an uploaded folder seems to be an object, but once moved it is not. That probably explains the cases with no errors, though I'm not sure there's any way to confirm it other than just diffing the two listings.

Are there any patterns you've discerned so far about what the object names are?
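The diff suggested above could look something like the following minimal sketch. It is hypothetical (diff_listings is not part of restore_intelligent.py) and assumes you have already collected the full key list for each bucket, for example with boto3's list_objects_v2 paginator, and that aws s3 sync kept the same key paths in both buckets. Keys ending in "/" are assumed to be folder placeholders.

```python
from typing import Iterable, List, Set, Tuple


def diff_listings(
    source_keys: Iterable[str],
    dest_keys: Iterable[str],
    ignore_folder_markers: bool = True,
) -> Tuple[List[str], List[str]]:
    """Return (missing_from_dest, extra_in_dest), each sorted.

    Hypothetical helper: assumes both buckets use the same key layout,
    so a file copied by aws s3 sync has the same key in both listings.
    """

    def normalize(keys: Iterable[str]) -> Set[str]:
        key_set = set(keys)
        if ignore_folder_markers:
            # Assumption: keys ending in "/" are folder placeholders, not files.
            key_set = {k for k in key_set if not k.endswith("/")}
        return key_set

    src = normalize(source_keys)
    dst = normalize(dest_keys)
    return sorted(src - dst), sorted(dst - src)


if __name__ == "__main__":
    # Hypothetical keys; only the folder name comes from this thread.
    source = [
        "Stain2_Batch2_Confocal/",
        "Stain2_Batch2_Confocal/img1.tiff",
        "Stain2_Batch2_Confocal/img2.tiff",
    ]
    destination = ["Stain2_Batch2_Confocal/img1.tiff"]
    missing, extra = diff_listings(source, destination)
    print(missing)  # candidates to re-restore individually
    print(extra)
```

Running it once with ignore_folder_markers=True and once with False would show whether placeholders alone account for the difference, or whether real files (like the two InvalidObjectState ones) are missing.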

@ErinWeisbart
Member Author

I'm guessing you're right that the second set, the one without errors, has something to do with folders sometimes being size-0 objects.
The first set is still odd. I haven't replicated it beyond those 2 folders, with 2 and 1 errors respectively. I'll keep an eye out for more, but we can probably close this in the meantime. Those two folders are alphabetically near the beginning of the giant list of folders that you recently un-archived, so perhaps they were un-archived with those errors before you finished all the awesome improvements to restore_intelligent and slipped through the cracks.
