Broken "Glacier_remote_uploads_duplicates" bug link #85
Comments
bug was removed from tracker by http://source.git-annex.branchable.com/?p=source.git;a=commit;h=b949e8504506dee6a2844bf61c0c9cf617fe9585
What follows here is an attempt to reconstruct the content of that bug report at the time it was removed:
Please describe the problem.
Other references: #19
Background
The Problem
This is what I believe is happening in the two reports referenced above. When git-annex is used without
When git-annex later does a "checkpresent" operation, glacier-cli fails. This is because the request is ambiguous, since there are two archives in Glacier with the same "key". The error message could be better here, but I believe that the behaviour is correct.
Discussion
glacier-cli can find out what data Glacier claims to have using an inventory retrieval. However, this retrieval takes about four hours and can be out of date (e.g. if someone else recently deleted the archive from another client). Thus, I can understand git-annex's desire not to trust this data or a cache of it.
However, whatever we do, it is impossible to map an "upload or overwrite on key X" type command to Glacier. We'll always end up with duplicates. Even if git-annex stored the Glacier archive IDs, there is no API to replace an existing archive with the same ID, and inventories are out of date even before we retrieve them.
Workaround
If the problem is as I think it is, always applying
To fix the problem after it has occurred, it should be sufficient to delete duplicates using glacier-cli, since they should be identical to each other. Some enhancement of the
Update 10 June 2013: I've pushed a
Comments
Comment 1 - by joey - 2013-05-23T15:55:16Z
Please beware of the warning on the man page when using --trust-glacier-inventory:
While I'm inclined to want git-annex to store the necessary mappings from keys to glacier IDs in the git-annex branch, which would allow uploads/downloads from multiple repositories to the same glacier repository, it will not help with this problem. The git-annex branch can be out of date too. It seems that what's needed is a separate form of the checkpresent hook, that's used when deciding whether to copy data to glacier. BTW, there does seem to be a workaround that avoids duplicate copies to glacier:
While normally copy checks the inventory to see if a key has been sent to glacier, and so will re-send, the
Comment 2 - by joey - 2013-05-23T15:57:08Z
I suppose another way to fix it along similar lines would be to make
Comment 3 - by joey - 2013-05-23T15:59:37Z
It's also worth noting that the assistant always trusts the location log when deciding whether to send a key to a remote. So I think it will not trigger this bug. It seems only
Comment 4 - by Justin - 2013-05-27T22:24:44Z
I'm happy to test a patch. I haven't successfully compiled git-annex on my Mac, which is the only computer I have for the next month or so, but it wasn't too hard to get it to work on my Linux box.
Comment 5 - by joey - 2013-05-29T17:54:11Z
I started to make a branch with the change I suggested, but then I had another idea. The checkpresent hook can return True or False, or fail with a message if it cannot successfully check the remote. Currently for glacier, when --trust-glacier is not set, it always returns False. Crucially, in the case when a file is in glacier, this is telling git-annex it's not there, so copy re-uploads it. What if instead, when the glacier inventory is missing a file, it returns False, and when the glacier inventory has a file, it fails unless --trust-glacier is set? The result would be:
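The three-way checkpresent behaviour proposed in Comment 5 can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the function name `checkpresent`, the `CheckPresentError` exception, the `trust_glacier` parameter, and the representation of the inventory as a set of keys are all hypothetical and are not taken from git-annex (which is written in Haskell) or from glacier-cli.

```python
class CheckPresentError(Exception):
    """Hypothetical error: presence could not be determined safely."""


def checkpresent(key, inventory, trust_glacier=False):
    """Model of the proposed hook; not git-annex's actual code.

    inventory is assumed to be a set of keys listed by the (possibly
    stale) Glacier inventory.
    """
    if key not in inventory:
        # Inventory does not list the key: report it as absent.
        return False
    if trust_glacier:
        # Inventory lists the key and the caller chose to trust it.
        return True
    # Inventory lists the key but is not trusted: fail instead of
    # answering False, so the caller does not re-upload the key.
    raise CheckPresentError(
        f"{key} is listed in the Glacier inventory, "
        "but the inventory is not trusted"
    )
```

The design point is that a failure is distinct from a False answer: a caller that treats False as "safe to upload" would skip the key on failure rather than create a second archive.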
This seems like it should do the right thing in all cases, but I have not tested it. I've pushed a
Comment 6 - by Robie - 2013-06-10T17:24:34Z
This seems reasonable to me. One other possibility that you could end up with a duplicate: if
I'm not sure what we can do about this. Perhaps we need to accept that duplicates will occur, and handle them more gracefully.
Comment 7 - by joey - 2013-06-11T14:38:19Z
Ok, I've merged the glacier branch into master. I would still be happy to see some testing of this before my next release (in a week). I guess I'll close this bug report. There are certainly still problems that can happen if there are multiple repositories all writing to glacier independently. Seems to me that one good way to deal with this is to set up a single remote that is configured to be a gateway to glacier.
Comment 8 - by Jimmy - 2013-11-18T00:00:32Z -- subject: For those on Mac OS X
The duplicates script fails because the BSD/MacOS version of uniq doesn't support the -D option. You can work around this by installing the GNU version using Homebrew ('brew install coreutils') and then replacing the 'uniq' in the script with 'guniq' (Homebrew prefixes the coreutils with "g" by default).
I seem to still be running into this bug using git annex version 4.20131106 and 'git annex copy --to glacier' without the '--not --in glacier' flags. It's not a problem to use the extra flags but I wasn't originally aware of this issue and the duplicates don't seem to always occur. I'll do some more testing and see whether I can reliably predict what will create duplicates and what won't.
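For readers who cannot use GNU `uniq -D`, the duplicate detection discussed in Comment 8 can also be sketched in portable Python. This is a minimal sketch, not a replacement for the script in the repository: the function name `find_duplicates` and the input shape (pairs of archive ID and archive name) are assumptions made for illustration and do not reflect glacier-cli's actual output format.

```python
from collections import defaultdict


def find_duplicates(archives):
    """Return archive names that occur more than once, with their IDs.

    archives is assumed to be an iterable of (archive_id, name) pairs;
    this shape is hypothetical, not glacier-cli's real listing format.
    """
    ids_by_name = defaultdict(list)
    for archive_id, name in archives:
        ids_by_name[name].append(archive_id)
    # Keep only names that map to more than one archive ID.
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


listing = [("id1", "keyA"), ("id2", "keyB"), ("id3", "keyA")]
print(find_duplicates(listing))  # {'keyA': ['id1', 'id3']}
```

Grouping by name rather than comparing adjacent lines means the input does not need to be sorted first.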
Thank you for the report. Maybe it would be easiest to "deep link" directly into the git history instead - for example to http://source.git-annex.branchable.com/?p=source.git;a=blob;f=doc/bugs/Glacier_remote_uploads_duplicates.mdwn;h=75014a5e049a397cef2dd8745e98d4094319e1f6;hb=a9c7260adb7f6336270fe1af90484abb3bbb3991 - plus all the comments?
glacier-cli/glacier-list-duplicates.sh
Lines 11 to 13 in e9f346d
this link https://git-annex.branchable.com/bugs/Glacier_remote_uploads_duplicates/ does not work anymore (404 Not Found)