Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix merge of encrypted files with conflicts (#69) #76

Merged
merged 5 commits into from
Apr 27, 2020

Conversation

jmurty
Copy link
Collaborator

@jmurty jmurty commented Apr 12, 2020

Fix transcrypt's handling of merges where encrypted files have conflicting changes, a situation which would lead to Git producing "merged" files with conflict markers around partially- or fully-encrypted content that cannot be sensibly merged by a person. See issue #69 and a bunch of related issues.

The root problem is that git does not run the smudge/textconv filter on all BASE, LOCAL, REMOTE conflicting version files before attempting a three-way merge.

This change adds:

  • a merge driver script to pre-decrypt conflicting BASE, LOCAL, and REMOTE file versions then run git's internal merge-file command to merge the decrypted versions
  • git repo settings to configure the merge driver
  • recommendation to add the extra merge=crypt setting to .gitattribute definitions
  • tests of merge functionality to prove that non-conflicting and conflicting merges work.

Also included are minor listing and formatting fixes from applying the recommended tools to do this clean-up, and documentation for how to run these tools in README.md

The bulk of this work is originally from https://github.com/ixc/transcrypt/commits/fix-merge-with-conflicts

@jmurty jmurty requested a review from elasticdog April 12, 2020 06:23
@jmurty jmurty self-assigned this Apr 14, 2020
Copy link
Owner

@elasticdog elasticdog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there might be a merge conflict at the moment, but otherwise this all looks conceptually great! Thanks for the attention to detail and the tests.

- the [shfmt](https://github.com/mvdan/sh) tool to apply consistent formatting
to the _transcrypt_ file, e.g: `shfmt -w transcrypt`
- the [Prettier](https://prettier.io) tool to apply consistent formatting to the
_README.md_ file, e.g: `prettier --write README.md`
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for picking up on those and adding the details to the README!

README.md Outdated

Improvements:

- Add functional tests.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've previously just put changes information into the release notes, but perhaps we should create a legit CHANGELOG.md file instead of tacking them on here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Could that be a separate issue and PR as a follow up to these?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Yep, that's totally fine to pull these out in a separate PR.

fi
if [[ ! "$REMOTE_NAME" ]]; then
REMOTE_NAME="remote"
fi
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't pulled your changes yet to test things out locally, but would this be:

$ git rev-parse --abbrev-ref @{upstream}
upstream/master

??? There's also @{push}, but I think that's not what you're after here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that but it doesn't work for the tests which have no upstream, and isn't quite what is needed.

Ideally we would use something like this: REMOTE_NAME=$(git rev-parse --abbrev-ref MERGE_HEAD) where MERGE_HEAD names the branch being merged in. But that ref doesn't seem to exist when the merge driver script runs, and after another hunt I haven't been able to find any equivalent.

So far the awful $GIT_REFLOG_ACTION lookup hack is the only way I have found to be able to find the name of the branch being merged in. It isn't even that important compared to fixing conflict merges overall, it's just nicer to have the correct names in the file conflict markers.

transcrypt Outdated
# We must redirect stdout to $5 here instead of letting merge-file write to
# $2 as it would by default, because we need $5 to contain the final result
# content so a later crypt `clean` generates the correct hash salt value
git merge-file --stdout --marker-size=$4 -L $TARGET_NAME -L base -L $REMOTE_NAME $2 $1 $3 > $5
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant stuff...I've got some reading to do 😄 I really appreciate the detailed notes.

jmurty added 5 commits April 25, 2020 22:51
Two passing tests for merging encrypted files without conflicts,
and a single failing test for merging an encrypted where there are
conflicts.

This test is in anticipation of an incoming fix for merging
encrypted files with conflicts.
Fix transcrypt's handling of merges where encrypted
files have conflicting changes, a situation which
would lead to Git producing "merged" files with
conflict markers around partially- or fully-
encrypted content that cannot be sensibly merged
by a person.

The root problem is that git does not run the
smudge/textconv filter on all BASE, LOCAL, REMOTE
conflicting version files before attempting a
three-way merge.

This change adds:
- a merge driver script to pre-decrypt
  conflicting BASE, LOCAL, and REMOTE file
  versions then run git's internal `merge-file`
  command to merge the decrypted versions
- git repo settings to configure the merge driver
- recommendation to add the extra "merge=crypt"
  setting to .gitattribute definitions.
@jmurty jmurty force-pushed the 69-fix-merge-of-encrypted-files-with-conflicts branch from 9a31591 to 44f1c0e Compare April 25, 2020 13:31
@jmurty
Copy link
Collaborator Author

jmurty commented Apr 25, 2020

To further confirm the merge fix applied in this PR by adding a merge driver, the example problem test case outlined in #23 (comment) works well and produces human-readable plain text conflicts markers when updated to use the fixed transcrypt script and include merge=crypt in the line added to .gitattributes.

After a merge with conflicts of an encrypted file...

git diff with merge fix:

diff --cc sensitive_file
index d47baa8,0252192..0000000
--- a/sensitive_file
+++ b/sensitive_file
@@@ -1,2 -1,2 +1,6 @@@
  Secret!
++<<<<<<< HEAD
 +Another secret!
++=======
+ Another different secret!
++>>>>>>> theirs

cat sensitive_file with merge fix:

Secret!
<<<<<<< HEAD
Another secret!
=======
Another different secret!
>>>>>>> theirs

git diff without merge fix:

diff --cc sensitive_file
index ad4a320,514daaf..0000000
--- a/sensitive_file
+++ b/sensitive_file
@@@ -1,2 -1,2 +1,6 @@@
--Secret!
- Another secret!
 -Another different secret!
++<<<<<<< HEAD
++U2FsdGVkX18B9SN4DeiRUFIbTuYYLZZ7sX617msS6T5H36sM31sEqF9OHLvegVeD
++=======
++U2FsdGVkX18s9KknvdxLYAf/YVoPRNTzDLZ2JVrXLYrXSyFd9QWUSW+cTQw0cELA
++YVB0C7BKEpbn/zStQids1w==
++>>>>>>> 810aeb3... Changing downstream

cat sensitive_file without merge fix:

<<<<<<< HEAD
U2FsdGVkX18B9SN4DeiRUFIbTuYYLZZ7sX617msS6T5H36sM31sEqF9OHLvegVeD
=======
U2FsdGVkX18s9KknvdxLYAf/YVoPRNTzDLZ2JVrXLYrXSyFd9QWUSW+cTQw0cELA
YVB0C7BKEpbn/zStQids1w==
>>>>>>> 810aeb3... Changing downstream

Here is a version of @mdjnewman's test script updated to be runnable in the transcrypt repo directory and to use the merge driver. Include or exclude the merge=crypt portion written to .gitattributes to compare the fixed vs original merge behaviour.

mdjnewman-merge-test.sh.zip

All of this is to further confirm results of the "with conflicts" test case already in tests/test_merge.bats

@jmurty
Copy link
Collaborator Author

jmurty commented Apr 25, 2020

Hi @elasticdog I have updated the commits in this PR to rebase them on the latest master and clean them up significantly to hopefully make the changes easier to follow and review. And more suitable for direct merge to master when the time comes.

Copy link
Owner

@elasticdog elasticdog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks perfect to me! I really appreciate how thorough you've been with checking the other example test case from #23. Is there any reason to hold off on merging?

@jmurty jmurty merged commit bf2b0e5 into master Apr 27, 2020
@jmurty
Copy link
Collaborator Author

jmurty commented Apr 27, 2020

Thanks for confirming @elasticdog, if you're happy I'm happy: PR is now merged.

@jmurty jmurty deleted the 69-fix-merge-of-encrypted-files-with-conflicts branch April 27, 2020 12:56
@Blackjacx
Copy link

Is this PR actually solving the issue in #23. I have the same problem and this would be a long awaited solution. If yes what do you thing when this gets released?

For all out there asking themselves how the heck to install the version from master unless this is released: Just follow the install instructions using git clone:

$ git clone https://github.com/elasticdog/transcrypt.git
$ cd transcrypt/
$ sudo ln -s ${PWD}/transcrypt /usr/local/bin/transcrypt

@elasticdog
Copy link
Owner

I'd be glad to cut a release in the near future if @jmurty doesn't have a backlog of other work that he'd like to have included?

@jmurty
Copy link
Collaborator Author

jmurty commented May 20, 2020

@Blackjacx I believe the merge driver does fix the merge issue, and it has done in my real world usage. Whether it fixes ALL merge issues in all cases is hard to be sure.

If you have a merge problem you could try it to see. In the problem repo, uninstall then reinstall using the git cloned transcrypt to confirm.

@elasticdog I think we can cut a new release soon. Pending work is to merge the Windows/path fix (probably as-is) and I think update the manual?

@jmurty
Copy link
Collaborator Author

jmurty commented May 20, 2020

Oh, and set merge=crypt in .gitattributes as well to test

@Blackjacx
Copy link

Blackjacx commented May 21, 2020

@jmurty really brilliant stuff you implemented here 👍

I will test it with the problem repo, but where can I find it?

@jmurty
Copy link
Collaborator Author

jmurty commented May 21, 2020

Thanks @Blackjacx, hopefully the merge driver will completely fix the merge problems. It's the result of a lot of trial and error and testing on some real-world merge problems, but I found and fixed a problem with it just a few days ago.

I thought you already had a repo showing the merge problem to test with. If not, it isn't worthwhile for you to test the example from #23 because I have already done so.

If you want to simulate at least simple test cases, you can see the test scripts for clean and conflicting merges in https://github.com/elasticdog/transcrypt/blob/master/tests/test_merge.bats

To install the latest merge driver feature, you need to:

  • get the latest transcrypt script from this repo (you have already done this, I think?)
  • uninstall and re-init using the latest transcrypt: transcrypt --display, copy the init command, transcrypt --uninstall, paste and run the init command.
    Init-ing using the latest version will put the merge driver script in the right place.
  • add merge=crypt at the end of each line for your transcrypt'ed files in .gitattributes to make the whole line look like sensitive_file filter=crypt diff=crypt merge=crypt

And that should be it. If you are able to experiment with merges that would have caused problems before I would appreciate the extra manual testing.

@Blackjacx
Copy link

Blackjacx commented May 21, 2020

Maybe I found another small bug in the implementation. But I don't understand the merge driver story so much apparently. In my case I try to rebase Feature-Brach on develop and get the following message (the conflict file is empty then) - any idea:

cat: "<FILE_WITH_CONFLICT>": No such file or directory
Auto-merging <FILE_WITH_CONFLICT>
error: add_cacheinfo failed to refresh for path '<FILE_WITH_CONFLICT>'; merge aborting.
hint: Could not execute the todo command
hint:
hint:     pick d0c3787e3e9bf9f956b263b1c7b4a6966ec7dc78 Test rebase with transcrypt
hint:
hint: It has been rescheduled; To edit the command before continuing, please
hint: edit the todo list first:
hint:
hint:     git rebase --edit-todo
hint:     git rebase --continue
Could not apply d0c3787e3... Test rebase with transcrypt

Maybe

if [[ "$?" != "0" ]]; then
  exit 1
fi

didn't catch the failed merge?

EDIT

Okay I found that the above problem occurs if the file in question has been modified on Feature-Brach and on develop but rebasing on the new commit on develop commit would actually succeed. You see that I get the error error: add_cacheinfo failed to refresh for path '<FILE_WITH_CONFLICT>'; merge aborting. you describe in the comments.

EDIT 2

For me it workes when the last line in the driver script becomes (double quotes stripped from $5):
echo "$(cat $5 | ./.git/crypt/clean $5)" > "$2"


Wow this stuff is really interesting :D

jmurty added a commit that referenced this pull request May 22, 2020
Fix incorrect quoting of the `$5` file path argument. This caused the
filename passed to transcrypt's `clean` to differ from the real
filename, producing an incorrect hash value and encrypted blob. Git
caught and flagged the error with:

    error: add_cacheinfo failed to refresh for path
    '<FILE_WITH_CONFLICT>'; merge aborting.

The clean merge situation that involves the merge driver script is
now properly exercised by an additional unit test.

Thanks @Blackjacx for catching this:
#76 (comment)
@jmurty
Copy link
Collaborator Author

jmurty commented May 22, 2020

Great catch thanks @Blackjacx!

To be honest, the merge driver stuff is completely bamboozling and I don't fully grasp it myself.
I introduced this quoting bug in the recent change, and apparently the test case that was supposed to exercise the merge driver for clean merges wasn't actually doing that.

From your description I was able to reproduce the problem in a test, and apply a fix that keeps the quoting (which I still want to keep due to #78) but does it properly. See #87

Could you please fetch the fixed version from master, uninstall and re-init, and test again?

@jmurty
Copy link
Collaborator Author

jmurty commented May 22, 2020

Here's the change to fix quoting, in case anyone spots other problems with it?

- echo "$(cat \"$5\" | ./.git/crypt/clean \"$5\")" > "$2"
+ cat "$5" | ./.git/crypt/clean "$5" > "$2"

@Blackjacx
Copy link

Yeah your're welcome and I can confirm it works now 👍

This change literally saves my a**. Since I really really need the ability to merge encrypted files. I tried with compile time code generation from encrypted files but no tool was able to deal with the data salad during a megre/rebase conflict. Nice work and thx for the collaboration 👍

@jmurty
Copy link
Collaborator Author

jmurty commented May 25, 2020

@Blackjacx Great! Thanks for confirming, and I'm glad it helps.

The merge driver will be included in an official release at some time but until then you can use the transcrypt direct from the master branch or your clone.

@stherold stherold mentioned this pull request Jun 2, 2020
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants