Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--dry-run removes quotation marks from paths in fast-export.filtered #585

Open
hisaac opened this issue Jul 16, 2024 · 5 comments
Open

--dry-run removes quotation marks from paths in fast-export.filtered #585

hisaac opened this issue Jul 16, 2024 · 5 comments

Comments

@hisaac
Copy link

hisaac commented Jul 16, 2024

I'm using the --dry-run flag right now to get a sense of what changes git-filter-repo will make to my repository on certain commands. When run a git diff of the resulting fast-export.original and fast-export.filtered files, I noticed that quotation marks are being removed from paths that have spaces in them.

For example, running

git diff --no-index filter-repo/fast-export.original filter-repo/fast-export.filtered

results in the following output in the shell:

CleanShot 2024-07-16 at 15 18 01@2x

Is that expected behavior? Is there some way I can prevent these changes so that the diff is more useful?

@hisaac
Copy link
Author

hisaac commented Jul 16, 2024

Oh and in case it matters, this is the filter-repo command I'm running (actual paths modified):

git-filter-repo --dry-run \
	--invert-paths \
	--path Path/To/A/File/CodeToRemove.swift \
	--path Path/To/A/File/CodeToRemoveTests.swift \
	--path Path/To/Another/File/MoreCodeToRemove.swift

@newren
Copy link
Owner

newren commented Aug 2, 2024

Does applying this diff do what you want?

diff --git a/git-filter-repo b/git-filter-repo
index 9cce52a..6c02d31 100755
--- a/git-filter-repo
+++ b/git-filter-repo
@@ -192,10 +192,10 @@ class PathQuoting:
   @staticmethod
   def enquote(unquoted_string):
     # Option 1: Quoting when fast-export would:
-    #    pqsc = PathQuoting._special_chars
-    #    if any(pqsc[x] for x in set(unquoted_string)):
+    pqsc = PathQuoting._special_chars
+    if any(pqsc[x] for x in set(unquoted_string)):
     # Option 2, perf hack: do minimal amount of quoting required by fast-import
-    if unquoted_string.startswith(b'"') or b'\n' in unquoted_string:
+    #if unquoted_string.startswith(b'"') or b'\n' in unquoted_string:
       pqe = PathQuoting._escape
       return b'"' + b''.join(pqe[x] for x in unquoted_string) + b'"'
     return unquoted_string

@hisaac
Copy link
Author

hisaac commented Aug 4, 2024

Hmm, nope, no change to the diff when I use that patch.

Using this patch seems to fix it though:

diff --git a/git-filter-repo b/git-filter-repo
index 9cce52a..be3885c 100755
--- a/git-filter-repo
+++ b/git-filter-repo
@@ -195,7 +195,7 @@ class PathQuoting:
     #    pqsc = PathQuoting._special_chars
     #    if any(pqsc[x] for x in set(unquoted_string)):
     # Option 2, perf hack: do minimal amount of quoting required by fast-import
-    if unquoted_string.startswith(b'"') or b'\n' in unquoted_string:
+    if unquoted_string.startswith(b'"') or b'\n' in unquoted_string or b' ' in unquoted_string:
       pqe = PathQuoting._escape
       return b'"' + b''.join(pqe[x] for x in unquoted_string) + b'"'
     return unquoted_string

I don't know what the other implications of adding that would be though. What do you think?

@newren
Copy link
Owner

newren commented Aug 5, 2024

Oh, cool, glad you found the needed change. I'm reticent to apply it as-is, though, because it'll likely slow things down for the general case (this function is called quite a lot), and make the comment on the line above it a lie. A few possible paths forward:

  • just apply this patch locally
  • test the actual performance impact on notable rewrites a measure how much of a performance difference it makes
  • create an alternative quoting function and have it be used when --dry-run (and maybe --debug?) are selected. But, be careful about adding an extra if in every call of the function (if we do that, we'll have to check performance of that change); try to set it up so there's only one extra-if in the entire program

Thoughts?

@hisaac
Copy link
Author

hisaac commented Aug 13, 2024

The last option seems the best to me.

I unfortunately don't have the time to do the implementation myself. I'm happy to test things with my specific use-case though if you or someone else takes up the task.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants