Is there a way to remove duplicate commits? #535

Open
DoubleCouponDay opened this issue Jan 9, 2024 · 1 comment

Comments

DoubleCouponDay commented Jan 9, 2024

I used the BFG Repo cleaner to remove large files but forgot to clone fresh copies after pushing. Now my main trunk is full of duplicate commits. Is there a git-filter-repo command that can remove them?

newren (Owner) commented Aug 2, 2024

So, I'm guessing that you did a git pull, which merged the two different versions of history.
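One way to check that guess is to look for commits that appear twice in the trunk's history. The sketch below assumes BFG kept each commit's author name, author timestamp, and subject line unchanged, which may not hold if the rewrite also touched commit messages; `duplicate_commits` is a hypothetical helper name, not part of git or git-filter-repo.

```shell
# Hypothetical helper: print "author|author-timestamp|subject" keys that occur
# more than once in the history reachable from the given revision (default HEAD).
# Assumption: the old and rewritten copies of a commit share all three fields.
duplicate_commits() {
    git log --format='%an|%at|%s' "${1:-HEAD}" | sort | uniq -d
}
```

If `duplicate_commits` prints many lines on the trunk, that is consistent with two versions of history having been merged together; empty output suggests the duplicates come from somewhere else.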

You'll want to find three different commits using git log:

  • The merge commit that combined all the old history with the rewritten history: we'll call this ${MERGE_COMMIT}
  • The first commit after ${MERGE_COMMIT}: we'll call this ${FIRST_NEW_COMMIT}
  • The final commit of the BFG rewritten history (this should be one of the parents of ${MERGE_COMMIT}): we'll call this ${FINAL_COMMIT_OF_BFG_REWRITTEN_HISTORY}

Each of these three should be the SHA-1 hash of the corresponding commit. With these...
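A rough sketch of how those three hashes might be tracked down from the command line. The helper names `merge_parents` and `children_of` are hypothetical; they only wrap standard `git rev-parse` and `git rev-list` calls, and they assume the commits of interest are reachable from some ref in the repository.

```shell
# Hypothetical helper: print the parents of a commit, one hash per line.
# For ${MERGE_COMMIT}, one of these should be the last BFG-rewritten commit.
merge_parents() {
    git rev-parse "$1^@"
}

# Hypothetical helper: print every commit (reachable from any ref) that has
# the given commit as one of its parents.
children_of() {
    target=$(git rev-parse "$1") || return 1
    git rev-list --all --parents |
        awk -v t="$target" '{ for (i = 2; i <= NF; i++) if ($i == t) print $1 }'
}
```

Running `git log --merges --oneline` is one way to spot the candidate for ${MERGE_COMMIT}. After that, `merge_parents "${MERGE_COMMIT}"` lists the two tips that were merged, and `children_of "${MERGE_COMMIT}"` lists the candidates for ${FIRST_NEW_COMMIT}.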

Solution 1

If none of the commits in your history since that merge are merge commits, then you could try rebasing your commits on top of the good history. Something like:

    git rebase --onto ${FINAL_COMMIT_OF_BFG_REWRITTEN_HISTORY} ${MERGE_COMMIT}

If you have any merge commits in your history since ${MERGE_COMMIT}, though, this would just mess things up, since a plain rebase drops merge commits and flattens the history.
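Before trying the rebase, it may be worth checking that precondition. A minimal sketch, assuming the branch to be fixed is currently checked out; `linear_since` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if there are no merge commits between the
# given commit (exclusive) and HEAD. Returns 2 if the revision cannot be read.
linear_since() {
    merges=$(git rev-list --merges "$1"..HEAD) || return 2
    [ -z "$merges" ]
}
```

For example, `linear_since "${MERGE_COMMIT}" && echo "safe to try Solution 1"`. Creating a backup branch first (say, `git branch backup-before-rebase`, a name picked here only for illustration) leaves an easy way back if the rebase goes wrong.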

Solution 2

Create a replace object that is a new commit like ${FIRST_NEW_COMMIT} but which has ${FINAL_COMMIT_OF_BFG_REWRITTEN_HISTORY} as its parent instead of having ${MERGE_COMMIT} as its parent. Then use filter-repo to rewrite the history:

    git replace --graft ${FIRST_NEW_COMMIT} ${FINAL_COMMIT_OF_BFG_REWRITTEN_HISTORY}
    git filter-repo --proceed

A word of caution: if you have multiple commits that have ${MERGE_COMMIT} as a parent, you'll need to create new graft commits for all of them. If there are N such commits, you'll need to run git replace --graft ... N times. You only need to run git filter-repo --proceed once, but it needs to be after all N git replace --graft ... calls.
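For the N-children case, the grafting could be scripted along these lines. This is only a sketch: `graft_children` is a hypothetical helper, it assumes every affected child is reachable from some ref, and it keeps any other parents a child already has, swapping out only ${MERGE_COMMIT}.

```shell
# Hypothetical helper: for every commit that has the accidental merge ($1) as a
# parent, create a graft whose parent list has $1 swapped for the last
# BFG-rewritten commit ($2). Other parents of the child are left unchanged.
graft_children() {
    merge=$(git rev-parse "$1") || return 1
    good=$(git rev-parse "$2") || return 1
    # Collect the full list first so the new replace refs cannot affect the walk.
    listing=$(git rev-list --all --parents) || return 1
    printf '%s\n' "$listing" | while read -r child parents; do
        case " $parents " in
            *" $merge "*) ;;      # this commit is a child of the merge
            *) continue ;;
        esac
        new_parents=
        for p in $parents; do
            if [ "$p" = "$merge" ]; then p=$good; fi
            new_parents="$new_parents $p"
        done
        # Unquoted on purpose: each parent must be passed as its own argument.
        git replace --graft "$child" $new_parents
    done
}
```

Something like `graft_children "${MERGE_COMMIT}" "${FINAL_COMMIT_OF_BFG_REWRITTEN_HISTORY}"` followed by a single `git filter-repo --proceed` would then cover all N children in one pass. Checking `git log --graph` between the two steps is a cheap way to confirm the grafts look right before the rewrite makes them permanent.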

Solution 3

This one I can't give you any pseudo-code for. If you can do a filtering operation that again modifies the old commits to match the new commits, is simultaneously a no-op on the new commits, and removes the now-degenerate merge commit, that would also solve this problem.

I don't remember the details of what additional modifications BFG makes (like [formerly OLDHASH], Former-commit-id:, and .REMOVED.git-id), or whether it has added more or changed them, and you really need to be careful to filter in precisely the same way it did or you'd end up with even more variants. While this could theoretically be done with git-filter-repo, since the filtering has to match precisely it'd probably be easier to do by running bfg again. Even then, I'm not sure running bfg again would really satisfy the constraints of being exactly the same for old commits while being a no-op for the previously filtered or new commits.

But if you can nail it exactly, this method would remove your duplicate commits by mapping multiple commits to one. Hopefully, mapping multiple to one wouldn't trigger any weird bugs in BFG.

Summary

Anyway, between the three, I suspect solution #2 is the most robust and easiest. Does that help?
