You should be able to pull with -b/-r #328

grimmySwe opened this issue Dec 18, 2017 · 19 comments

Comments

@grimmySwe
Collaborator

Discussed in #325:
In version 0.4.0 of git-subrepo there is no good way to handle -b/-r flags for the subrepo pull operation.

As there is only one reference in the .gitrepo file, we can't determine the actual tree created by the merge, which would cause one of the following:
a. We cherry-pick without leaving a trace.
b. We pull in old changes again because we have no easy way to determine how far we have pulled.

One possible solution would be to introduce another reference in the .gitrepo file, so that when we do pull -b/-r we store that in a separate entry, allowing us to recreate the structure when we push.

@grimmySwe
Collaborator Author

A potential problem with using -b/-r is that you want to fetch commits from the remote based on SHA, and that is not possible. The main problem arises if you don't push the subrepo, clone a new main repo and then try to push. The first step would then be to fetch the referenced commit from the .gitrepo file.

@svelez

svelez commented Dec 18, 2017

Thanks for establishing this issue.

I am a bit confused about the problem statement in #325 and the opening comment though. Is the lack of enough entries in .gitrepo assuming that the git subrepo pull command would not update the tracking branch or is this a general problem whether or not the tracking branch is updated? I thought I noticed the current behavior of pull was to update the tracking branch.

I am thinking that if the tracking branch is updated and future pulls are from that branch, then the current commit and parent entries should be completely adequate for those pulls? Though to be honest, I am not sure how merges in the superrepo are/should be handled.

I also think that if you supply -b again to pull/track the original branch once more the established superrepo history should allow a reasonable merge of that branch's head revision.

@grimmySwe
Collaborator Author

@svelez Let's see if I can explain how I see this. Remember that this is my view of it, not the final truth.

In the standard use case (never using -b/-r) you track one branch in the subrepo. .gitrepo contains the remote and branch to track, a commit reference (subrepo.commit) to the latest subrepo commit known in the main repo, and a commit reference (subrepo.parent) to the latest commit in the main repo that was pushed to the subrepo.

When you pull in new subrepo changes, you update the subrepo.commit. This value is used to easily see if there are any new commits in the subrepo AND it's used to build the virtual structure when we merge in the changes. subrepo.commit will be converted into a parent reference in a commit allowing us to connect the history of our virtual subrepo commits and the actual subrepo commits.
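A minimal sketch of the bookkeeping described above: it reads a .gitrepo-style file and uses subrepo.commit to decide whether the tracking branch has moved. It assumes the file is git-config style with a [subrepo] section holding remote, branch, commit and parent. The URL, the SHAs and the helper names are invented for the example, and this is not git-subrepo's own code.

```python
import configparser

# Illustrative .gitrepo content; the URL and SHAs are made up for this sketch.
GITREPO_TEXT = """\
[subrepo]
    remote = https://example.com/lib.git
    branch = master
    commit = aaa111
    parent = bbb222
"""


def read_gitrepo(text):
    """Return the [subrepo] section as a plain dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["subrepo"])


def has_new_upstream_commits(state, upstream_head):
    """True when the tracking branch head differs from the recorded subrepo.commit."""
    return state["commit"] != upstream_head


state = read_gitrepo(GITREPO_TEXT)
print(has_new_upstream_commits(state, "aaa111"))
print(has_new_upstream_commits(state, "ccc333"))
```

With a single commit entry, the file can only answer this question for one branch at a time, which is why pulling from a second branch is awkward.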

If we add a -b option to pull, we need to decide what we want to do:

  1. Don't update subrepo.commit, as we don't actually pull things from the tracking branch. This keeps business as usual, but there is no way to leave a trace of this operation in the subrepo.
  2. Update subrepo.commit. This lets us create the correct merge operation in future pushes to the subrepo, BUT if we then perform a pull without -b we need to retrace our steps and solve possible conflicts.
  3. Update both subrepo.commit and subrepo.branch. This switches the tracking branch, and if we push, the previous pull operation would be the reverse of what you expect.

So my current idea would be to add another reference in the .gitrepo file, so that we can keep the current tracking information AND also note if we pull in changes from other branches.
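The three options can be sketched as small state updates. This is only a toy model of the trade-offs listed above, assuming .gitrepo is reduced to its branch and commit entries; the class, function names and SHAs are hypothetical and not part of git-subrepo.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tracking:
    branch: str  # subrepo.branch: the branch being tracked
    commit: str  # subrepo.commit: latest subrepo commit known to the main repo


def option_1(state, pulled_branch, pulled_head):
    """Leave .gitrepo untouched: the pull leaves no trace."""
    return state


def option_2(state, pulled_branch, pulled_head):
    """Record the pulled head but keep tracking the same branch."""
    return replace(state, commit=pulled_head)


def option_3(state, pulled_branch, pulled_head):
    """Record the pulled head and switch the tracking branch."""
    return replace(state, branch=pulled_branch, commit=pulled_head)


before = Tracking(branch="master", commit="aaa111")
for option in (option_1, option_2, option_3):
    print(option.__name__, option(before, "feature", "fff999"))
```

None of the three results keeps both the old tracking commit and the newly pulled head, which is the gap an extra reference would fill.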

Did it make anything clearer?

@grimmySwe
Collaborator Author

I find some holes in my design. Most of them are related to a use case where someone uses -b/-r to pull in changes from another repo/branch, pushes these changes to the main repo but NOT to the subrepo. Another user clones the main repo, sees that there are differences to the subrepo and tries to push those changes.

In that case it will need to fetch the correct commits to build up a history so that we can apply the correct ancestry on commits. So if you used pull -r, to pull in changes from a different remote, git-subrepo would need to know about this other remote and that it should fetch relevant commits from it.

I wonder if you actually need to have more data to solve this:

[tracking]
    remote = <tracking remote>
    branch = <tracking branch>
    commit = <last known commit on tracking branch>
    parent = <last commit pushed to subrepo>

[merge]
    remote = <remote used with -r>
    branch = <branch used with -b>
    commit = <parent for this virtual git-subrepo commit>
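To make the proposed layout concrete, here is a rough sketch of how a fresh clone could work out which commits it would need before a push. The [tracking] and [merge] sections are the hypothetical layout from this comment, not an existing .gitrepo format, and the URLs, SHAs and function name are invented. Whether those commits can actually be fetched by SHA is the open problem raised earlier in the thread.

```python
import configparser

# Hypothetical .gitrepo using the two-section layout proposed above.
PROPOSED = """\
[tracking]
    remote = https://example.com/lib.git
    branch = master
    commit = aaa111
    parent = bbb222

[merge]
    remote = https://example.com/fork.git
    branch = feature
    commit = fff999
"""


def commits_to_fetch(text):
    """List (remote, commit) pairs that would have to be available locally."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    needed = [(parser["tracking"]["remote"], parser["tracking"]["commit"])]
    # The merge section only exists after a pull with -b/-r.
    if parser.has_section("merge"):
        needed.append((parser["merge"]["remote"], parser["merge"]["commit"]))
    return needed


for remote, commit in commits_to_fetch(PROPOSED):
    print(f"need {commit} from {remote}")
```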

On the other hand, these operations might be better solved in the actual subrepo. Then you get the full strength of git, not just a subset of operations that is run through git-subrepo.

What would you expect to happen with git subrepo push -b <branch> -r <remote> <subrepo>? Should the subrepo actually know that we pushed in data from the current tracking branch and consider it a merge? Or should it be a force push that simply overwrites the state of <branch> with the current state?

@grimmySwe
Collaborator Author

@ingydotnet Do you have some time to think about the use cases for pull/push -r/-b combinations? To me these become complex operations where you try to handle the subrepo branch structure within another repo.

Git-subrepo already applies one virtual dimension, working with a specific tracking branch/remote and one subrepo. Allowing other branches/remotes to be merged/pushed seems like asking for trouble in the subrepo repository.

The easiest case to start with: what is subrepo pull -b <branch> expected to do? Is it simply cherry-picking the latest state from <branch> without leaving a trace, or is it actually merging so that the subrepo knows about the operation?

@ingydotnet
Owner

@grimmySwe,

git subrepo -b <branch> -f is effectively a cherry-pick replacement. Without -f it requires some kind of merge logic. I think -b ... -f is cool because you can try different HEADs. Nothing sticks until you push the subrepo.

I just had an idea. Maybe it would be useful if a .gitrepo could contain one or more pairs of commit refs, pairing a local commit to a subrepo commit. I'll think on this and make a new issue for it.

@grimmySwe
Collaborator Author

@ingydotnet,

But isn't switching HEADs the job of subrepo clone, where you actually apply a complete state? Pull seems to be more of a merge tool to connect branches.

| command/option | description | create local commit with .gitrepo update |
| --- | --- | --- |
| clone | Clone state, can only be used if there is no previous directory | yes |
| -f | overwrite local state | yes |
| pull | pull new commits from tracking branch, requires HEAD in tracking history, merge local state | yes |
| -f | ignore previous HEAD requirement | yes |
| -b/-r | pull commits from other location | yes |
| -b/-r -u | switch tracking branch | yes |
| push | push unpushed changes to tracking branch, requires tracking HEAD in history | yes |
| -f | use force push | yes |
| -b/-r | push to other location | no |
| -b/-r -u | switch tracking branch | yes |

@casnacaj

casnacaj commented Jan 3, 2018

I'm also missing a '-f' option for the pull command. The missing HEAD problem may also be caused by a rebase of the subrepo branch (which can happen on a WIP branch in my case). An alternative name for the 'pull [-b/-r] -f' command could be 'checkout'. The behaviour of checkout could differ slightly from pull: 'git subrepo checkout' should ignore the previous HEAD requirement, but it should fail if the main project was changed since the last pull/checkout (the changes must be merged by the user). 'git subrepo checkout -f' should also ignore changes in the history (the changes will be thrown away).

@grimmySwe
Collaborator Author

Created a Wiki page where I try to list all different cases and how to handle them:
https://github.com/ingydotnet/git-subrepo/wiki/Command-summary-test-for-issue-328

There is now a branch based on release/0.4.0 called issue/328 where I have implemented support for the -r/-b flags with pull, and allowed push with -b/-r to create snapshots.

Added some tests and made sure all the old tests work as well. It probably needs some more polishing, but it seems to do OK in my simple tests. The main feature is to allow pulling with -b/-r AND keeping the tracking valid. It uses a new merged property in the .gitrepo file to track when things from outside the tracking branch are introduced. That allows us to create correct merge nodes and not mess up the subrepo.

So if anyone has some spare time, please feel free to test it out :-)

@casnacaj

casnacaj commented Jan 16, 2018

Hi. Thanks a lot for your work!

I'm working on your branch. I have only one problem: the '-f' option allows a pull when HEAD has changed, but it requires a manual merge. See the command output:

git subrepo pull src/bsp -b issue/PLM-26 -u -c -f -s
The "git merge" command failed:

  Auto-merging board/stm32h743-nucleo/cube-common/cube-common.ioc
  CONFLICT (content): Merge conflict in board/stm32h743-nucleo/cube-common/cube-common.ioc
  Auto-merging board/stm32h743-nucleo/cube-common/Src/main.c
  CONFLICT (content): Merge conflict in board/stm32h743-nucleo/cube-common/Src/main.c
  Automatic merge failed; fix conflicts and then commit the result.

You will need to finish the pull by hand. A new working tree has been
created at .git/tmp/subrepo/src/bsp so that you can resolve the conflicts
shown in the output above.

This is the common conflict resolution workflow:

  1. cd .git/tmp/subrepo/src/bsp
  2. Resolve the conflicts (see "git status").
  3. "git add" the resolved files.
  4. git commit
  5. If there are more conflicts, restart at step 2.
  6. cd /c/git/m1-2
  7. git subrepo commit src/bsp
See "git help merge" for details.

Alternatively, you can abort the pull and reset back to where you started:

  1. git subrepo clean src/bsp

See "git help subrepo" for more help.

My use case is:

  1. pull main branch
  2. make changes
  3. create issue branch
  4. push issue branch
  5. in the subrepo repository: rebase issue branch to the current main branch
  6. pull updated issue branch

So I don't need to merge anything, I just want to make a snapshot of the current issue branch state from the subrepo. The only way to do that (as far as I know) is git subrepo clone, but clone requires typing the URL again.

@grimmySwe
Collaborator Author

@casnacaj If you want to take a snapshot, git subrepo clone --force is the way to go. Otherwise you will need to resolve your conflicts by selecting yours/theirs in the merge.

@casnacaj

@grimmySwe Yes, I know. The problem I have with git subrepo clone is that I have to type the URL again. I'm missing a command that would let me 'reclone' a subrepo without retyping the URL.
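Until such a command exists, one possible workaround is to read the URL and branch back out of the subrepo's .gitrepo file and build the clone command from them. The sketch below assumes a git-config style [subrepo] section with remote and branch entries, and that clone accepts -b and --force; the helper name and the URL are made up.

```python
import configparser
import shlex


def reclone_command(gitrepo_text, subdir):
    """Build a 'git subrepo clone --force' argv from values stored in .gitrepo."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gitrepo_text)
    section = parser["subrepo"]
    return ["git", "subrepo", "clone", section["remote"], subdir,
            "-b", section["branch"], "--force"]


# Illustrative file content; the URL is invented for this sketch.
example = """\
[subrepo]
    remote = https://example.com/bsp.git
    branch = issue/PLM-26
"""
print(shlex.join(reclone_command(example, "src/bsp")))
```

The function only builds the command line; running it (for example with subprocess.run) is left to the caller so the result can be inspected first.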

@grimmySwe
Collaborator Author

@casnacaj Can you create a new issue for this? I feel that it would be better to add something to clone than to update pull with another function. Maybe the --force flag could work without additional arguments.

@admorgan
Collaborator

@ingydotnet I would like to consider this for 0.5.0 and leave 0.4.1 a bug fix release. Are you ok with that?

@yajo

yajo commented Jan 5, 2022

I was considering using git-subrepo to maintain a monorepo that consolidates lots of subrepos into a main one. It seemed great, but one key feature I missed is the upstream contribution to subrepos.

From the monorepo I should be able to merge multiple fixes in subrepos, and I should be able to contribute back to them and merge them easily.

I like the approach from #328 (comment).

The desired workflow would be:

  1. Checkout subrepo on origin's master HEAD.
  2. Develop fix.
  3. Upstream fix to our own fork, and open PR upstream.
  4. Checkout subrepo where it was before.
  5. Merge that PR.

... all considering there can be multiple fixes merged in.

I'm having a hard time understanding how to do that easily, so for now I think I'll go with submodules or git-aggregator. I hope that can be added at some point! 😃

@admorgan
Collaborator

admorgan commented Jan 5, 2022

@yajo I am confused by your comment. The one thing that subrepo is great at is supporting local modifications, then submitting them upstream and consuming those changes.

  1. Develop fix in monorepo
  2. git subrepo push from monorepo
     • I often use github or gerrit and have to override the server or branch accordingly when I push
     • It will create a set of patches that describe modifications to the upstream
  3. Get upstreamed patches approved (or just updated based on workflow)
  4. git subrepo pull
     • Only needed once all changes in upstream will be included in the pull.

This could be optimized, but the core works for everyday work.

@yajo

yajo commented Jan 6, 2022

What I mean is: what happens when your monorepo includes patches from more than one PR? Can you easily upstream both separately? I can't see how... maybe a tutorial on the subject would help.

@admorgan
Collaborator

admorgan commented Jan 6, 2022

I am still having trouble identifying your question. I would expect the monorepo to have patches from many pull requests.

Scenario 1:
A change that touches multiple upstream projects is applied to the monorepo. You would have to do a git subrepo push and a git subrepo pull for each of the affected upstream projects. The monorepo is in a consistent state the whole time, but the upstream projects may not be in sync.

Scenario 2:
A change that spans multiple upstream projects needs to be brought into the monorepo. You would do a git subrepo pull for all the affected upstream projects; these pulls could be combined into one patch using squash if that helps your history.

I can't think of any other scenarios that fit how I read your description. If I have missed it can you give me an explicit example?

@yajo

yajo commented Jan 6, 2022

Let me be more specific.

A few days ago, I added a subrepo as you can see here:

> git subrepo status
1 subrepo:

Git subrepo 'subrepos/odoo':
  Subrepo Branch:  subrepo/subrepos/odoo
  Remote URL:      https://github.com/odoo/odoo.git
  Upstream Ref:    545713e65b0
  Tracking Branch: 15.0
  Pulled Commit:   03e82187908
  Pull Parent:     da1be735a4c

Now let's say I need to update that branch:

> git subrepo pull subrepos/odoo
Subrepo 'subrepos/odoo' pulled from 'https://github.com/odoo/odoo.git' (15.0).

> git subrepo status
1 subrepo:

Git subrepo 'subrepos/odoo':
  Subrepo Branch:  subrepo/subrepos/odoo
  Remote URL:      https://github.com/odoo/odoo.git
  Upstream Ref:    9fae154ed7a
  Tracking Branch: 15.0
  Pulled Commit:   7d8497b9530
  Pull Parent:     da1be735a4c

Nice! All works as expected.

Now I need to merge a couple of patches before they're approved upstream. I just picked two random PRs for that branch:

> git subrepo pull subrepos/odoo -b refs/pull/82307/head
Subrepo 'subrepos/odoo' pulled from 'https://github.com/odoo/odoo.git' (refs/pull/82307/head).

> git subrepo status
1 subrepo:

Git subrepo 'subrepos/odoo':
  Subrepo Branch:  subrepo/subrepos/odoo
  Remote URL:      https://github.com/odoo/odoo.git
  Upstream Ref:    99794e1b4bc
  Tracking Branch: refs/pull/82307/head
  Pulled Commit:   99794e1b4bc
  Pull Parent:     da1be735a4c

Oops... that tracking branch shouldn't have changed because I didn't use git subrepo pull --update. That's probably unintended behaviour as per #545. In any case, the PR got merged. Let's continue and merge the 2nd PR.

> git subrepo pull subrepos/odoo -b refs/pull/82325/head
git-subrepo: Local repository does not contain 99794e1b4bc04de147f9b05f576321e995af05a2. Try to 'git subrepo fetch subrepos/odoo' or add the '-F' flag to always fetch the latest content.

It seems that to merge this PR, git-subrepo expects it to contain the commit from the other PR. However, that doesn't make sense because by definition that commit will never exist until the other PR is merged, and I'm actually merging separate PRs. A git merge would have no problem because both PRs still share common ancestors, but git-subrepo doesn't allow it.
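The distinction drawn here can be shown on a toy commit graph. This is only a model of the reasoning in the paragraph above (one branch not containing the other's head, yet both sharing ancestors); the commit names are invented and it says nothing about how git-subrepo performs its check internally.

```python
# Toy history: each commit maps to its parents. All names are invented.
PARENTS = {
    "base": [],
    "main1": ["base"],
    "pr_a": ["main1"],  # head of the first PR branch
    "pr_b": ["main1"],  # head of the second PR branch
}


def ancestors(commit):
    """Return the commit plus everything reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


# Does the second PR branch contain the head of the first one?
print("pr_a" in ancestors("pr_b"))
# What a plain merge needs: at least one shared ancestor.
print(sorted(ancestors("pr_a") & ancestors("pr_b")))
```

The first check fails while the second finds shared history, which is the situation where a regular git merge would still have a merge base to work from.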

OK, let's just try updating from the main branch:

> git subrepo pull subrepos/odoo
Subrepo 'subrepos/odoo' is up to date.

Of course that's not true. It should update because there are still more commits in the 15.0 branch, but since the tracking branch was unexpectedly modified, it thinks it's up to date. OK, let's work around that by specifying the 15.0 branch again:

> git subrepo pull subrepos/odoo -b 15.0
git-subrepo: Local repository does not contain 99794e1b4bc04de147f9b05f576321e995af05a2. Try to 'git subrepo fetch subrepos/odoo' or add the '-F' flag to always fetch the latest content.

And yet again, git-subrepo expects that the main branch contains the PR commit, which is not going to happen until upstream decides to merge the PR. Maybe one day they will decide to merge it using a rebase and that commit would never exist in the 15.0 branch, so still that expectation is unreasonable.

Anyway, let's try to obey that last error message:

> git subrepo pull subrepos/odoo -b 15.0 -F

git-subrepo: Invalid option '--fetch' for 'pull'.

  Usage: git subrepo pull <subdir>|--all [-M|-R|-f] [-m <msg>] [-e] [-b <branch>] [-r <remote>] [-u]


  Update the subrepo subdir with the latest upstream changes.

  The `pull` command fetches the latest content from the remote branch pointed
  to by the subrepo's `.gitrepo` file, and then tries to merge the changes into
  the corresponding subdir. It does this by making a branch of the local
  commits to the subdir and then merging or rebasing (see below) it with the
  fetched upstream content. After the merge, the content of the new branch
  replaces your subdir, the `.gitrepo` file is updated and a single 'pull'
  commit is added to your mainline history.

  The `pull` command will attempt to do the following commands in one go:

    git subrepo fetch <subdir>
    git subrepo branch <subdir>
    git merge/rebase subrepo/<subdir>/fetch subrepo/<subdir>
    git subrepo commit <subdir>
    # Only needed for a consequential push:
    git update-ref refs/subrepo/<subdir>/pull subrepo/<subdir>

  In other words, you could do all the above commands yourself, for the same
  effect. If any of the commands fail, subrepo will stop and tell you to finish
  this by hand. Generally a failure would be in the merge or rebase part, where
  conflicts can happen. Since Git has lots of ways to resolve conflicts to your
  personal tastes, the subrepo command defers to letting you do this by hand.

  When pulling new data, the method selected in clone/init is used. This has
  no effect on the final result of the pull, since it becomes a single commit.
  But it does affect the resulting `subrepo/<subdir>` branch, which is often
  used for a subrepo `push` command. See 'push' below for more information.
  If you want to change the method you can use the `config` command for this.

  When you pull you can assume a fast-forward strategy (default) or you can
  specify a `--rebase`, `--merge` or `--force` strategy. The latter is the same
  as a `clone --force` operation, using the current remote and branch.

  Like the `clone` command, `pull` will squash all the changes (since the last
  pull or clone) into one commit. This keeps your mainline history nice and
  clean. You can easily see the subrepo's history with the `git log` command:

    git log refs/subrepo/<subdir>/fetch

  The set of commands used above are described in detail below.

  The `pull` command accepts the `--all`, `--branch=`, `--edit`, `--force`,
  `--message=`, `--remote=` and `--update` options.

> git subrepo fetch subrepos/odoo -b 15.0
Fetched 'subrepos/odoo' from 'https://github.com/odoo/odoo.git' (15.0).

> git subrepo pull subrepos/odoo -b 15.0
git-subrepo: Local repository does not contain 99794e1b4bc04de147f9b05f576321e995af05a2. Try to 'git subrepo fetch subrepos/odoo' or add the '-F' flag to always fetch the latest content.

So I'm out of ideas. How am I supposed to merge two PRs into my downstream monorepo?

And, supposing that I manage to do that, if I still develop a new fix on top of all those merges, how do I separate those commits and push them to another branch so I can publish it and open a third PR upstream?


Maybe I'm misunderstanding the tool or the expected workflow. I just tried to work the way I would have with normal git.

Somehow I think this proves that git-subrepo makes wrong assumptions:

  1. We don't always track just one branch downstream. Instead, we merge various branches. And still on top of those, we can have local changes that belong to no upstream branch.
  2. All branches don't always share the topmost common ancestor commit. But as long as they have one common ancestor, git manages to merge them, so there should be no reason why git-subrepo couldn't do that also.
  3. Related to the previous point, branches can get force-pushed (specifically PR branches).
