-
Notifications
You must be signed in to change notification settings - Fork 392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dvcfs subrepos example #4740
dvcfs subrepos example #4740
Conversation
Documenting It should be on the user to pass a path to the subrepo that they want, eg: fs = DVCFileSystem(url="[email protected]:iterative/example-get-started.git", repo="/subrepo") |
@skshetry Am I misunderstanding or is that not implemented today? >>> from dvc.api import DVCFileSystem
>>> url = "https://github.com/iterative/monorepo-example.git"
>>> fs = DVCFileSystem(url, repo="/nlp", rev="develop")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/dave/micromamba/envs/dvc/lib/python3.11/site-packages/fsspec/spec.py", line 79, in __call__
obj = super().__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dave/Code/dvc/dvc/fs/dvc.py", line 150, in __init__
self.hash_jobs = repo.fs.hash_jobs
^^^^^^^
AttributeError: 'str' object has no attribute 'fs' |
Sorry, I was proposing an API, that would open a filesystem to a subrepo directly. It's not implemented at the moment (and keyword argument could be different, repo is already taken). The I'd prefer it to be simpler — simpler in interface and simpler in implementation, and more transparent on what's happening. |
Is there something that doesn't work? I am missing what is the worry, there are cases where people explicitly wanted it |
subrepos are just tricky and messy, hence the hesitation. Maybe @pmrowla has some comments on this, since he was dealing with that recently. |
so even more in favor of documenting😄 we will get people trying and reporting issues we are not currently catching? |
It's not only about implementation. The feature as a whole, is questionable (i.e. autodiscovering subrepos, and providing an overlay of multiple subrepos and nested repos). Nobody asked for that. As far as |
This is what I would prefer, but when I looked into implementing it last month for the So whereas now the user can do >>> dvc.api.open("subrepo/path/to/file", repo="https://.../some-repo.git") A subrepo specific dvcfs instance needs something like >>> with DVCFileSystem(url="https://.../some-repo.git", repo="subrepo") as fs:
>>> fs.open("path/to/file") I think for a user it would be unintuitive and confusing as to why I can use the root git URL and paths relative to that git root in the top level
And just to be clear, I agree with this. But the problem is that we already support doing this for everything else in the DVC API and |
Thanks guys! So looks like overall we would much rather implement I understand the hesitation to document this as is, since it might become a problem in the current state. |
@dberenbaum Really sorry for blocking you here. Let's see if we can quickly make |
@efiop It's not urgent, but I didn't get the same idea as you from what @pmrowla wrote.
I interpreted this as saying that we should stick to the git root for consistency with existing api calls. |
@dberenbaum I think that's more about get_url's current conventions, but it doesn't mean that we can't have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, looks like no one was willing to jump and implement, so let's unblock and document this as is. If/when we get to that subrepo option - we can obsolete this.
Thank you and sorry for blocking.
I'm proposing that we document
DVCFileSystem(subrepos=True)
since we have had multiple requests about it: