Adding Zenodo as a Remote #6015
Replies: 6 comments
-
Hi @4aHxKzD 🙂 Indeed, we just haven't received any requests for it. I'm personally not familiar with Zenodo as storage. If someone would like to add support for it, the first step would be to implement an fsspec-compatible filesystem (https://github.com/intake/filesystem_spec/) in their own repository; dvc will then be able to use it as a plugin in the near future (#5162).
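To make the fsspec suggestion a bit more concrete, here is a minimal, read-only sketch of what such a filesystem could start from. It is not an existing package: the class name `ZenodoFileSystem`, the `fetch_json` parameter, and the `zenodo` protocol name are hypothetical, and the record layout (a `files` list whose entries carry `key`, `size`, and `links.self`) is an assumption about Zenodo's REST API that should be checked against its documentation. If fsspec is not installed, the sketch falls back to a plain base class so that it still runs.

```python
import json
from urllib.request import urlopen

try:
    from fsspec import AbstractFileSystem  # real base class, if fsspec is installed
except ImportError:
    class AbstractFileSystem:  # stand-in so the sketch runs without fsspec
        def __init__(self, *args, **kwargs):
            pass


def _fetch_json(url):
    """Default fetcher: GET a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


class ZenodoFileSystem(AbstractFileSystem):
    """Hypothetical read-only view of the files attached to a Zenodo record."""

    protocol = "zenodo"  # assumed protocol name, not registered anywhere
    cachable = False     # do not reuse instances created with other fetchers

    def __init__(self, fetch_json=_fetch_json, api="https://zenodo.org/api", **kwargs):
        super().__init__(**kwargs)
        self._fetch_json = fetch_json
        self._api = api.rstrip("/")

    def ls(self, path, detail=True, **kwargs):
        """List the files of one record; `path` is treated as the record id."""
        record_id = str(path).split("://", 1)[-1].strip("/")
        # Assumed endpoint and JSON layout; verify against the Zenodo API docs.
        record = self._fetch_json(f"{self._api}/records/{record_id}")
        entries = [
            {
                "name": f"{record_id}/{f['key']}",
                "size": f.get("size"),
                "type": "file",
                "url": f["links"]["self"],
            }
            for f in record.get("files", [])
        ]
        return entries if detail else [e["name"] for e in entries]
```

A usable backend would also need at least `_open` for reading file contents, plus authenticated uploads for anything like `push`; only listing is sketched here. Passing `fetch_json` in explicitly keeps the class testable without network access.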
-
I think that we could use a plain HTTP remote, but the repo would need to have a different remote for each data file. I already tried using
-
@4aHxKzD This seems somewhat similar to IPFS (#4736), where we had similar problems. I'm not sure how to use such remotes right now; they are just too unconventional compared to the rest of the remotes. I'm sure that with some research we could find an acceptable way to handle that, but it looks like some tradeoffs or special mechanisms will need to be introduced, and I'm not sure what it will take to merge them into upstream dvc. 🙁
-
This seems related to #5450. I'm not too familiar with Zenodo, but it looks like it would be better suited to keeping track of "snapshots" or archives of a given DVC repo state than to versioning live data as a regular DVC remote. So if/when #5450 is implemented, users would just upload those snapshots to Zenodo as needed.
-
@pmrowla It probably is more suited to the first case you mention (in fact, I believe that is what people do with software releases here on GitHub: each release gets published on Zenodo with a new DOI), but I'm more interested in the latter. I simply want to version my data there.
-
Zenodo is also fairly widely used to create reproducibility packages associated with academic publications (at least in astrophysics). I would potentially be interested in a workflow for version-controlling (even just locally) the output of simulations, mostly in the form of large text files. This would be helpful during development, which produces lots of output that is unlikely to make it into the final publication.
-
Is there a reason for not having Zenodo as a remote (for data) apart from nobody having dedicated time to develop the feature?
I know that Zenodo creates a DOI for each version of a data set, which makes it harder to update the "repository" with `push`, but they provide a DOI that represents every version of a file and points to the latest. Maybe that could be used as a remote.
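As a rough illustration of the "points to the latest" idea, the sketch below resolves any version of a record to its newest version and returns the download URLs of its files. The helper name `latest_files`, the API base URL, and the JSON fields used (`links.latest`, `files[].key`, `files[].links.self`) are assumptions about Zenodo's REST API rather than something confirmed in this thread, so please verify them before relying on this.

```python
import json
from urllib.request import urlopen

API = "https://zenodo.org/api"  # assumed base URL of the Zenodo REST API


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def latest_files(record_id, fetch=fetch_json):
    """Return {filename: download URL} for the newest version of a record.

    `record_id` may refer to any version; the assumed `links.latest` field
    is followed, when present, to reach the most recent one.
    """
    record = fetch(f"{API}/records/{record_id}")
    latest_url = record.get("links", {}).get("latest")
    if latest_url:
        record = fetch(latest_url)
    return {f["key"]: f["links"]["self"] for f in record.get("files", [])}
```

This only covers the read side. Every upload would still create a new version with its own DOI, which is the part that makes a `push`-style workflow awkward.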
I searched older issues, but it seems that nobody has suggested this feature, so I don't know if it is out of scope for this project or if there really isn't any interest from the community.
Thanks