We seem to be running into a similar problem in several projects, including http://github.com/subugoe/hoad/, http://github.com/subugoe/openairegraph/ and the crossref dump situation http://github.com/njahn82/cr_dump/:
There's big-ish (>1MB) serialised data, usually JSON, CSV or the same compressed, which is either/or (I'm not talking about databases here; that's a separate concern).

These files cause several problems / face limitations:

- they cannot be `git commit`ed (too large)

Possible straightforward solutions might be:

- store only locally (no reproducibility)
- store on a network drive (no reproducibility)
- set up a database (too expensive / too much hassle unless absolutely necessary)

I think we need something else which neatly abstracts away all this. There's probably a good solution out there already.
One avenue to pursue would be git lfs.
Ideally, we should have a solution which understands serialised data and knows how to diff rows (order does not matter).
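For illustration, here's a minimal sketch in base R of what an order-insensitive row diff could look like; `diff_rows()` is a hypothetical helper, not an existing function:

```r
# Hypothetical sketch: order-insensitive row diff between two
# data frames, reporting rows added and rows removed.
diff_rows <- function(old, new) {
  # Collapse each row into a single key string ("\x1f" is an
  # unlikely-to-collide field separator).
  old_keys <- do.call(paste, c(old, sep = "\x1f"))
  new_keys <- do.call(paste, c(new, sep = "\x1f"))
  list(
    added   = new[!(new_keys %in% old_keys), , drop = FALSE],
    removed = old[!(old_keys %in% new_keys), , drop = FALSE]
  )
}

old <- data.frame(doi = c("10.1/a", "10.1/b"), cites = c(3L, 5L))
new <- data.frame(doi = c("10.1/b", "10.1/c"), cites = c(5L, 7L))
diff_rows(old, new)
```

(Set semantics only; duplicated rows would need extra care.)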
Anyway, this shouldn't be too complicated and we might start with something small.
I'm going to look into this when I find the time. I think it could save us all a lot of time.
Among other things, the repeated downloads of the big dumps via `download.file()` should be transparently cached.
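A minimal sketch of what that could look like, assuming a local cache directory; `cached_download()` is a hypothetical wrapper, not an existing function, and the URL is a placeholder:

```r
# Hypothetical wrapper around download.file(): only hits the
# network when the file is not already in the local cache.
cached_download <- function(url, cache_dir = "cache") {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(cache_dir, basename(url))
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}

# Repeated calls (locally or on CI) reuse the cached copy.
dump_path <- cached_download("https://example.com/cr_dump.json.gz")
```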
Referenced in commit 21f6972: enable ci and make stuff reproduce as per #2 (also opens #7 #6 #5 #4).
This would also be a genuine feature for a lot of users, who might face the same problem when they run this in CI or collaboratively.