-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: clean up data organization/add basic fetching scripts #558
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea, and the diff looks good overall to me, I think it's fine to merge as is. A couple of thoughts:
I do see "#{Dir.home}/.local/share/wags_tails"
scattered a few places around the code (though mostly it's used via the default_data_dir
method you wrote).
Since it looks like this pr adds a dependency on wags_tails
anyways, I wonder if it's worth having the DGIdb importers "ask" wags-tails where its data root path is somehow? That's definitely not a huge/blocking issue since only a few people will even utilize this process but it does feel a little strange to have this just hardcoded to the default path for another tool and hope they haven't configured it to point to a non-default path.
re: calling python from within the DGIdb codebase, its not too bad, we have an example here calling civicpy and capturing stderr, stdout, and the return code, or the simpler but less robust directly shelling out here. The challenging part is if you want to use a venv or similar, making sure you're getting the correct versions of everything in this content, but its doable if we want to automate the process more going forward.
Anyways, those are just thoughts, not blockers. LGTM!
Yeah, we've sort of written the data management code to enable this kind of thing (although we'd envisioned it more in the context of shell scripting/workflow files). What you have in CIViC looks good so I went ahead and copied it over into the places where we need to look up the root data directory -- let me know if it looks ok. In the long term it would also be nice to just call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
updater
modules for this purpose but I'd prefer something that reuses said data management tool above, it's a little less maintenance for me.<source>_<the kind of stuff>_<version>.<filetype>
, since not every file includes its version as part of the data. So now, the file downloader is in charge of identifying the version of the data and naming the file as needed, and then DGI importers can just read version values from filenames.The workflow for updating would go like
rake db:reset
python3 scripts/download_files.py
Unfortunately this entails some awkwardness about running Python modules before the rake tasks. I thought a bit about doing something like shell commands from the ruby side to call the python or even looking at ruby-python FFIs but I don't think any particular solution will be terribly elegant so I'm not worrying about it for now.