Importer: Automate GDC data dictionary download while running importer #133

Open
joeflack4 opened this issue Dec 1, 2021 · 0 comments
Labels: CI / automation (Continuous integration and other automation), ease:high, urgency:low

joeflack4 commented Dec 1, 2021

Description

From the data/data_dictionary/gdc/README.md:

GDC Data Dictionary in JSON

The JSON files are downloaded from the backend of the GDC data dictionary viewer. Each file
is timestamped with the date it was downloaded.

The URL for the file is https://api.gdc.cancer.gov/v0/submission/_dictionary/_all

The current.json file is a symlink to the most current version.

The command to download a current version and update the symlinked current.json is:

# run it in the project root path
python -m ccdh.importers.gdc

We might as well run this download as part of the normal import. If download time or API rate limits turn out to be a problem, we can handle that in code and fall back to a local cache when needed. We can also wrap the download in a try/except for good measure.
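
A rough sketch of what that could look like, not the existing `ccdh.importers.gdc` code: try the download, and if it fails or returns something that is not valid JSON, fall back to the cached `current.json`. The function names, the `data_dir` parameter, the `YYYY-MM-DD.json` file naming, and the 60-second timeout are all assumptions for illustration; only the URL and the `current.json` symlink come from the README. It also assumes a platform where creating symlinks is permitted.

```python
import json
import os
import urllib.request
from datetime import date
from pathlib import Path
from typing import Callable

GDC_DICTIONARY_URL = "https://api.gdc.cancer.gov/v0/submission/_dictionary/_all"


def fetch_gdc_dictionary(url: str = GDC_DICTIONARY_URL, timeout: float = 60.0) -> bytes:
    """Download the raw dictionary JSON using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def refresh_gdc_dictionary(
    data_dir: Path,
    fetch: Callable[[], bytes] = fetch_gdc_dictionary,
) -> Path:
    """Hypothetical helper: refresh the dictionary, or reuse the local cache.

    Returns the path of current.json, which the importer would then read.
    """
    data_dir = Path(data_dir)
    current = data_dir / "current.json"
    try:
        raw = fetch()
        json.loads(raw)  # reject truncated or non-JSON responses before saving
    except (OSError, ValueError) as err:
        # OSError covers URLError and timeouts; ValueError covers invalid JSON.
        if current.exists():
            print(f"GDC download failed ({err}); using cached {current}")
            return current
        raise  # no cache to fall back to, so surface the original error

    # Save a dated copy (file naming is an assumption, not the repo's scheme).
    data_dir.mkdir(parents=True, exist_ok=True)
    dated = data_dir / f"{date.today().isoformat()}.json"
    dated.write_bytes(raw)

    # Re-point current.json by creating a temporary symlink and renaming it
    # over the old one, so current.json never goes missing in between.
    tmp_link = data_dir / "current.json.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    tmp_link.symlink_to(dated.name)
    os.replace(tmp_link, current)
    return current
```

The importer could then call something like `refresh_gdc_dictionary(Path("data/data_dictionary/gdc"))` before reading the dictionary. Passing `fetch` as a parameter keeps the fallback logic testable without network access; rate limiting, if it turns out to be needed, could live inside that callable.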

joeflack4 self-assigned this Dec 1, 2021
joeflack4 added the ease:high, urgency:low, and CI / automation labels Dec 1, 2021