Pass list of sequencing projects and analysis projects to GoldStudyTranslator call in generate_biosample_set_from_gold_api_for_study #927

@sujaypatil96 sujaypatil96 commented Mar 11, 2025

The generate_biosample_set_from_gold_api_for_study job in the data_records_stitching repository creates a JSON file of biosample records from a GOLD study that are not yet instantiated in the NMDC database. It takes as input the subset of biosamples that do not already exist in the database and pulls attributes from them to populate slots on the Biosample class. It also needs information about the associated sequencing projects and analysis projects from GOLD so that it can populate all of the "mapped" slots on the Biosample class.
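The data flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper functions are hypothetical, and the record keys (`biosampleGoldId`, `projectGoldId`, `apGoldId`, `projects`) are assumed to match the shape of GOLD API responses. The returned bundle stands in for whatever is actually passed to GoldStudyTranslator, whose real signature is not shown here.

```python
from typing import Any, Dict, List, Set

Record = Dict[str, Any]


def select_missing_biosamples(
    gold_biosamples: List[Record], existing_gold_ids: Set[str]
) -> List[Record]:
    """Keep only GOLD biosamples whose ID is not already in the database."""
    return [
        b for b in gold_biosamples if b["biosampleGoldId"] not in existing_gold_ids
    ]


def select_related_projects(
    projects: List[Record], biosamples: List[Record]
) -> List[Record]:
    """Keep only sequencing projects linked to one of the given biosamples."""
    biosample_ids = {b["biosampleGoldId"] for b in biosamples}
    return [p for p in projects if p.get("biosampleGoldId") in biosample_ids]


def select_related_analysis_projects(
    analysis_projects: List[Record], projects: List[Record]
) -> List[Record]:
    """Keep only analysis projects that reference one of the given projects."""
    project_ids = {p["projectGoldId"] for p in projects}
    return [
        ap for ap in analysis_projects if project_ids & set(ap.get("projects", []))
    ]


def build_translator_inputs(
    gold_biosamples: List[Record],
    projects: List[Record],
    analysis_projects: List[Record],
    existing_gold_ids: Set[str],
) -> Dict[str, List[Record]]:
    """Bundle the missing biosamples with their projects and analysis projects.

    Hypothetical stand-in for the arguments handed to the translator.
    """
    missing = select_missing_biosamples(gold_biosamples, existing_gold_ids)
    related_projects = select_related_projects(projects, missing)
    related_aps = select_related_analysis_projects(analysis_projects, related_projects)
    return {
        "biosamples": missing,
        "projects": related_projects,
        "analysis_projects": related_aps,
    }
```

Passing all three lists together, rather than the biosamples alone, is what would let a translator fill Biosample slots whose values live on the project or analysis project records.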

Details

...

Related issue(s)

Fixes #886

Related subsystem(s)

  • Runtime API (except the Minter)
  • Minter
  • Dagster
  • Project documentation (in the docs directory)
  • Translators (metadata ingest pipelines)
  • MongoDB migrations
  • Other

Testing

  • I tested these changes (explain below)
  • I did not test these changes

I tested these changes by...

Documentation

  • I have not checked for relevant documentation yet (e.g. in the docs directory)
  • I have updated all relevant documentation so it will remain accurate
  • Other (explain below)

Maintainability

  • Every Python function I defined includes a docstring (test functions are exempt from this)
  • Every Python function parameter I introduced includes a type hint (e.g. study_id: str)
  • All "to do" or "fix me" Python comments I added begin with either # TODO or # FIXME
  • I used black to format all the Python files I created/modified
  • The PR title is in the imperative mood (e.g. "Do X") and not the declarative mood (e.g. "Does X" or "Did X")

@sujaypatil96 sujaypatil96 merged commit c84392d into main Mar 14, 2025
2 checks passed
@sujaypatil96 sujaypatil96 deleted the 886-gold-etl-missing-insdc_biosample_identifiers-values-in-biosample_set-records branch March 14, 2025 00:14
Successfully merging this pull request may close these issues.

GOLD ETL missing insdc_biosample_identifiers values in biosample_set records