GOLD ETL missing insdc_biosample_identifiers values in biosample_set records #886

Closed
aclum opened this issue Jan 30, 2025 · 7 comments · Fixed by #927
Assignees
Labels
bug Something isn't working

Comments

@aclum
Contributor

aclum commented Jan 30, 2025

Describe the bug
The GOLD ETL code is not populating insdc_biosample_identifiers for EMP 500 biosamples.

To Reproduce
Steps to reproduce the behavior:

  1. Check the JSON files attached to "Ingest remaining 737 biosamples for nmdc:sty-11-547rwq94 (EMP500)" (issues#940) for insdc_biosample_identifiers, and cross-reference the values with GOLD.

Expected behavior
Example: nmdc:bsm-11-48fce216 should have an insdc_biosample_identifiers value of SAMEA7723945.

Acceptance Criteria

  • The JSON generated by the GOLD translator for nmdc:sty-11-547rwq94 contains biosample_set records that include the insdc_biosample_identifiers field.
  • If possible, determine when the bug was introduced, and open tickets to update records if other recently imported studies are affected.
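
A minimal sketch of how the first criterion could be checked, assuming the translator output is a JSON object with a top-level biosample_set list. The function name, the second record's id, and the list-of-strings shape of the field are illustrative assumptions, not taken from the GOLD translator itself.

```python
import json


def find_biosamples_missing_field(database, field="insdc_biosample_identifiers"):
    """Return the ids of biosample_set records where `field` is absent or empty."""
    missing = []
    for record in database.get("biosample_set", []):
        # A missing key, None, an empty string, or an empty list all count as "not populated".
        if not record.get(field):
            missing.append(record.get("id"))
    return missing


# Hypothetical translator output. The first id and accession come from this issue;
# the second record and the list shape of the field are assumptions for illustration.
example_output = json.loads("""
{
  "biosample_set": [
    {"id": "nmdc:bsm-11-48fce216", "insdc_biosample_identifiers": ["SAMEA7723945"]},
    {"id": "nmdc:bsm-11-example1"}
  ]
}
""")

missing_ids = find_biosamples_missing_field(example_output)
print(missing_ids)
```

Running the same function against each attached JSON file would give a list of biosample ids to cross-reference with GOLD.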


@ssarrafan
Contributor

No updates for 2 weeks. Removing from sprint, adding backlog label. @aclum @sujaypatil96

@ssarrafan ssarrafan added the backlog Not assigned to a sprint or not completed during a planned sprint. Needs to be reprioritized. label Feb 10, 2025
@aclum
Contributor Author

aclum commented Mar 11, 2025

@sujaypatil96 was this resolved?

@aclum aclum removed the backlog Not assigned to a sprint or not completed during a planned sprint. Needs to be reprioritized. label Mar 11, 2025
@sujaypatil96
Collaborator

@aclum I don't think it was. I'll look into this today and post updates on this issue.

@sujaypatil96
Collaborator

The JSON file for the remaining 279 biosamples (EMP500_batch_two_ingest.json) includes insdc_biosample_identifiers, and the bug has been fixed generically in the associated PR.

@sujaypatil96
Collaborator

However, we still need to go back and make a changesheet to add insdc_biosample_identifiers for the 458 samples that we added in the previous ingest.
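
As a rough illustration of that backfill, the sketch below builds changesheet rows from a mapping of biosample id to INSDC accession. The column names (id, action, attribute, value), the "update" action, and both function names are assumptions made for this example; the real changesheet format should be checked before use. Only the first id and accession come from this issue.

```python
import csv
import io

# Assumed column layout for the changesheet; verify against the real format.
CHANGESHEET_COLUMNS = ["id", "action", "attribute", "value"]


def build_changesheet_rows(accessions_by_biosample,
                           attribute="insdc_biosample_identifiers",
                           action="update"):
    """Build one row per biosample that has a known accession."""
    rows = []
    for biosample_id, accession in sorted(accessions_by_biosample.items()):
        if not accession:
            # Skip biosamples for which no accession is available.
            continue
        rows.append({
            "id": biosample_id,
            "action": action,
            "attribute": attribute,
            "value": accession,
        })
    return rows


def rows_to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CHANGESHEET_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical input: the second id is a placeholder with no accession.
accessions = {
    "nmdc:bsm-11-48fce216": "SAMEA7723945",
    "nmdc:bsm-11-example1": None,
}
changesheet_rows = build_changesheet_rows(accessions)
print(rows_to_tsv(changesheet_rows))
```

Skipping biosamples without an accession keeps the sheet limited to records that actually need a value added.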

@sujaypatil96 sujaypatil96 moved this from In Progress to In Review in 2025 - Sprint 58 - Mar 10- Mar 21, 2025 Mar 11, 2025
@sujaypatil96
Collaborator

sujaypatil96 commented Mar 12, 2025

The changesheet, which updates insdc_biosample_identifiers and also img_identifiers for the biosamples where a value is present, is posted on the squad Slack channel.

@sujaypatil96
Collaborator

The same changesheet has been applied to prod Mongo as well. We should see the results on the prod data portal after the next weekly prod portal ingest.

@github-project-automation github-project-automation bot moved this from In Progress to Done in EMP 500 Mar 14, 2025
Projects
Status: Done
3 participants