endpoint for ingest of NMDC annotations into IMG #924

Closed
aclum opened this issue Mar 6, 2025 · 3 comments · Fixed by #931
Labels
enhancement New feature or request

Comments

@aclum
Contributor

aclum commented Mar 6, 2025

Is your feature request related to a problem? Please describe.

IMG needs an endpoint which provides the following information.

Use case 1

For a given MetagenomeAnnotation workflow_execution_id, return:

  • NMDC Study ID
  • NMDC Biosample ID
  • DataObjects that are has_output from MetagenomeAnnotation
  • MetagenomeAssembly ID for the DataObject that is has_input to MetagenomeAssembly
  • DataObjects for the MetagenomeAssembly ID that is has_input to MetagenomeAnnotation
  • ReadBasedTaxonomyAnalysis that uses the same has_input as the MetagenomeAssembly
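The lookups listed above can be sketched over in-memory records. This is only an illustration, not the endpoint that was eventually built: the `type` values, the record ids, and the function name `annotation_context` are assumptions made for the example, and the Study and Biosample lookups are left out.

```python
def annotation_context(annotation_id, workflow_executions):
    """Collect the workflow-level items from Use case 1 for one annotation."""
    by_id = {wf["id"]: wf for wf in workflow_executions}
    annotation = by_id[annotation_id]
    annotation_inputs = set(annotation["has_input"])

    # Upstream: assemblies whose outputs are inputs to the annotation.
    assemblies = [
        wf for wf in workflow_executions
        if wf["type"] == "nmdc:MetagenomeAssembly"  # assumed type value
        and annotation_inputs & set(wf["has_output"])
    ]
    assembly_inputs = {d for wf in assemblies for d in wf["has_input"]}
    assembly_outputs = {d for wf in assemblies for d in wf["has_output"]}

    # Sideways: read-based taxonomy runs that share an input with the assembly.
    taxonomy = [
        wf for wf in workflow_executions
        if wf["type"] == "nmdc:ReadBasedTaxonomyAnalysis"  # assumed type value
        and assembly_inputs & set(wf["has_input"])
    ]

    return {
        "annotation_outputs": sorted(annotation["has_output"]),
        "assembly_ids": sorted(wf["id"] for wf in assemblies),
        "assembly_data_objects": sorted(annotation_inputs & assembly_outputs),
        "read_based_taxonomy_ids": sorted(wf["id"] for wf in taxonomy),
    }


# Hypothetical records; real NMDC ids look different.
RECORDS = [
    {"id": "wf:assembly-1", "type": "nmdc:MetagenomeAssembly",
     "has_input": ["do:reads"], "has_output": ["do:contigs"]},
    {"id": "wf:annotation-1", "type": "nmdc:MetagenomeAnnotation",
     "has_input": ["do:contigs"], "has_output": ["do:gff", "do:faa"]},
    {"id": "wf:taxonomy-1", "type": "nmdc:ReadBasedTaxonomyAnalysis",
     "has_input": ["do:reads"], "has_output": ["do:kraken"]},
]

context = annotation_context("wf:annotation-1", RECORDS)
```

The same function shape would cover Use case 2 by swapping in the metatranscriptome type names.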

Use case 2

For a given MetatranscriptomeAnnotation workflow_execution_id, return:

  • NMDC Study ID
  • NMDC Biosample ID
  • DataObjects that are has_output from MetatranscriptomeAnnotation
  • MetatranscriptomeAssembly ID for the DataObject that is has_input to MetatranscriptomeAssembly
  • DataObjects for the MetatranscriptomeAssembly ID that is has_input to MetatranscriptomeAnnotation

Abstracting to a generic use case

For a workflow_execution_set record, show information about its input and output data objects, and about the inputs and outputs of related workflow_execution_set records (those that share an input or output file with it). Also return the biosample ID and the study ID.
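As a rough sketch of what "related" could mean here, the function below compares `has_input` and `has_output` sets and labels each neighbouring record by how it connects. The function name, the relation labels, and the sample ids are all hypothetical, and the biosample and study lookups are not covered.

```python
def related_workflow_executions(record_id, workflow_executions):
    """Map each related record id to how it connects to record_id."""
    by_id = {wf["id"]: wf for wf in workflow_executions}
    me = by_id[record_id]
    my_inputs, my_outputs = set(me["has_input"]), set(me["has_output"])

    related = {}
    for wf in workflow_executions:
        if wf["id"] == record_id:
            continue
        inputs, outputs = set(wf["has_input"]), set(wf["has_output"])
        if outputs & my_inputs:
            related[wf["id"]] = "upstream"      # its output is my input
        elif inputs & my_outputs:
            related[wf["id"]] = "downstream"    # my output is its input
        elif inputs & my_inputs:
            related[wf["id"]] = "shares_input"
        elif outputs & my_outputs:
            related[wf["id"]] = "shares_output"
    return related


# Hypothetical records for illustration only.
SAMPLE_RECORDS = [
    {"id": "wf:qc-1", "has_input": ["do:raw"], "has_output": ["do:reads"]},
    {"id": "wf:assembly-1", "has_input": ["do:reads"], "has_output": ["do:contigs"]},
    {"id": "wf:taxonomy-1", "has_input": ["do:reads"], "has_output": ["do:kraken"]},
    {"id": "wf:annotation-1", "has_input": ["do:contigs"], "has_output": ["do:gff"]},
]

neighbours = related_workflow_executions("wf:assembly-1", SAMPLE_RECORDS)
```

This only looks one hop away. Use case 1 needs two hops (annotation to assembly, then assembly to read-based taxonomy), so a real implementation would repeat the step or walk the graph breadth-first.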

Describe the solution you'd like
An endpoint which I can point IMG staff to

Additional context
This is a blocker for programmatic data transfer to IMG.

@eecavanna
Collaborator

For reference, the nmdc-server code does a kind of traversal related to this. See: https://github.com/microbiomedata/nmdc-server/blob/784b51c7e8af2690cdec59e8121c1df127536601/nmdc_server/ingest/omics_processing.py#L57-L96

@eecavanna
Collaborator

From @aclum verbally: "Input is a workflow annotation ID. We'll have to walk the graph in both directions."
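One way to picture the upstream half of that walk, from a workflow execution back to its biosample and study, is sketched below. It rests on assumptions not stated in this issue: that each workflow record carries a `was_informed_by` reference to a data generation record, and that the data generation record lists biosample ids in `has_input` and study ids in `associated_studies`. Real data may have intermediate processed samples, which this ignores. All ids and the function name are made up.

```python
def biosample_and_study_ids(workflow_id, workflow_executions, data_generations):
    """Walk upstream from one workflow execution to biosample and study ids."""
    workflow = next(wf for wf in workflow_executions if wf["id"] == workflow_id)

    # Assumed slot: was_informed_by may hold one id or a list of ids.
    informed_by = workflow["was_informed_by"]
    generation_ids = [informed_by] if isinstance(informed_by, str) else list(informed_by)

    biosample_ids, study_ids = set(), set()
    for generation in data_generations:
        if generation["id"] in generation_ids:
            # Assumption: has_input points straight at biosamples.
            biosample_ids.update(generation["has_input"])
            study_ids.update(generation["associated_studies"])
    return sorted(biosample_ids), sorted(study_ids)


# Hypothetical records for illustration only.
WORKFLOWS = [
    {"id": "wf:annotation-1", "was_informed_by": "dgns:1"},
]
DATA_GENERATIONS = [
    {"id": "dgns:1", "has_input": ["bsm:1"], "associated_studies": ["sty:1"]},
    {"id": "dgns:2", "has_input": ["bsm:2"], "associated_studies": ["sty:1"]},
]

biosamples, studies = biosample_and_study_ids(
    "wf:annotation-1", WORKFLOWS, DATA_GENERATIONS
)
```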

@eecavanna
Collaborator

Reassigning to @sujaypatil96 for now, based upon the plan several of us made via Zoom this afternoon (see Referential Integrity squad meeting notes Google Doc for details).

3 participants