feat: sequence run manager support for library linking #760

Open
reisingerf opened this issue Dec 9, 2024 · 6 comments
Labels
epic pipeline Workflow/Pipeline Manager

Comments

@reisingerf
Member

Ideally we'd be able to link sequence runs to libraries and vice versa.

This linking information would likely have to come from the SampleSheet of each sequence run. There is also other useful / important information in the SampleSheet that makes it worth retrieving and storing in the OrcaBus back-end.

(at a later stage we may even add information from RunInfo.xml and RunParameters.xml but that's another issue)

The ICAv2 BSSH sequencing event should carry a URL to retrieve details of the run, which can hopefully be used to identify and retrieve the SampleSheet (SS). The SS info can then be stored by the Sequence Run Manager (SRM) and a link to the libraries can be established.

@alexiswl
Member

alexiswl commented Dec 9, 2024

The bssh fastq copy does exactly this already.
We could add the same logic to the BCLConvert Manager.

@reisingerf
Member Author

Yes, I am aware... I was hoping we could steal some of the Stacky bits ;-)
Not sure I'd put all that into the BCL Convert Manager though...

The reasons for adding it to the Sequence Run Manager are:

  • the SS and linked libraries are initially a property related to the sequencing run
  • the SRM comes earlier in the chain (and data would be available earlier)
  • the SRM would store this even if no conversion ever happens
  • we already have an SRM back-end with persistent storage and an API
  • the BCL Convert Manager is more of an execution service, and those usually don't have a persistent back-end (at least not an exposed one anyway)

An argument can be made that the SS is mainly used for conversion and could change even though the run does not.
However, the SS is still tightly bound to the run info, and I'd keep the option of multiple SS per run on the SRM. The BCL Convert Manager could then reference the SS in the data payload of its WRSC...

Hope that makes sense.
Open for discussion though!

@alexiswl
Member

alexiswl commented Dec 9, 2024

Yes, makes sense. Happy for this to be put all the way into the Sequence Run Manager.

@victorskl victorskl added epic pipeline Workflow/Pipeline Manager labels Dec 10, 2024
@reisingerf
Member Author

@alexiswl @raylrui
Just following up from today's discussion...

I'd start this by extending the Sequence Run Manager to retrieve and store the SampleSheet.
I'd probably add a new table / model that holds the basic SS data plus the whole SS as a jsonb object / JSON blob for full reference and re-creation purposes. There should be code examples for file retrieval, JSON conversion, etc., and I am pretty sure Alexis would have an idea about which SS info to store directly in the model vs. which is OK to pull out from the JSON.
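
As a rough illustration of the "basic fields plus full JSON" idea, here is a minimal sketch in plain Python. It is not the existing bssh fastq copy code. Everything named here is an assumption: the `SampleSheetRecord` class and its fields, the `samplesheet_to_dict` and `build_record` helpers, and the choice to read library IDs from the `Sample_ID` column of a `[BCLConvert_Data]` section. In a Django back-end the `content` field would presumably become a `JSONField` (stored as jsonb on PostgreSQL).

```python
import csv
import io
import json
from dataclasses import dataclass, field


def samplesheet_to_dict(text: str) -> dict:
    """Convert a sectioned SampleSheet CSV into a JSON-serialisable dict.

    Sections named "Data" or ending in "_Data" become lists of row dicts;
    all other sections become simple key/value mappings.
    """
    sections: dict = {}
    name = None
    header = None
    for row in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in row]
        if not any(cells):
            continue  # skip blank lines and lines of bare commas
        if cells[0].startswith("[") and cells[0].endswith("]"):
            name = cells[0][1:-1]
            is_table = name == "Data" or name.endswith("_Data")
            sections[name] = [] if is_table else {}
            header = None
            continue
        if name is None:
            continue  # ignore anything before the first section
        if isinstance(sections[name], list):
            if header is None:
                header = [c for c in cells if c]  # first row is the column header
            else:
                sections[name].append(dict(zip(header, cells)))
        else:
            sections[name][cells[0]] = cells[1] if len(cells) > 1 else ""
    return sections


@dataclass
class SampleSheetRecord:
    """Hypothetical model: a few queryable fields plus the whole SS."""
    sequence_run_id: str          # assumed reference to the run record
    file_name: str
    content: dict                 # full SampleSheet, e.g. a jsonb column
    library_ids: list = field(default_factory=list)


def build_record(sequence_run_id: str, file_name: str, text: str) -> SampleSheetRecord:
    content = samplesheet_to_dict(text)
    rows = content.get("BCLConvert_Data", [])
    # ASSUMPTION: Sample_ID holds the library ID.
    # dict.fromkeys de-duplicates while keeping first-seen order.
    library_ids = list(dict.fromkeys(r["Sample_ID"] for r in rows if r.get("Sample_ID")))
    return SampleSheetRecord(sequence_run_id, file_name, content, library_ids)


# Made-up example data, for illustration only.
EXAMPLE = """
[Header]
FileFormatVersion,2
RunName,example-run

[BCLConvert_Data]
Lane,Sample_ID,index,index2
1,L0000001,AAAA,CCCC
1,L0000002,GGGG,TTTT
2,L0000001,AAAA,CCCC
"""

record = build_record("run-1", "SampleSheet.csv", EXAMPLE)
print(record.library_ids)
print(json.dumps(record.content["Header"]))
```

Keeping the parsed dict whole means any field not promoted to a column can still be pulled out of the JSON later, and the original sections can be re-created from it.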

This would also create "links" to library records. I'd hope that all library IDs from any SS are known to the Metadata Manager and the related OrcaBus IDs are retrievable, but I'd be happy to support that as optional (and send warnings accordingly).
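
The optional linking with warnings could look roughly like the sketch below. The `link_libraries` function and the plain dict standing in for a Metadata Manager lookup are assumptions for illustration; the real lookup would be an API call or event, and the IDs shown are made up.

```python
import logging

logger = logging.getLogger(__name__)


def link_libraries(library_ids, orcabus_ids_by_library):
    """Return (links, unknown) for the library IDs found in a SampleSheet.

    `orcabus_ids_by_library` is a stand-in for a Metadata Manager lookup
    (hypothetical): it maps a library ID to its OrcaBus ID.
    """
    links = {}
    unknown = []
    for library_id in library_ids:
        orcabus_id = orcabus_ids_by_library.get(library_id)
        if orcabus_id is None:
            # Linking is optional: keep the library ID, but warn about it.
            unknown.append(library_id)
            logger.warning("Library %s is not known to the Metadata Manager", library_id)
        else:
            links[library_id] = orcabus_id
    return links, unknown


# Made-up IDs, for illustration only.
links, unknown = link_libraries(
    ["L0000001", "L0000002"],
    {"L0000001": "lib.EXAMPLE0001"},
)
print(links)
print(unknown)
```

Returning the unknown IDs separately lets the SampleSheet be stored regardless, with the missing links retried once the Metadata Manager catches up.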

There should be provisioning for more than one SS (one "current" or "valid" one, plus some historical ones). The "linked" libraries may need to be updated accordingly.
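
One way to picture the "one current plus history" provisioning is sketched below with plain dataclasses. The class names, the `is_current` flag and the `add_sample_sheet` method are all hypothetical; the actual SRM model may well look different.

```python
from dataclasses import dataclass, field


@dataclass
class SampleSheetVersion:
    content: dict
    library_ids: list
    is_current: bool = True


@dataclass
class SequenceRun:
    run_id: str
    sample_sheets: list = field(default_factory=list)

    def add_sample_sheet(self, content, library_ids):
        """Store a new SampleSheet as current and keep older ones as history."""
        for sheet in self.sample_sheets:
            sheet.is_current = False
        new = SampleSheetVersion(content=content, library_ids=list(library_ids))
        self.sample_sheets.append(new)
        return new

    @property
    def current(self):
        return next((s for s in self.sample_sheets if s.is_current), None)

    @property
    def linked_library_ids(self):
        # Links always follow the current SampleSheet, so they update
        # automatically when a newer sheet replaces it.
        current = self.current
        return list(current.library_ids) if current else []


# Made-up IDs, for illustration only.
run = SequenceRun("run-1")
run.add_sample_sheet({"Header": {}}, ["L0000001", "L0000002"])
run.add_sample_sheet({"Header": {}}, ["L0000001", "L0000003"])
print(run.linked_library_ids)
print(len(run.sample_sheets))
```

Deriving the linked libraries from the current sheet, rather than storing them on the run, avoids stale links when a SampleSheet is superseded.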

Once that's in we can repeat for the RunParameters and RunInfo files (if that's useful).

This would enable Oliver's lookup-by-run queries and would allow the UI to link from a sequencing run to library records and related workflows.

Happy for any and all ideas / suggestions!

3 participants