[Dorado] New Dorado Basecalling Workflow Terra #659
base: main
Good catch on some of the updates you recently made to the no-trim option and custom primers, @fraser-combe. Structurally, I think this is nearly ready. @kapsakcj do you have bandwidth to do some additional testing of these new features?
@AndrewLangvt @fraser-combe Let me check with team leads first, but given the green light I should have time later this week or early next week to review Fraser's tests and do some of my own testing.
Hey Fraser! I haven't had any chance to look at this PR yet, so here are some of my thoughts. Some of these things do not need to be implemented but could be "nice-to-have"; I've indicated them as such and removed the check box, since they're mainly ideas for the future.
I have a final comment that would result in an efficiency improvement, but it's a bigger lift and functionally would behave exactly the same as what you're already doing, so I don't want you to implement it. I thought I would mention it here for posterity. Instead of doing transfer_files -> create_terra_table, I think we could save a fair bit of runtime by sending the array of fastq files directly into a new task that makes a Terra table from that array and then uploads that table. It would act like …
This looks great!!
Ready for review. Sage's comments have been addressed throughout; thanks, there were some ideas I had not thought of. Documentation updated to reflect the new inputs and the new trim task. Updated Dorado to v0.9.0 and updated all Docker images to the current version in the basecall, demux, and trim steps. Tested here (currently still running).
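For illustration only, the single-task idea described above (pass the array of fastq files straight to one task that builds the table) might be sketched as below. This is a minimal sketch, not the workflow's actual task: the function name `build_terra_table_tsv`, the `read1` column name, and the rule for deriving row IDs from file names are all assumptions. The `entity:<table>_id` header follows the Terra load-file convention. The upload step is deliberately left out.

```python
import csv
import io


def build_terra_table_tsv(fastq_paths, table_name, read_column="read1"):
    """Return Terra load-file TSV text with one row per fastq file.

    Hypothetical helper: the column name and ID derivation are assumptions.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Terra expects the first column header to be "entity:<table_name>_id".
    writer.writerow([f"entity:{table_name}_id", read_column])
    for path in sorted(fastq_paths):
        file_name = path.rsplit("/", 1)[-1]
        row_id = file_name
        # Strip a known fastq suffix so the row ID is just the barcode/sample name.
        for suffix in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
            if row_id.endswith(suffix):
                row_id = row_id[: -len(suffix)]
                break
        writer.writerow([row_id, path])
    return buffer.getvalue()


if __name__ == "__main__":
    example = [
        "gs://example-bucket/run/barcode02.fastq.gz",
        "gs://example-bucket/run/barcode01.fastq.gz",
    ]
    print(build_terra_table_tsv(example, "dorado_run"))
```

The resulting TSV text could then be uploaded by whatever mechanism the existing table-creation task already uses.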
A few follow-ups are outstanding from Sage's initial comments.
@kapsakcj are you able to take another look at this, following your previous comments, and to ensure it will meet the needs of CDPH?
This PR closes #713
🗑️ This dev branch should be deleted after merging to main.
🧠 Summary
This PR adds a new Dorado Basecalling Workflow, a GPU-accelerated pipeline for basecalling Oxford Nanopore `POD5` files. The workflow includes optional automatic model selection, SAM-to-BAM conversion, and demultiplexing into unique barcode fastq files, with outputs uploaded to a new user-defined Terra table for further downstream analysis.

⚡ Impacted Workflows/Tasks
This is a new workflow that does not impact any other workflows
This PR may lead to different results in pre-existing outputs: No
This PR uses an element that could cause duplicate runs to have different results: No
🛠️ Changes
This PR introduces the following changes:
- `use_auto_model` flag for automatic model selection.
- … (`sup`, `hac`, `fast`).

⚙️ Algorithm
… `POD5` files to SAM using GPU acceleration. Uses a new Dorado StaPH-B Docker image, v0.8.0: https://github.com/StaPH-B/docker-builds/tree/master/dorado/0.8.0
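As a rough illustration of the basecalling and SAM-to-BAM steps, the sketch below resolves the model argument from `use_auto_model` and `dorado_model` and assembles the command lines without running them. It is a sketch under stated assumptions, not the task code in this PR: the helper names and the `model_speed` parameter are hypothetical, and the `--kit-name` and `--emit-sam` options to `dorado basecaller` and the `samtools view -b -o` invocation are shown as one plausible way to call these tools. Demultiplexing is not shown.

```python
ALLOWED_SPEEDS = ("sup", "hac", "fast")


def resolve_model_argument(use_auto_model, model_speed="sup", dorado_model=None):
    """Pick the model argument for `dorado basecaller`.

    Hypothetical helper. With automatic selection, Dorado is given a speed
    name (sup, hac, or fast); otherwise an explicit model path is required.
    """
    if use_auto_model:
        if model_speed not in ALLOWED_SPEEDS:
            raise ValueError(f"model_speed must be one of {ALLOWED_SPEEDS}")
        return model_speed
    if not dorado_model:
        raise ValueError("dorado_model is required when use_auto_model is false")
    return dorado_model


def build_basecall_command(model_arg, pod5_path, kit_name):
    """Assemble (but do not run) a basecalling command that emits SAM on stdout."""
    return [
        "dorado", "basecaller", model_arg, pod5_path,
        "--kit-name", kit_name,  # barcoding kit; assumed to be passed at basecall time
        "--emit-sam",
    ]


def build_sam_to_bam_command(sam_path, bam_path):
    """Assemble (but do not run) a samtools command converting SAM to BAM."""
    return ["samtools", "view", "-b", "-o", bam_path, sam_path]


if __name__ == "__main__":
    model = resolve_model_argument(use_auto_model=True, model_speed="hac")
    print(" ".join(build_basecall_command(model, "pod5_dir/", "EXAMPLE-KIT")))
    print(" ".join(build_sam_to_bam_command("calls.sam", "calls.bam")))
```

Building the commands as argument lists keeps the example testable without a GPU; a real task would pass them to the container's shell or to `subprocess.run`.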
➡️ Inputs
- … (`sup`, `hac`, `fast`).

⬅️ Outputs
🧪 Testing
… `POD5` inputs and GPU resources.

Test 1. 9 rabies pod5 files from 2 barcodes (manual model)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/889322c2-19f0-4092-ac7f-4863e676b28a
Test 2. 24 pod5 files from 2 barcodes (manual model)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/9bef28ea-82ba-4406-8545-f32de7e07e02
Test 3. 24 pod5 files from 2 barcodes (auto model)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/cead789e-c737-4541-a6ed-d9b907493ee1
Output Terra table example
Suggested Scenarios for Reviewer to Test
- … `use_auto_model` flag enabled.
- … `dorado_model` path and confirm outputs.
- … (`kit_name`) to confirm error handling.

🔬 Final Developer Checklist
🎯 Reviewer Checklist