-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cram support #609
base: master
Are you sure you want to change the base?
Cram support #609
Conversation
cram_reference: cram_reference | ||
out: [bam_file] | ||
revert_bam: | ||
run: ../tools/revert_input.cwl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could revert_input.cwl
be run on CRAMs directly? If so seems like input_to_bam
could be removed. If not, seems like it should be named revert_bam.cwl
instead of revert_input.cwl
to denote its specificity.
(Is this operation time and space-efficient enough that we're willing to do it on all inputs instead of only those that need it?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is the specificity, the tool in revert_input.cwl can be run on anything but you would have to use the input variable, currently bam, to grab the appropriate out file , basically would require some javascript i expect which i try to avoid in cwl. I think renaming to revert_bam.cwl is preferable from my perspective.
The flexibility in the PR does come at a computational cost for sure. I think we should run revert_input.cwl either way though so the cost is just converting whatever input to bam
Does @tmooney's recent code - that allows for Fastq or Bam input to the alignment workflow - provide a path forward here? #741 Seems like could be possible to expand the sequence type to have 4 input types instead of 2:
Then we would add some downstream logic in the alignment steps to run RevertSam if needed. That would apply in (I think) just three different places: dna alignment, rnaseq alignment, bisulfite alignment |
This should allow the rnaseq.cwl workflow be input agnostic in terms of input being sam/cram/bam. I've also incorporated the suggestion from @jasonwalker80 on using
RevertSam
which was not in here before.The best solution I could come up with was to force conversion to bam format right at the start. If you were instead to try and pass SAM/CRAM/BAM to
RevertSam
the output file and more importantly the extension would be incorrect (though if someone can figure this out maybe that's a better solution?).At any rate as it is now samtools is run and just converts to bam so while there are 3 input types only 1 output type averting this headache alluded to above.
Side note mgibio has 2 samtools docker images, updated at the same time. I don't know which one we use so if this is accepted those lines will have to be modified (right now it uses zlskidmore/samtools)