Cram support #609

zlskidmore · 2019-03-19T20:49:59Z

This should allow the rnaseq.cwl workflow be input agnostic in terms of input being sam/cram/bam. I've also incorporated the suggestion from @jasonwalker80 on using RevertSam which was not in here before.

The best solution I could come up with was to force conversion to bam format right at the start. If you were instead to try and pass SAM/CRAM/BAM to RevertSam the output file and more importantly the extension would be incorrect (though if someone can figure this out maybe that's a better solution?).

At any rate as it is now samtools is run and just converts to bam so while there are 3 input types only 1 output type averting this headache alluded to above.

Side note mgibio has 2 samtools docker images, updated at the same time. I don't know which one we use so if this is accepted those lines will have to be modified (right now it uses zlskidmore/samtools)

tmooney · 2019-03-19T21:52:25Z

definitions/subworkflows/input_to_trimmed_fastq_and_hisat_alignments.cwl

+            cram_reference: cram_reference
+        out: [bam_file]
+    revert_bam:
+        run: ../tools/revert_input.cwl


Could revert_input.cwl be run on CRAMs directly? If so seems like input_to_bam could be removed. If not, seems like it should be named revert_bam.cwl instead of revert_input.cwl to denote its specificity.

(Is this operation time and space-efficient enough that we're willing to do it on all inputs instead of only those that need it?)

The problem is the specificity, the tool in revert_input.cwl can be run on anything but you would have to use the input variable, currently bam, to grab the appropriate out file , basically would require some javascript i expect which i try to avoid in cwl. I think renaming to revert_bam.cwl is preferable from my perspective.

The flexibility in the PR does come at a computational cost for sure. I think we should run revert_input.cwl either way though so the cost is just converting whatever input to bam

chrisamiller · 2019-09-05T04:15:48Z

Does @tmooney's recent code - that allows for Fastq or Bam input to the alignment workflow - provide a path forward here? #741

Seems like could be possible to expand the sequence type to have 4 input types instead of 2:

unaligned_bam
fastq
aligned_bam
aligned_cram

Then we would add some downstream logic in the alignment steps to run RevertSam if needed. That would apply in (I think) just three different places: dna alignment, rnaseq alignment, bisulfite alignment

zlskidmore added 6 commits March 19, 2019 12:36

initial test of cram support

16fd13f

fix scatter

99f1ffb

fix typos

4060496

fixed another typo

0b793a4

forgot the -T prefix for samtools

c7ef29f

removed uneccessary file

3181f61

zlskidmore changed the base branch from cram_support to master March 19, 2019 20:56

tmooney reviewed Mar 19, 2019

View reviewed changes

bumped java heap space

1bc747a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cram support #609

Cram support #609

zlskidmore commented Mar 19, 2019

tmooney Mar 19, 2019

zlskidmore Mar 19, 2019

chrisamiller commented Sep 5, 2019

Cram support #609

Are you sure you want to change the base?

Cram support #609

Conversation

zlskidmore commented Mar 19, 2019

tmooney Mar 19, 2019

Choose a reason for hiding this comment

zlskidmore Mar 19, 2019

Choose a reason for hiding this comment

chrisamiller commented Sep 5, 2019