quantification for stranded data #177

yuankunzhu · 2019-02-19T21:45:22Z

--forward-prob is hard-coded to 0.5, while the documentation for that argument describes it as:

Probability of generating a read from the forward strand of a transcript. Set to 1 for a strand-specific protocol where all (upstream) reads are derived from the forward strand, 0 for a strand-specific protocol where all (upstream) read are derived from the reverse strand, or 0.5 for a non-strand-specific protocol. (Default: 0.5)

This should be made a variable that depends on the library's strandedness.

The relevant line of code: https://github.com/BD2KGenomics/toil-rnaseq/blob/master/src/toil_rnaseq/tools/quantifiers.py#L82
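A minimal sketch of what "make this a variable" could look like. It assumes the quantifier command is assembled as a list of strings in Python. The helper name `forward_prob_args` is hypothetical and is not part of toil-rnaseq. The current behaviour (0.5) stays the default, and values outside the probability range are rejected.

```python
from typing import List


def forward_prob_args(forward_prob: float = 0.5) -> List[str]:
    """Return the --forward-prob option with a caller-supplied value.

    Hypothetical helper: 0.5 keeps the current (unstranded) behaviour,
    while 1.0 or 0.0 would be used for strand-specific libraries.
    """
    # The option is a probability, so only values in [0, 1] make sense.
    if not 0.0 <= forward_prob <= 1.0:
        raise ValueError(f"forward_prob must be in [0, 1], got {forward_prob}")
    return ["--forward-prob", str(forward_prob)]
```

For example, `forward_prob_args(0.0)` yields `["--forward-prob", "0.0"]`, which would be appended to the rest of the command instead of a fixed `0.5`.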


jvivian · 2019-02-19T22:06:46Z

@yuankunzhu — to clarify, you'd like to be able to modify this setting?

yuankunzhu · 2019-02-19T22:11:35Z

Ultimately, this parameter should be set according to the library's strandedness: if the input data is stranded, it should be 1 or 0 (depending on which strand the reads come from); if it is unstranded, it should be 0.5.
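The rule described above can be written as a small lookup table. This is only a sketch: the labels `unstranded`, `forward`, and `reverse` are hypothetical names chosen for illustration, not options that toil-rnaseq defines. The values follow the quoted documentation, where "forward" and "reverse" refer to the strand the upstream (first) read comes from.

```python
# Hypothetical strandedness labels mapped to --forward-prob values.
FORWARD_PROB = {
    "unstranded": 0.5,  # reads come from either strand with equal probability
    "forward": 1.0,     # upstream reads come from the transcript's forward strand
    "reverse": 0.0,     # upstream reads come from the transcript's reverse strand
}


def forward_prob_for(strandedness: str) -> float:
    """Translate a strandedness label into a --forward-prob value."""
    key = strandedness.strip().lower()
    try:
        return FORWARD_PROB[key]
    except KeyError:
        raise ValueError(
            f"unknown strandedness {strandedness!r}; "
            f"expected one of {sorted(FORWARD_PROB)}"
        ) from None
```

Failing loudly on an unknown label is deliberate: silently falling back to 0.5 would reproduce the hard-coded behaviour this issue is about.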

jvivian · 2019-02-19T22:20:07Z

@yuankunzhu, I see, thank you for the explanation. I'll look into how easy and fast it is to determine the stranded status and see if I can add it to the workflow. If you have a fast tool you can recommend, that would be appreciated.

jvivian · 2019-02-19T22:21:35Z

This tool has a strand checker: https://hartleys.github.io/QoRTs/ but it only works on BAM input files.
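Strand checkers of this kind generally work from alignments. They count how many upstream reads agree with the annotated transcript strand and how many are on the opposite strand. The sketch below shows one simple way to turn such counts into a --forward-prob guess. It is not QoRTs' actual algorithm. The function name, the 0.8 cut-off, and the assumption that the two counts are already available (for example from a BAM-based tool) are all hypothetical.

```python
def infer_forward_prob(sense: int, antisense: int, threshold: float = 0.8) -> float:
    """Guess a --forward-prob value from strand-agreement counts.

    sense: upstream reads whose orientation matches the transcript strand.
    antisense: upstream reads on the opposite strand.
    threshold: hypothetical cut-off for calling a library stranded.
    """
    if sense < 0 or antisense < 0:
        raise ValueError("counts must be non-negative")
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1]")
    total = sense + antisense
    if total == 0:
        raise ValueError("no strand-informative reads")
    fraction_sense = sense / total
    if fraction_sense >= threshold:
        return 1.0  # reads overwhelmingly on the forward strand
    if fraction_sense <= 1.0 - threshold:
        return 0.0  # reads overwhelmingly on the reverse strand
    return 0.5      # roughly balanced: treat as unstranded
```

Anything between the two cut-offs is treated as unstranded. An ambiguous sample is then quantified the same way as today rather than being forced into a strand.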

yuankunzhu · 2019-02-19T23:18:55Z

Thanks for looking into this @jvivian. I know Salmon can do this check too: https://salmon.readthedocs.io/en/latest/salmon.html#what-s-this-libtype

As of version 0.7.0, Salmon also has the ability to automatically infer (i.e. guess) the library type based on how the first few thousand reads map to the transcriptome. To allow Salmon to automatically infer the library type, simply provide -l A or --libType A to Salmon.
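Salmon describes a library type as an optional read orientation (I, O, or M) followed by its strandedness: SF (stranded, read 1 from the forward strand), SR (stranded, read 1 from the reverse strand), or U (unstranded). Salmon's "read 1" corresponds to the "upstream" read in the --forward-prob description. This means a resolved library type maps directly onto the three --forward-prob values. The sketch assumes the inferred type string has already been obtained from Salmon's output; how to extract it is not shown. The function name is hypothetical.

```python
import re

# Optional orientation (I, O, M) followed by SF, SR, or U, e.g. "ISR", "IU", "SF".
_LIBTYPE = re.compile(r"^([IOM]?)(SF|SR|U)$")

# Strandedness part of the library type -> --forward-prob value.
_STRAND_TO_PROB = {"SF": 1.0, "SR": 0.0, "U": 0.5}


def salmon_libtype_to_forward_prob(lib_type: str) -> float:
    """Map a resolved Salmon library type to a --forward-prob value."""
    match = _LIBTYPE.match(lib_type.strip().upper())
    if match is None:
        # "A" (automatic) is a request to infer, not an inferred result.
        raise ValueError(f"not a resolved Salmon library type: {lib_type!r}")
    return _STRAND_TO_PROB[match.group(2)]
```

For instance, a dUTP-style paired-end library reported as `ISR` maps to 0.0, and an unstranded paired-end library reported as `IU` maps to 0.5.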

hbeale · 2019-04-01T20:29:48Z

@yuankunzhu, I looked at the Salmon note too, but it can only detect what the aligner was told the data was, not whether the sequence data itself came from a stranded or unstranded library. I'm pretty sure this will have to be a parameter based on a human's knowledge of the library prep.

"Thus, for example, if the upstream aligner has been told to perform strand-aware mapping (i.e. to ignore potential alignments that don’t map in the expected manner), but the actual library is unstranded, automatic library type detection cannot detect this. It will attempt to detect the library type that is most consistent with the alignment that are provided."
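If strandedness has to come from the person who knows the library prep, one option is to make it a required command-line argument with no default. A run then cannot silently fall back to 0.5. The sketch below uses Python's standard `argparse`. The option name `--strandedness` and its choices are hypothetical and are not an existing toil-rnaseq flag.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Quantification options (sketch)")
    # Required with no default: the user must state the library prep explicitly.
    parser.add_argument(
        "--strandedness",
        required=True,
        choices=["unstranded", "forward", "reverse"],
        help="Strandedness of the library prep, as known from the lab protocol.",
    )
    return parser.parse_args(argv)
```

With this design, omitting the option makes `argparse` print a usage error and exit, rather than guessing.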

jvivian self-assigned this Feb 19, 2019
