
quantification for stranded data #177

Open
yuankunzhu opened this issue Feb 19, 2019 · 6 comments

@yuankunzhu

--forward-prob is hard-coded to 0.5, while the documentation for that argument describes it as follows:

Probability of generating a read from the forward strand of a transcript. Set to 1 for a strand-specific protocol where all (upstream) reads are derived from the forward strand, 0 for a strand-specific protocol where all (upstream) read are derived from the reverse strand, or 0.5 for a non-strand-specific protocol. (Default: 0.5)

This should be made a variable tied to the library's strandedness.

The line in question: https://github.com/BD2KGenomics/toil-rnaseq/blob/master/src/toil_rnaseq/tools/quantifiers.py#L82
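A minimal sketch of exposing the value as a parameter instead of hard-coding it, assuming a Python helper that assembles an rsem-calculate-expression command line. The function name rsem_command and its arguments are hypothetical and are not taken from toil-rnaseq; --bam, --no-bam-output, -p, --paired-end, and --forward-prob are RSEM options.

```python
from typing import List


def rsem_command(bam_path: str, reference: str, sample_name: str,
                 forward_prob: float = 0.5, paired: bool = True,
                 threads: int = 4) -> List[str]:
    """Build an rsem-calculate-expression command (hypothetical helper).

    forward_prob is a parameter rather than a hard-coded 0.5, so callers can
    pass 1.0 or 0.0 for strand-specific libraries.
    """
    # RSEM interprets --forward-prob as a probability, so reject anything outside [0, 1].
    if not 0.0 <= forward_prob <= 1.0:
        raise ValueError(f"forward_prob must be in [0, 1], got {forward_prob}")
    cmd = ["rsem-calculate-expression", "--bam", "--no-bam-output",
           "-p", str(threads), "--forward-prob", str(forward_prob)]
    if paired:
        cmd.append("--paired-end")
    # With --bam, the positional arguments are: input BAM, reference name, sample name.
    cmd += [bam_path, reference, sample_name]
    return cmd
```

Keeping 0.5 as the default preserves the current behaviour for unstranded data while letting stranded runs override it.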

@jvivian
Collaborator

jvivian commented Feb 19, 2019

@yuankunzhu, to clarify: you'd like to be able to modify this setting?

jvivian self-assigned this Feb 19, 2019
@yuankunzhu
Author

Ultimately, this parameter should be set according to the library's strandedness: if the input data is stranded, it should be 1 or 0 (depending on which strand the reads come from); if it is unstranded, it should be 0.5.
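A minimal sketch of that mapping, assuming the user supplies a strandedness label with each sample. The labels "unstranded", "forward", and "reverse" and the name forward_prob_for are hypothetical; the values follow the --forward-prob documentation quoted above.

```python
# Hypothetical strandedness labels mapped to RSEM --forward-prob values.
FORWARD_PROB_BY_STRANDEDNESS = {
    "unstranded": 0.5,  # reads equally likely from either strand
    "forward": 1.0,     # all (upstream) reads from the forward strand
    "reverse": 0.0,     # all (upstream) reads from the reverse strand
}


def forward_prob_for(strandedness: str) -> float:
    """Return the --forward-prob value for a strandedness label."""
    key = strandedness.strip().lower()
    try:
        return FORWARD_PROB_BY_STRANDEDNESS[key]
    except KeyError:
        raise ValueError(
            f"unknown strandedness {strandedness!r}; "
            f"expected one of {sorted(FORWARD_PROB_BY_STRANDEDNESS)}"
        ) from None
```

For example, dUTP-based kits such as Illumina TruSeq Stranded typically put read 1 on the reverse strand, so they would map to "reverse" (0.0). Failing loudly on an unknown label avoids silently falling back to 0.5.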

@jvivian
Collaborator

jvivian commented Feb 19, 2019

@yuankunzhu, I see, thank you for the explanation. I'll look into how easily and quickly strandedness can be determined and whether it can be added to the workflow. If you can recommend a fast tool, that would be appreciated.

@jvivian
Collaborator

jvivian commented Feb 19, 2019

This tool has a strand checker, https://hartleys.github.io/QoRTs/, but it only works on BAM input files.
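Strand checkers of this kind generally work from alignments. For reads overlapping annotated genes, they count how often read 1 lands on the same strand as the gene versus the opposite strand. A minimal sketch of the classification step follows. It assumes those two counts are already available, for example from a BAM file. StrandCounts, infer_strandedness, and the 0.8 / 0.1 / 1000 thresholds are illustrative assumptions, not QoRTs' actual logic or defaults.

```python
from dataclasses import dataclass


@dataclass
class StrandCounts:
    """Hypothetical container for read-orientation counts over annotated genes."""
    sense: int      # read 1 aligned to the same strand as the overlapping gene
    antisense: int  # read 1 aligned to the opposite strand


def infer_strandedness(counts: StrandCounts,
                       threshold: float = 0.8,
                       unstranded_band: float = 0.1,
                       min_reads: int = 1000) -> str:
    """Classify a library as 'forward', 'reverse', 'unstranded', or 'ambiguous'.

    threshold, unstranded_band, and min_reads are illustrative assumptions.
    """
    total = counts.sense + counts.antisense
    if total < min_reads:
        raise ValueError(f"only {total} informative reads; need at least {min_reads}")
    frac_sense = counts.sense / total
    if frac_sense >= threshold:
        return "forward"      # read 1 mostly matches the gene strand
    if frac_sense <= 1 - threshold:
        return "reverse"      # read 1 mostly opposite the gene strand
    if abs(frac_sense - 0.5) <= unstranded_band:
        return "unstranded"   # roughly a 50/50 split
    return "ambiguous"        # no clear pattern; needs a human check
```

Returning "ambiguous" instead of forcing a guess leaves the final call to someone who knows how the library was prepared.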

@yuankunzhu
Author

Thanks for looking into this, @jvivian. Salmon can also perform this check: https://salmon.readthedocs.io/en/latest/salmon.html#what-s-this-libtype

As of version 0.7.0, Salmon also has the ability to automatically infer (i.e. guess) the library type based on how the first few thousand reads map to the transcriptome. To allow Salmon to automatically infer the library type, simply provide -l A or --libType A to Salmon.
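Once Salmon has inferred a library type, its format string could be translated into an RSEM --forward-prob value. This is a minimal sketch. It assumes the inferred type has already been read from Salmon's output, for example the expected_format field of lib_format_counts.json, which should be checked against the Salmon version in use. The function name is hypothetical. In Salmon's notation, paired-end strings begin with a relative-orientation letter (I, O, or M). That letter is followed by U for unstranded, or by S plus F/R for the strand read 1 comes from. Single-end strings are just U, SF, or SR.

```python
def forward_prob_from_salmon_libtype(libtype: str) -> float:
    """Map a Salmon library type (e.g. 'ISR', 'IU', 'SF') to RSEM --forward-prob.

    Hypothetical helper: 'A' (automatic) is a request, not a result, so it is rejected.
    """
    lt = libtype.strip().upper()
    # Paired-end types start with a relative-orientation letter (I, O, M);
    # single-end types never do, so dropping it leaves only the strand part.
    if lt and lt[0] in "IOM":
        lt = lt[1:]
    strand_part = {"U": 0.5, "SF": 1.0, "SR": 0.0}
    if lt not in strand_part:
        raise ValueError(f"cannot map Salmon library type {libtype!r}")
    return strand_part[lt]
```

For instance, "ISR" (typical of dUTP-based paired-end kits) gives 0.0 and "IU" gives 0.5.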

@hbeale

hbeale commented Apr 1, 2019

@yuankunzhu, I looked at the Salmon note too, but it can only detect what the aligner was told the data was, not whether the sequence data itself came from a stranded or unstranded library. I'm pretty sure this will have to be a parameter based on a human's knowledge of the library prep.

"Thus, for example, if the upstream aligner has been told to perform strand-aware mapping (i.e. to ignore potential alignments that don’t map in the expected manner), but the actual library is unstranded, automatic library type detection cannot detect this. It will attempt to detect the library type that is most consistent with the alignment that are provided."
