collapsing reads issue follow-up #56

Open
rtyags opened this issue Oct 17, 2022 · 0 comments
rtyags commented Oct 17, 2022

Hi, please look at the following comment from a closed issue. Opening a new issue here since I haven't heard back from anyone (presumably because commenting on a closed issue doesn't automatically reopen it).

Thanks.

"As a follow-up, looking at the code it seems to me that you use 20 as the threshold for this, i.e., if one end is the same, we allow the other end to be up to 20 bases away for the fragment to still be considered a duplicate. Is that correct?

However, even in that case, I'm confused because I see multiple cases where the end is the same and the start is more than 20 bases away, but the fragments are still not counted separately (i.e., they are considered duplicates) by sinto. For example, with the following 4 reads:

A00261:525:HK77VDSX3:1:1133:17969:2613 99 chrM 9947 60 150M = 10023 226 GGTTTGACTATTTCTGTATGTCTCCATCTATTGATGAGGGTCTTACTCTTTTAGTATAAATAGTACCGTTAACTTCCAATTAACTAGTTTTGACAACATTCAAAAAAGAGTAATAAACTTCGCCTTAATTTTAATAATCAACACCCTCCT FFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:150 AS:i:150 XS:i:34 CR:Z:ACAGGCTCAGGAGGGT CY:Z:FFFFFFFFFFFFFFFF CB:Z:AAAGCAAGTGGAAACG-1 BC:Z:TCGAATTG QT:Z:FFFFFFFF RG:Z:Sample_output:MissingLibrary:1:HK77VDSX3:1
A00261:525:HK77VDSX3:1:1133:17969:2613 147 chrM 10023 60 150M = 9947 -226 CAATTAACTAGTTTTGACAACATTCAAAAAAGAGTAATAAACTTCGCCTTAATTTTAATAATCAACACCCTCCTAGCCTTACTACTAATAATTATTACATTTTGACTACCACAACTCAACGGCTACATAGAAAAATCCACCCCTTACGAG :FFFFFFFFFFFFFFFF:FFFFFF:FFFF:FFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:150 AS:i:150 XS:i:0 CR:Z:ACAGGCTCAGGAGGGT CY:Z:FFFFFFFFFFFFFFFF CB:Z:AAAGCAAGTGGAAACG-1 BC:Z:TCGAATTG QT:Z:FFFFFFFF RG:Z:Sample_output:MissingLibrary:1:HK77VDSX3:1
A00261:525:HK77VDSX3:1:1370:20518:3302 99 chrM 10092 60 81M = 10092 81 CTCCTAGCCTTACTACTAATAATTATTACATTTTGACTACCACAACTCAACGGCTACATAGAAAAATCCACCCCTTACGAG FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:81 AS:i:81 XS:i:0 CR:Z:ACAGGCTCAGGAGGGT CY:Z:FFFFFF,FFFFFFFFF CB:Z:AAAGCAAGTGGAAACG-1 BC:Z:CGAGTGAT QT:Z:FFFFFFFF RG:Z:Sample_output:MissingLibrary:1:HK77VDSX3:1 TR:Z:CTGTCTCTTATACACATCTCCGAGCCCACGAGACCGAGTGATATCTCGTATGCCGTCTTCTGCTTGAAA TQ:Z:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF
A00261:525:HK77VDSX3:1:1370:20518:3302 147 chrM 10092 60 81M = 10092 -81 CTCCTAGCCTTACTACTAATAATTATTACATTTTGACTACCACAACTCAACGGCTACATAGAAAAATCCACCCCTTACGAG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:81 AS:i:81 XS:i:0 CR:Z:ACAGGCTCAGGAGGGT CY:Z:FFFFFF,FFFFFFFFF CB:Z:AAAGCAAGTGGAAACG-1 BC:Z:CGAGTGAT QT:Z:FFFFFFFF RG:Z:Sample_output:MissingLibrary:1:HK77VDSX3:1 TR:Z:CTGTCTCTTATACACATCTGACGCTGCCGACGACAGACGCGACCCTCCTGAGCCTGTGTGTAGATCTCG TQ:Z:::FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

I would have expected the following two start,end pairs to be considered separate fragments:
9950 10167
10095 10167

but sinto actually only counts the second fragment here (i.e. 10095 10167), and ignores the first. What am I missing?

Thanks
"

Originally posted by @rtyags in #48 (comment)
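
For reference, here is a minimal sketch of the arithmetic behind the two start,end pairs quoted above. It is not sinto's actual code: the function names are hypothetical, the +4/-5 offsets are an assumption inferred from the quoted coordinates, and the 20-base tolerance is only my reading of the code as described in the comment.

```python
# Hypothetical illustration only; none of these names come from sinto.
START_SHIFT = 4    # assumed offset applied to the fragment start
END_SHIFT = -5     # assumed offset applied to the fragment end
TOLERANCE = 20     # threshold as described in the comment above


def fragment_from_sam(pos, tlen):
    """Turn the forward read's 1-based POS and positive TLEN into (start, end)."""
    start0 = pos - 1  # SAM POS is 1-based; convert to 0-based
    return start0 + START_SHIFT, start0 + tlen + END_SHIFT


def within_tolerance(a, b, tolerance=TOLERANCE):
    """True if one end is identical and the other end is at most `tolerance` away."""
    same_start = a[0] == b[0]
    same_end = a[1] == b[1]
    if same_start and same_end:
        return True
    if same_end:
        return abs(a[0] - b[0]) <= tolerance
    if same_start:
        return abs(a[1] - b[1]) <= tolerance
    return False


# POS and TLEN taken from the forward (flag 99) read of each pair.
frag1 = fragment_from_sam(9947, 226)
frag2 = fragment_from_sam(10092, 81)

print(frag1, frag2)                        # fragment coordinates
print(abs(frag1[0] - frag2[0]))            # distance between the two starts
print(within_tolerance(frag1, frag2))      # would a 20-base rule merge them?
```

Under these assumptions the two fragments share an end but their starts are 145 bases apart, so a 20-base rule on its own would not merge them, which is why the observed output is surprising.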
