Suggestion for [--rc_identity_threshold] #2
Comments
Implementing this during initial clustering is quite an undertaking. However, would it be sufficient for your purpose to do semi-global alignment in the reverse complement step and check the identity based on this semi-global alignment? This should have the same outcome and is much easier to implement.
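To make the suggestion above concrete, here is a minimal pure-Python sketch of an identity check based on semi-global alignment. It is not NGSpeciesID's implementation: the function names, the scoring values (+1 match, -1 mismatch, -1 gap), and the definition of identity as matches divided by alignment columns inside the overlap are all assumptions made for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def semiglobal_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Identity of the best overlap alignment of a and b (hypothetical helper).

    Overhangs at either end of either sequence are not penalised, so two
    fragments that only partially overlap are scored on the overlap alone.
    """
    n, m = len(a), len(b)
    # First row and column stay 0: leading overhangs are free.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trailing overhangs are free: take the best cell in the last row/column.
    ends = [(score[n][j], n, j) for j in range(m + 1)]
    ends += [(score[i][m], i, m) for i in range(n + 1)]
    _, i, j = max(ends)
    # Trace back through the overlap, counting matches and alignment columns.
    matches = columns = 0
    while i > 0 and j > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return matches / columns if columns else 0.0


# Two fragments of the same amplicon, one stored in the opposite orientation.
forward = "AAAAACGTACGTAC"
reverse = reverse_complement("CGTACGTACGGGGG")
identity = semiglobal_identity(forward, reverse_complement(reverse))
should_merge = identity >= 0.85  # threshold value taken from the issue text
```

A global alignment of the same two fragments would penalise the non-overlapping ends and could fall below the threshold even when the shared region is identical, which is the situation described in the issue.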
Yes, that would be awesome!
To comment further on this, it would actually be nice to have an option to fuse clusters that are highly identical when comparing seq1 sense vs seq2 sense, rather than only comparing seq1 with the reverse complement of seq2 in the detect_reverse_complement script in the consensus.py module. I was getting separate clusters because the primer regions can be ambiguous due to low coverage and low sequencing quality or a higher degree of mutations. As a result, I would get multiple clusters that were highly similar when comparing sense vs sense (95%+ identity, differing only in the primer regions). This is easy to add (I did it a bit ugly here but it works):
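The snippet from the original comment is not preserved here. As an illustration only, the following sketch shows one way to fuse consensus sequences that are similar in either orientation. The function names are hypothetical, the greedy merge order is an assumption, and `difflib.SequenceMatcher.ratio()` is used as a simple stand-in for an alignment-based identity; it is not the comparison used in consensus.py.

```python
from difflib import SequenceMatcher

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def similarity(a, b):
    # Stand-in for alignment identity (assumption, not the tool's metric).
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def merge_similar_consensus(consensus, threshold):
    """Greedily group cluster ids whose consensus sequences are similar.

    consensus: dict mapping cluster id -> consensus sequence.
    Returns a dict mapping a representative id -> list of merged ids.
    A sequence joins the first representative it matches, either
    sense vs sense or sense vs reverse complement.
    """
    merged = {}
    for cid, seq in consensus.items():
        for rep in merged:
            rep_seq = consensus[rep]
            best = max(similarity(rep_seq, seq),
                       similarity(rep_seq, reverse_complement(seq)))
            if best >= threshold:
                merged[rep].append(cid)
                break
        else:
            # No existing representative is similar enough.
            merged[cid] = [cid]
    return merged


core = "ACGTTGCAAGCTTAGGCTAACGTTAGCCATGGATCCGATTACAGGCTTAACG"
clusters = {
    "c0": core,
    "c1": "TT" + core[2:-2] + "AA",      # same sense, differs only at the ends
    "c2": reverse_complement(core),      # opposite orientation
    "c3": "G" * 26 + "T" * 26,           # unrelated sequence
}
groups = merge_similar_consensus(clusters, threshold=0.9)
```

In this toy input, "c1" is picked up by the sense vs sense comparison and "c2" by the reverse complement comparison, while "c3" stays in its own group.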
Hello Kristoffer,
I ran NGSpeciesID with the following command:
And everything worked nicely, but in some cases it creates two separate consensus sequences in opposite directions that should have been joined by --rc_identity_threshold 0.85. For example, in the picture, the overlapping region of both consensuses has a similarity of 98.7%.
Would it be too hard to implement reverse complement during the initial clustering steps to avoid this kind of result? This would be very useful for the case of fragmented amplicons (and it would save time in general, even with full length amplicons, by avoiding the separate spoa centering of the forward and reverse clusters just to be merged afterwards by --rc_identity_threshold).