Protein sequence validation #498
Comments
How about showing the IDs of sequences that contain Z or X?
Thanks for your fast reply, Shen Wei! Yes, that works - thanks! But I guess as a naive user I would expect a warning about such protein sequences, as "Z" and "X" are not standard amino acids.
They are allowed in protein sequences 🥲 Someone asked for them to be added: https://github.com/shenwei356/bio/blob/master/seq/alphabet.go#L34-L73
Hmmmm, I see the conundrum. The problem is that many sequence alignment and phylogenetic analysis programs reject "X" and "Z" residues outright. Maybe some kind of 'soft warning' would be good? Nonetheless, I greatly appreciate your helpful and fast responses :-)!
How about replacing Z with Q or E, and replacing X with any residue?
In my case I can't replace residues with something else, as these are supposed to be explicit datapoints - i.e. every position in the protein alignment should be explicitly defined (and not, as with "X", whatever you want!). My plan is to use seqkit for this very purpose: check that none of the sequences in an alignment contain undefined residues. If any do, I will probably remove them from the analysis!
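The check described above (flag every sequence containing a residue outside the 20 standard amino acids, then drop it) can be sketched without seqkit. This is only a minimal illustration and not part of seqkit: the names `parse_fasta`, `nonstandard_residues` and `split_valid` are hypothetical, and treating `-`, `.` and `*` as acceptable alignment characters is an assumption that may not suit every downstream program.

```python
# Sketch: flag protein sequences with residues outside the 20 standard amino acids.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
# Assumption: gap characters and a stop symbol are tolerated in an alignment.
ALLOWED_EXTRA = set("-.*")


def parse_fasta(text):
    """Yield (id, sequence) pairs from FASTA-formatted text."""
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def nonstandard_residues(seq):
    """Return {residue: count} for characters outside the allowed set."""
    counts = {}
    for ch in seq.upper():
        if ch not in STANDARD_AA and ch not in ALLOWED_EXTRA:
            counts[ch] = counts.get(ch, 0) + 1
    return counts


def split_valid(text):
    """Separate clean records from records with non-standard residues."""
    valid, flagged = [], {}
    for seq_id, seq in parse_fasta(text):
        bad = nonstandard_residues(seq)
        if bad:
            flagged[seq_id] = bad
        else:
            valid.append((seq_id, seq))
    return valid, flagged


if __name__ == "__main__":
    example = ">seq\nMFKXXXXXQLRTNKZZZZZDRTFPAD\n>ok\nMFKQLRTNKDRTFPAD\n"
    valid, flagged = split_valid(example)
    for seq_id, bad in flagged.items():
        print(f"[WARN] {seq_id}: non-standard residues {bad}")
```

Note that this strict alphabet also flags other ambiguity or rare codes such as B, J, O and U, which may or may not be what a given alignment program requires.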
Hi seqkit team,
I'd like to simply validate some protein sequences, but the behaviour of seqkit is not what I'd intuitively expect.
seqkit version -u
seqkit v2.9.0
Checking new version...
You are using the latest version of seqkit
echo -e ">seq\nMFKXXXXXQLRTNKZZZZZDRTFPAD" | seqkit seq --seq-type protein -v
I would expect some warnings about a protein sequence with "X" or "Z" rather than just returning the input sequence. Am I missing something?
Any advice would be gratefully received!