Non-flush alignments under certain conditions #93

Open
rvosa opened this issue Apr 18, 2024 · 1 comment

@rvosa
Member

rvosa commented Apr 18, 2024

The msa_hmm step combines two alignments into one file. The ingroup and the outgroup are each processed separately to determine the reverse-complement (revcom) orientation and to do the alignment. The resulting files are then simply concatenated, on the assumption that this yields a flush alignment, i.e. one in which every row has the same length. It turns out that this is sometimes not the case. To address this, the following steps need to be taken:

  • Firstly, the problem seems to manifest especially with short sequences. Hence, better curation of the input data under Curate input data package #85 might mitigate it in part.
  • Secondly, the two data sets can either be processed in one go, e.g. by concatenating the inputs and then doing the orientation and alignment across the concatenation, or be reconciled with hmmalign's --mapali option.
  • But, thirdly, how could this happen in the first place? The expectation was that hmmalign yields alignments of the same length if they use the same HMM and are trimmed. What gives? Probably indels?
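
A possible explanation for the third point, offered as a hypothesis rather than a confirmed diagnosis: in HMMER-style output, insert columns are written as lowercase residues padded with `.`, and how many of them appear depends on the sequences in that particular run. Two runs against the same HMM can therefore differ in width even though they share the same match columns. The minimal Python sketch below checks whether a concatenated set of rows is flush and shows that keeping only match-state columns restores equal lengths. The function names and the toy rows are made up for illustration; this is not what msa_hmm currently does, and dropping insert columns discards the inserted residues.

```python
def is_flush(rows):
    """Return True if every aligned row has the same number of columns."""
    return len({len(seq) for seq in rows.values()}) <= 1


def drop_insert_columns(row):
    """Keep match-state columns only.

    Assumes HMMER-style conventions: uppercase residues and '-' are match
    columns, lowercase residues and '.' are insert columns.
    """
    return "".join(c for c in row if not (c.islower() or c == "."))


# Toy rows (hypothetical, not real data): the ingroup run contains a
# two-residue insertion, the outgroup run contains none.
ingroup = {"in1": "ACgtGT", "in2": "AC..GT"}
outgroup = {"out1": "ACGT", "out2": "A-GT"}

# Simple concatenation of the two alignments, as described above.
merged = {**ingroup, **outgroup}
print(is_flush(merged))  # False: rows are 6 and 4 columns wide

# Projecting every row onto the match columns makes the lengths agree.
projected = {name: drop_insert_columns(seq) for name, seq in merged.items()}
print(is_flush(projected))  # True: every row is 4 columns wide
```

A check like `is_flush` could run right after the concatenation so that a non-flush result fails loudly instead of propagating downstream.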
@rvosa
Member Author

rvosa commented Apr 19, 2024

As of now, the Odonata branch realigns the exemplars. This solves the immediate problem, but we still need to understand what's happening in the Stockholm files.
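
To look into the Stockholm files, something like the following rough Python sketch could report the total width of each file and how many of its columns are match versus insert columns. It assumes each file carries a `#=GC RF` line in which `.` marks insert columns and any other character marks a match column; that assumption should be checked against the actual hmmalign output. The helper names and the two toy alignments are hypothetical.

```python
def read_stockholm(text):
    """Parse Stockholm text into (rows, rf), joining interleaved blocks."""
    rows, rf = {}, ""
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] == "//":
            continue
        if parts[:2] == ["#=GC", "RF"] and len(parts) > 2:
            rf += parts[2]  # reference annotation, one character per column
        elif parts[0].startswith("#"):
            continue  # header and other markup lines
        elif len(parts) >= 2:
            rows[parts[0]] = rows.get(parts[0], "") + parts[1]
    return rows, rf


def describe(text):
    """Summarise the width and the match/insert column counts of one file."""
    rows, rf = read_stockholm(text)
    width = max((len(seq) for seq in rows.values()), default=0)
    inserts = rf.count(".")  # assumption: '.' in RF marks an insert column
    return {"width": width, "match": len(rf) - inserts, "insert": inserts}


# Toy inputs (hypothetical), standing in for the ingroup and outgroup files.
ingroup_sto = """# STOCKHOLM 1.0
in1      ACgtGT
in2      AC..GT
#=GC RF  xx..xx
//
"""
outgroup_sto = """# STOCKHOLM 1.0
out1     ACGT
out2     A-GT
#=GC RF  xxxx
//
"""

print(describe(ingroup_sto))
print(describe(outgroup_sto))
```

If the two files report the same number of match columns but different widths, the difference comes from insert columns, which would be consistent with the indel suspicion above.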
