Non-flush alignments under certain conditions #93

rvosa · 2024-04-18T11:05:00Z

The msa_hmm step combines two alignments into one file. The ingroup and the outgroup are each processed separately to determine the revcom (reverse-complement) orientation and to perform the alignment. The two files are then simply concatenated, on the assumption that this yields a flush alignment, i.e. one in which all rows have the same length. It turns out that this is sometimes not the case. To address this, the following points need to be considered:

Firstly, the problem seems to manifest especially with short sequences, so better curation under "Curate input data package #85" might partly mitigate it.
Secondly, the two data sets can either be processed in one go, e.g. by concatenating the inputs and then doing the orientation and alignment across the concatenation, or be reconciled afterwards with hmmalign's --mapali option.
But, thirdly, how could this happen in the first place? The idea was that hmmalign would yield alignments of the same length as long as they use the same HMM and are trimmed. What gives? Probably indels?
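Whichever fix is chosen, the concatenated output could be checked for flushness before it is passed on, so that the problem surfaces immediately rather than downstream. A minimal sketch, assuming the combined file is FASTA with unique record IDs (the format msa_hmm actually writes is not stated here); `read_fasta` and `check_flush` are hypothetical helpers, not part of the pipeline:

```python
from typing import Dict, List, Optional, Tuple

def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record ID: aligned sequence}."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # ID is the first word after '>'
            records[current] = []
        elif current is not None:
            records[current].append(line)  # sequences may span several lines
    return {name: "".join(parts) for name, parts in records.items()}

def check_flush(alignment: Dict[str, str]) -> Tuple[bool, Dict[int, List[str]]]:
    """Return (is_flush, {row length: [IDs with that length]})."""
    by_length: Dict[int, List[str]] = {}
    for name, seq in alignment.items():
        by_length.setdefault(len(seq), []).append(name)
    return len(by_length) <= 1, by_length

# Toy example: the ingroup is 5 columns wide, the outgroup only 4.
ingroup = ">a\nAC-GT\n>b\nACCGT\n"
outgroup = ">c\nAC-G\n"
ok, lengths = check_flush(read_fasta(ingroup + outgroup))
# ok is False; lengths == {5: ['a', 'b'], 4: ['c']}
```

Grouping IDs by row length, rather than just returning a boolean, shows at a glance which of the two input sets is out of step.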


rvosa · 2024-04-19T20:13:18Z

As of now, the Odonata branch realigns the exemplars. This solves the immediate problem, but we still need to understand what is happening in the Stockholm files.
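One way to look into the Stockholm files, and to test the indel hypothesis, is to compare their consensus columns. In hmmalign output, the `#=GC RF` line marks consensus (match) columns, while residues emitted by insert states are shown in lowercase in extra insert columns, with `.` padding the other rows. Two files aligned to the same HMM can therefore differ in total width while sharing the same consensus columns. The sketch below is a simplified parser that assumes well-formed Stockholm input with an RF line; `parse_stockholm` and `consensus_columns_only` are hypothetical names:

```python
from typing import Dict, List, Tuple

def parse_stockholm(text: str) -> Tuple[Dict[str, str], str]:
    """Return ({sequence name: aligned row}, RF annotation), joining interleaved blocks."""
    rows: Dict[str, List[str]] = {}
    rf_parts: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "//":
            continue
        if line.startswith("#=GC RF"):
            rf_parts.append(line.split()[2])
        elif line.startswith("#"):
            continue  # header line and other #=GF/#=GS/#=GR/#=GC markup
        else:
            name, seq = line.split()[:2]
            rows.setdefault(name, []).append(seq)
    return {n: "".join(p) for n, p in rows.items()}, "".join(rf_parts)

def consensus_columns_only(rows: Dict[str, str], rf: str) -> Dict[str, str]:
    """Drop insert columns, i.e. columns that are gaps in the RF line."""
    keep = [i for i, ch in enumerate(rf) if ch not in ".-_~"]
    return {name: "".join(seq[i] for i in keep) for name, seq in rows.items()}

# Toy example: one insertion (lowercase 'a') adds an extra column to the ingroup.
in_sto = "# STOCKHOLM 1.0\nseq1 AC.GT\nseq2 ACaGT\n#=GC RF xx.xx\n//\n"
out_sto = "# STOCKHOLM 1.0\nseq3 ACGT\n#=GC RF xxxx\n//\n"
in_rows, in_rf = parse_stockholm(in_sto)
out_rows, out_rf = parse_stockholm(out_sto)
# Raw widths differ (5 vs 4); after masking, every row is 4 columns wide.
merged = {**consensus_columns_only(in_rows, in_rf),
          **consensus_columns_only(out_rows, out_rf)}
# merged == {'seq1': 'ACGT', 'seq2': 'ACGT', 'seq3': 'ACGT'}
```

Masking discards the inserted residues, so it is better suited as a diagnostic than as the final fix; --mapali or aligning both sets together keeps them.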
