invert_align
invert_align is useful for locating mismatches and other differences in an alignment
between the reference sequence (the first sequence in the alignment) and the remaining
sequences. Inversion can be 'hard', where matching residues are shown as -, or 'soft',
where matching residues are shown in lower case. In both cases, mismatches are shown as
capital letters, and gaps or missing sequence are shown as _.
... | invert_align [options]
[-? | --help] # Print full usage description.
[-s | --soft] # Use soft inversion instead of hard inversion.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the alignment in the file aln.fna in FASTA format:
>test1
CTAGC-TTCGACT
>test2
--AGC-TTCGA--
>test3
--AGCTTTCGA--
>test4
--AG--CTCGA--
>test5
--AG--TTCGAC-
Reading the alignment with read_fasta and printing it with write_align results in:
read_fasta -i aln.fna | write_align -x
.
test1 CTAGC-TTCGACT
test2 --AGC-TTCGA--
test3 --AGCTTTCGA--
test4 --AG--CTCGA--
test5 --AG--TTCGAC-
Consensus: 50% --AG--TTCGA--
However, if we insert invert_align into the pipeline, it becomes clear where the sequences differ:
read_fasta -i aln.fna | invert_align | write_align -x
.
test1 CTAGC_TTCGACT
test2 __---------__
test3 __---T-----__
test4 __--_-C----__
test5 __--_-------_
Consensus: 50% -------------
If, instead of hard inverting the alignment, we use the -s switch of invert_align to obtain a
soft inverted alignment, where matching residues are shown in lower case letters instead of as -, we get:
read_fasta -i aln.fna | invert_align -s | write_align -x
.
test1 CTAGC_TTCGACT
test2 __agc_ttcga__
test3 __agcTttcga__
test4 __ag__Ctcga__
test5 __ag__ttcgac_
Consensus: 50% --AG--TTCGA--
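The inversion rules illustrated by the two examples above can be sketched in a few lines of Python. This is only an illustration inferred from the sample output on this page, not the Biopieces implementation: the function name invert_align is borrowed from the tool for readability, and the case-insensitive comparison is an assumption. Note one detail taken from the sample output: in hard mode a gap that coincides with a gap in the reference is treated as a match and shown as -, whereas in soft mode every gap is shown as _.

```python
GAP = "-"


def invert_align(seqs, soft=False):
    """Invert aligned sequences against the first one (the reference).

    seqs is a list of equal-length aligned strings. Illustrative sketch
    only; rules are inferred from the example output, not from Biopieces.
    """
    if not seqs:
        return []
    ref = seqs[0]
    if any(len(s) != len(ref) for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # The reference keeps its residues; only its gaps are rewritten as "_".
    result = [ref.replace(GAP, "_")]

    for seq in seqs[1:]:
        row = []
        for r, c in zip(ref, seq):
            same = c.upper() == r.upper()  # assumption: compare ignoring case
            if soft:
                if c == GAP:
                    row.append("_")        # every gap is shown as "_"
                elif same:
                    row.append(c.lower())  # match: lower case
                else:
                    row.append(c.upper())  # mismatch: upper case
            else:
                if same:
                    row.append("-")        # match (including gap vs. gap)
                elif c == GAP:
                    row.append("_")        # gap where the reference has a residue
                else:
                    row.append(c.upper())  # mismatch: upper case
        result.append("".join(row))
    return result


if __name__ == "__main__":
    alignment = [
        "CTAGC-TTCGACT",
        "--AGC-TTCGA--",
        "--AGCTTTCGA--",
        "--AG--CTCGA--",
        "--AG--TTCGAC-",
    ]
    for line in invert_align(alignment):
        print(line)
    for line in invert_align(alignment, soft=True):
        print(line)
```

Run on the five sequences from aln.fna, the sketch produces the same rows as the hard and soft listings above (without the ruler and consensus lines that write_align -x adds).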
Martin Asser Hansen - Copyright (C) - All rights reserved.
August 2007
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
invert_align is part of the Biopieces framework.