Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: indel_seq

Description

indel_seq introduces indels (insertions and deletions) into sequences in the stream, based on either an exact number of indels or a percentage of the sequence length. Insertions are introduced at random positions as duplications of the adjacent residues. Deletions are made at random positions.
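The behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the Biopieces source: the function name, its parameters, the choice to apply insertions before deletions, and the choice to duplicate the residue at the chosen position are all assumptions.

```python
import random


def indel_sketch(seq, insertions=0, deletions=0, rng=None):
    """Hypothetical sketch of indel_seq-like behaviour (not the real code)."""
    rng = rng or random.Random()
    residues = list(seq)

    # Insertions: pick a random position and duplicate the residue there,
    # so the new residue always equals one of its neighbours.
    for _ in range(insertions):
        if not residues:
            break
        pos = rng.randrange(len(residues))
        residues.insert(pos, residues[pos])

    # Deletions: remove the residue at a random position.
    # Assumption: deletions are applied after insertions.
    for _ in range(deletions):
        if not residues:
            break
        pos = rng.randrange(len(residues))
        del residues[pos]

    return "".join(residues)


if __name__ == "__main__":
    mutated = indel_sketch("ATGTGCACATTCGACTAGCA", insertions=2)
    print(mutated, len(mutated))
```

Because positions are drawn at random, repeated runs give different sequences; only the resulting length is predictable.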

Usage

... | indel_seq [options]

Options

[-?         | --help]                        #  Print full usage description.
[-i <uint>  | --insertions=<uint>]           #  Number of insertions         -  Default=0
[-P <float> | --insertions_percent=<float>]  #  Percent residues to insert   -  Default=0.0
[-d <uint>  | --deletions=<uint>]            #  Number of deletions          -  Default=0
[-D <float> | --deletions_percent=<float>]   #  Percent residues to delete   -  Default=0.0
[-I <file!> | --stream_in=<file!>]           #  Read input from stream file  -  Default=STDIN
[-O <file>  | --stream_out=<file>]           #  Write output to stream file  -  Default=STDOUT
[-v         | --verbose]                     #  Verbose output.

Examples

Consider the following entry in the FASTA file test.fna:

>test
ATGTGCACATTCGACTAGCA

Read in the sequence with read_fasta and introduce two insertions with indel_seq:

read_fasta -i test.fna | indel_seq -i 2

SEQ: ATGTGCACCATTCGACTAGGCA
SEQ_NAME: test
SEQ_LEN: 22
---

Or, in the same go, also delete 30% of the residues using the -D switch:

read_fasta -i test.fna | indel_seq -i 2 -D 30

SEQ: TGTGACATTGACAGCC
SEQ_NAME: test
SEQ_LEN: 16
---
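The SEQ_LEN values in the examples above follow from simple arithmetic, sketched here in Python. The helper name is hypothetical, and two points are assumptions rather than documented behaviour: that the percentage is taken of the original sequence length, and that fractional counts are rounded to the nearest integer.

```python
def expected_length(seq_len, insertions=0, deletions_percent=0.0):
    """Hypothetical helper: predict SEQ_LEN after indel_seq-style mutation."""
    # Assumption: the percentage refers to the original sequence length and
    # a fractional count is rounded to the nearest integer.
    deletions = round(seq_len * deletions_percent / 100)
    return seq_len + insertions - deletions


if __name__ == "__main__":
    # 20 residues, 2 insertions, 30% deletions: 20 + 2 - 6
    print(expected_length(20, insertions=2, deletions_percent=30.0))
```

For the 20-residue test sequence this reproduces both outputs shown: 22 with two insertions alone, and 16 when 30% of the residues are also deleted.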

You can also re-read the original sequence into the stream and align it against the mutated one for comparison:

read_fasta -i test.fna | indel_seq -i 2 -D 30 | read_fasta -i test.fna | align_seq | write_align -x

                       .        
test        -TGTG-ACATTC-ACTA-CA
             |||| |||||| |||| ||
test        ATGTGCACATTCGACTAGCA
                     .         .

See also

mutate_seq

read_fasta

align_seq

write_align

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

June 2009

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

indel_seq is part of the Biopieces framework.

http://www.biopieces.org
