Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: join_seq

Description

join_seq joins sequences in the stream. It locates all records with a SEQ key and concatenates their values into a new record with REC_TYPE: JOIN, which is emitted after the input records. The SEQ_NAME of this record is the first SEQ_NAME encountered, if any.
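The joining behaviour can be sketched in Python. This is only an illustration of the semantics described here, not the Biopieces implementation: the function name join_seq_records, the use of plain dicts as records, and the handling of a stream without any SEQ key are all assumptions made for the example.

```python
def join_seq_records(records, delimiter=""):
    """Pass records through unchanged, then append one joined record.

    Hypothetical helper: records are modelled as plain dicts, which is an
    assumption of this sketch and not the Biopieces record type.
    """
    records = list(records)

    # Only records that carry a SEQ key take part in the join.
    seq_records = [rec for rec in records if "SEQ" in rec]

    # Assumption: with no sequences in the stream, nothing is appended.
    if not seq_records:
        return records

    joined = {}
    # Assumption: the first SEQ_NAME is looked up among the SEQ records.
    for rec in seq_records:
        if "SEQ_NAME" in rec:
            joined["SEQ_NAME"] = rec["SEQ_NAME"]
            break

    joined["SEQ"] = delimiter.join(rec["SEQ"] for rec in seq_records)
    joined["SEQ_LEN"] = len(joined["SEQ"])
    joined["REC_TYPE"] = "JOIN"
    return records + [joined]
```

Calling join_seq_records with two records whose SEQ values are "aaaa" and "tttt" would return both records followed by a JOIN record whose SEQ is "aaaatttt".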

Usage

... | join_seq [options]

Options

[-?          | --help]               #  Print full usage description.
[-d <string> | --delimiter=<string>] #  Delimiter used for joining sequences -  Default=""
[-I <file!>  | --stream_in=<file!>]  #  Read input from stream file          -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file          -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

Consider the following sequence entries in FASTA format in the file test.fna:

>test1
aaaa
>test2
tttt
>test3
cccc
>test4
gggg

To join the sequences, read the file with read_fasta and pipe the stream to join_seq:

read_fasta -i test.fna | join_seq

SEQ_NAME: test1
SEQ: aaaa
SEQ_LEN: 4
---
SEQ_NAME: test2
SEQ: tttt
SEQ_LEN: 4
---
SEQ_NAME: test3
SEQ: cccc
SEQ_LEN: 4
---
SEQ_NAME: test4
SEQ: gggg
SEQ_LEN: 4
---
SEQ_NAME: test1
SEQ: aaaattttccccgggg
SEQ_LEN: 16
REC_TYPE: JOIN
---

To use a different delimiter, use the -d switch:

read_fasta -i test.fna | join_seq -d X

SEQ_NAME: test1
SEQ: aaaa
SEQ_LEN: 4
---
SEQ_NAME: test2
SEQ: tttt
SEQ_LEN: 4
---
SEQ_NAME: test3
SEQ: cccc
SEQ_LEN: 4
---
SEQ_NAME: test4
SEQ: gggg
SEQ_LEN: 4
---
SEQ_NAME: test1
SEQ: aaaaXttttXccccXgggg
SEQ_LEN: 19
REC_TYPE: JOIN
---
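Note that the SEQ_LEN of the JOIN record counts the delimiter characters as well: four sequences of length 4 plus three delimiters of length 1 give 19. A small Python sketch of that arithmetic follows; the function name joined_length is hypothetical and only serves to illustrate the numbers shown in the examples.

```python
def joined_length(seq_lengths, delimiter=""):
    """Length of the joined sequence: all residues plus one delimiter per gap."""
    if not seq_lengths:
        return 0
    gaps = len(seq_lengths) - 1
    return sum(seq_lengths) + gaps * len(delimiter)


print(joined_length([4, 4, 4, 4]))       # 16, as in the first example
print(joined_length([4, 4, 4, 4], "X"))  # 19, as in the -d X example
```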

See also

read_fasta

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

[email protected]

December 2010

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

join_seq is part of the Biopieces framework.

http://www.biopieces.org
