join_seq
Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions
join_seq joins sequences in the stream. It locates all records with a SEQ key and concatenates their values into a new record with REC_TYPE: JOIN. The SEQ_NAME of this record is the first SEQ_NAME encountered, if any. The original records are passed through unchanged, and the joined record is emitted after them.
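The joining rule described above can be sketched in Python. This is an illustration only, not the Biopieces implementation: records are modelled as plain dicts, and the function name `join_seq` and its `delimiter` parameter are chosen here to mirror the command line. Two details are assumptions: the name is taken from the first record that has both SEQ and SEQ_NAME, and no joined record is emitted when the stream contains no SEQ at all.

```python
def join_seq(records, delimiter=""):
    """Yield every input record unchanged, then one joined record.

    Sketch only: records are dicts standing in for Biopieces records.
    """
    seqs = []
    first_name = None
    for record in records:
        if "SEQ" in record:
            seqs.append(record["SEQ"])
            # Assumption: remember the first SEQ_NAME seen on a SEQ record.
            if first_name is None and "SEQ_NAME" in record:
                first_name = record["SEQ_NAME"]
        yield record

    # Assumption: with no SEQ records there is nothing to join.
    if seqs:
        joined = {}
        if first_name is not None:
            joined["SEQ_NAME"] = first_name
        joined["SEQ"] = delimiter.join(seqs)
        joined["SEQ_LEN"] = len(joined["SEQ"])
        joined["REC_TYPE"] = "JOIN"
        yield joined


# Hypothetical input: two sequence records.
records = [
    {"SEQ_NAME": "test1", "SEQ": "aaaa", "SEQ_LEN": 4},
    {"SEQ_NAME": "test2", "SEQ": "tttt", "SEQ_LEN": 4},
]
result = list(join_seq(records))
```

Writing it as a generator keeps the pass-through behaviour explicit: each record leaves the function before the joined record is built.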
... | join_seq [options]
[-? | --help] # Print full usage description.
[-d <string> | --delimiter=<string>] # Delimiter used for joining sequences - Default=""
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following sequence entries in FASTA format in the file test.fna:
>test1
aaaa
>test2
tttt
>test3
cccc
>test4
gggg
To join the sequences, read them with read_fasta and pipe the stream to join_seq:
read_fasta -i test.fna | join_seq
SEQ_NAME: test1
SEQ: aaaa
SEQ_LEN: 4
---
SEQ_NAME: test2
SEQ: tttt
SEQ_LEN: 4
---
SEQ_NAME: test3
SEQ: cccc
SEQ_LEN: 4
---
SEQ_NAME: test4
SEQ: gggg
SEQ_LEN: 4
---
SEQ_NAME: test1
SEQ: aaaattttccccgggg
SEQ_LEN: 16
REC_TYPE: JOIN
---
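To pick the joined record out of output like the above in a script, the printed text can be parsed. The sketch below assumes only what the example shows: one `KEY: value` pair per line and a `---` line closing each record. The helper name `parse_stream` is hypothetical, and all values are kept as strings.

```python
def parse_stream(text):
    """Parse 'KEY: value' lines into dicts; a '---' line ends a record."""
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "---":
            if current:
                records.append(current)
            current = {}
        elif line:
            # Split on the first ': ' only, so values may contain colons.
            key, _, value = line.partition(": ")
            current[key] = value
    if current:  # tolerate a missing final '---'
        records.append(current)
    return records


# Shortened copy of the output shown above.
sample = """SEQ_NAME: test4
SEQ: gggg
SEQ_LEN: 4
---
SEQ_NAME: test1
SEQ: aaaattttccccgggg
SEQ_LEN: 16
REC_TYPE: JOIN
---
"""
joined = [r for r in parse_stream(sample) if r.get("REC_TYPE") == "JOIN"]
```

Filtering on REC_TYPE is what separates the joined record from the pass-through records, which carry no REC_TYPE key in this example.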
To use a different delimiter, use the -d switch:
read_fasta -i test.fna | join_seq -d X
SEQ_NAME: test1
SEQ: aaaa
SEQ_LEN: 4
---
SEQ_NAME: test2
SEQ: tttt
SEQ_LEN: 4
---
SEQ_NAME: test3
SEQ: cccc
SEQ_LEN: 4
---
SEQ_NAME: test4
SEQ: gggg
SEQ_LEN: 4
---
SEQ_NAME: test1
SEQ: aaaaXttttXccccXgggg
SEQ_LEN: 19
REC_TYPE: JOIN
---
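The SEQ_LEN of 19 in the joined record counts the delimiters as well as the residues: four sequences of length 4 plus three single-character delimiters. A quick check of that arithmetic, using plain Python strings rather than Biopieces itself:

```python
seqs = ["aaaa", "tttt", "cccc", "gggg"]
delimiter = "X"

# n sequences are separated by n - 1 delimiters.
expected_len = sum(len(s) for s in seqs) + (len(seqs) - 1) * len(delimiter)

joined = delimiter.join(seqs)
assert len(joined) == expected_len
```

With the default empty delimiter the second term is zero, which gives the SEQ_LEN of 16 in the first example.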
Martin Asser Hansen - Copyright (C) - All rights reserved.
December 2010
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
join_seq is part of the Biopieces framework.