write_fasta_files
write_fasta_files writes sequences from the data stream to multiple FASTA files, split by a specified key. All FASTA type records containing that key are written to files named according to the value of the key.
write_fasta_files supports gzip and bzip2 compressed output, as well as wrapped sequence lines.
For more about the FASTA format:
http://en.wikipedia.org/wiki/Fasta_format
... | write_fasta_files [options]
[-? | --help] # Print full usage description.
[-k <string> | --key=<string>] # Key for separating records and naming files.
[-d <dir!> | --directory=<dir!>] # Target directory.
[-p <string> | --prefix=<string>] # Optional prefix for file names.
[-w <int> | --wrap=<int>] # Wrap sequences to a given width.
[-x | --no_stream] # Do not emit records.
[-Z <string> | --compress=<string>] # Compress output using <gzip|bzip2>.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following FASTA entries in the file test.fasta:
>test1
GACATCGAC
>test2
ACGACTACAGT
>test3
GCACACAGAGC
We can read in these sequences using read_fasta and output them to files according to the SEQ_NAME using write_fasta_files like this:
read_fasta -i test.fasta | write_fasta_files -d Test_dir -k SEQ_NAME
SEQ_NAME: test1
SEQ: GACATCGAC
SEQ_LEN: 9
---
SEQ_NAME: test2
SEQ: ACGACTACAGT
SEQ_LEN: 11
---
SEQ_NAME: test3
SEQ: GCACACAGAGC
SEQ_LEN: 11
---
And the resulting directory tree will look like this:
Test_dir
|-- test1.fasta
|-- test2.fasta
`-- test3.fasta
Note that the Test_dir directory must exist. One can use . to denote the current directory, but that is probably not a good idea.
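Since the target directory must exist, a run can be prepared along the following lines. This is a minimal sketch, not part of the original documentation: it recreates test.fasta from the entries shown earlier, creates Test_dir with mkdir -p, and only invokes the pipeline if the Biopieces commands are found on the PATH (that guard is an assumption made for illustration).

```shell
# Recreate the example input file from the entries shown above.
cat > test.fasta <<'EOF'
>test1
GACATCGAC
>test2
ACGACTACAGT
>test3
GCACACAGAGC
EOF

# Create the target directory up front; -p makes this a no-op if it exists.
mkdir -p Test_dir

# Run the pipeline only if Biopieces is installed (guard added for illustration).
# -x suppresses emitting the records to the output stream.
if command -v read_fasta >/dev/null 2>&1 && command -v write_fasta_files >/dev/null 2>&1; then
    read_fasta -i test.fasta | write_fasta_files -d Test_dir -k SEQ_NAME -x
fi
```

Using a dedicated directory such as Test_dir keeps the generated files apart from the input data.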
To bin the sequences according to sequence length do:
read_fasta -i test.fasta | write_fasta_files -d Test_dir -k SEQ_LEN -x
And the resulting directory tree:
Test_dir/
|-- 11.fasta
`-- 9.fasta
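Three records yield only two files here because test2 and test3 share the same SEQ_LEN value. The bins can be previewed without Biopieces; the following is a rough sketch using plain awk, assuming each sequence sits on a single line as in the example file. The variable name bins is only illustrative.

```shell
# Print each sequence name with its length, i.e. the SEQ_LEN bin it would land in.
# Assumes one sequence line per record (no wrapped sequences).
bins=$(awk '/^>/ { name = substr($0, 2); next }
            { print name, length($0) }' <<'EOF'
>test1
GACATCGAC
>test2
ACGACTACAGT
>test3
GCACACAGAGC
EOF
)
printf '%s\n' "$bins"
# Prints:
# test1 9
# test2 11
# test3 11
```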
We can also add a prefix to the file names using the -p switch and compress the output with bzip2:
read_fasta -i test.fasta | write_fasta_files -d Test_dir -k SEQ_LEN -p Length -Z bzip2 -x
And the resulting directory tree:
Test_dir/
|-- Length_11.fasta.bz2
`-- Length_9.fasta.bz2
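To check what ended up in the compressed files, they can be decompressed to standard output and the header lines counted. The helper below is a hypothetical convenience function (count_fasta_records is not a Biopieces command); it is a sketch that assumes the .fasta.bz2 naming shown above and a bzip2 binary on the PATH.

```shell
# Hypothetical helper: print "<file> <number of FASTA records>" for every
# bzip2-compressed FASTA file in the given directory.
count_fasta_records() {
    dir=$1
    for f in "$dir"/*.fasta.bz2; do
        # Skip the unexpanded pattern when the directory has no matching files.
        [ -e "$f" ] || continue
        # Decompress to stdout and count the lines starting with '>'.
        n=$(bzip2 -dc "$f" | grep -c '^>')
        echo "$f $n"
    done
}

count_fasta_records Test_dir
```

For the example data one would expect two records in the length 11 file and one in the length 9 file, since test2 and test3 have the same length.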
Martin Asser Hansen - Copyright (C) - All rights reserved.
October 2011
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
write_fasta_files is part of the Biopieces framework.