Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: write_fasta_files

Description

write_fasta_files writes sequences from the data stream to multiple FASTA files, split on a specified key. Every FASTA-type record that contains the key is written to a file named after the value of that key, so all records sharing the same value end up in the same file.
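The splitting step can be pictured as grouping records by the value of the chosen key. The Python sketch below is only an illustration, not the Biopieces implementation: the helper name bin_records is hypothetical, records are modelled as plain dicts, and it assumes that only records carrying the key as well as SEQ_NAME and SEQ are binned.

```python
from collections import defaultdict


def bin_records(records, key):
    """Hypothetical helper: group FASTA-type records by the value of `key`.

    Returns a dict mapping each key value (as a string, as it would appear
    in a file name) to a list of (SEQ_NAME, SEQ) tuples in input order.
    """
    bins = defaultdict(list)
    for record in records:
        # Assumption: records lacking the key, SEQ_NAME or SEQ are skipped.
        if key in record and "SEQ_NAME" in record and "SEQ" in record:
            bins[str(record[key])].append((record["SEQ_NAME"], record["SEQ"]))
    return dict(bins)
```

For instance, three records with SEQ_LEN values 9, 11 and 11 fall into two bins, "9" and "11", which matches one output file per distinct value.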

write_fasta_files can compress its output with gzip or bzip2 (-Z) and can wrap sequence lines to a fixed width (-w).
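As a rough illustration of what the compression choice amounts to, the Python sketch below opens an output file either plainly or through a gzip or bzip2 compressor, using the standard library gzip and bz2 modules. The function name open_output is hypothetical, and this is not how Biopieces itself is implemented; it only mirrors the two documented choices for -Z.

```python
import bz2
import gzip


def open_output(path, compress=None):
    """Hypothetical helper: open `path` for text writing, optionally compressed.

    `compress` mirrors the documented -Z values: None, "gzip" or "bzip2".
    """
    if compress is None:
        return open(path, "w")
    if compress == "gzip":
        return gzip.open(path, "wt")
    if compress == "bzip2":
        return bz2.open(path, "wt")
    raise ValueError(f"unsupported compression {compress!r}; expected 'gzip' or 'bzip2'")
```

Writing through the returned handle works the same way in all three cases, so the FASTA formatting code does not need to know whether the output is compressed.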

For more about the FASTA format:

http://en.wikipedia.org/wiki/Fasta_format

Usage

... | write_fasta_files [options]

Options

[-?          | --help]               #  Print full usage description.
[-k <string> | --key=<string>]       #  Key for separating records and naming files.
[-d <dir!>   | --directory=<dir!>]   #  Target directory.
[-p <string> | --prefix=<string>]    #  Optional prefix for file names.
[-w <int>    | --wrap=<int>]         #  Wrap sequences to a given width.
[-x          | --no_stream]          #  Do not emit records.
[-Z <string> | --compress=<string>]  #  Compress output using <gzip|bzip2>.
[-I <file!>  | --stream_in=<file!>]  #  Read input from stream file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.
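The -w option wraps sequences to a given width. A minimal Python sketch of that kind of wrapping, assuming each entry is written as a header line followed by sequence lines of at most the given width, could look like this; the function name format_fasta_entry is hypothetical.

```python
def format_fasta_entry(name, seq, wrap=None):
    """Hypothetical helper: format one FASTA entry.

    If `wrap` is given, the sequence is split into lines of at most `wrap`
    characters; otherwise it is written on a single line.
    """
    if wrap is None:
        lines = [seq]
    else:
        if wrap < 1:
            raise ValueError("wrap width must be a positive integer")
        lines = [seq[i:i + wrap] for i in range(0, len(seq), wrap)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

With a width of 4, the 11-residue sequence ACGACTACAGT is split into the lines ACGA, CTAC and AGT.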

Examples

Consider the following FASTA entries in the file test.fasta.

>test1
GACATCGAC
>test2
ACGACTACAGT
>test3
GCACACAGAGC

We can read in these sequences with read_fasta and write them to one file per SEQ_NAME with write_fasta_files. Since -x is not used, the records are also emitted to the stream:

read_fasta -i test.fasta | write_fasta_files -d Test_dir -k SEQ_NAME

SEQ_NAME: test1
SEQ: GACATCGAC
SEQ_LEN: 9
---
SEQ_NAME: test2
SEQ: ACGACTACAGT
SEQ_LEN: 11
---
SEQ_NAME: test3
SEQ: GCACACAGAGC
SEQ_LEN: 11
---
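The records emitted above are shown as one "KEY: value" pair per line, with "---" closing each record. Assuming exactly that layout (and keeping every value as a string), a small Python parser for such output could look like the sketch below; parse_records is a hypothetical name and is not part of Biopieces.

```python
def parse_records(text):
    """Hypothetical helper: parse 'KEY: value' lines into dicts.

    A line consisting of '---' ends the current record. Values are kept as
    strings, so SEQ_LEN comes back as '9' rather than 9.
    """
    records, current = [], {}
    for line in text.splitlines():
        if line.strip() == "---":
            records.append(current)
            current = {}
        elif line.strip():
            key, _, value = line.partition(": ")
            current[key] = value
    if current:  # tolerate a last record without a closing '---'
        records.append(current)
    return records
```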

And the resulting directory tree will look like this:

Test_dir
|-- test1.fasta
|-- test2.fasta
`-- test3.fasta

Note that Test_dir must already exist. You can use . to denote the current directory, but that is probably not a good idea.

To bin the sequences by sequence length, without emitting the records to the stream (-x), do:

read_fasta -i test.fasta | write_fasta_files -d Test_dir -k SEQ_LEN -x

And the resulting directory tree:

Test_dir/
|-- 11.fasta
`-- 9.fasta

We can also add a prefix to the file names with the -p switch and compress the output with bzip2 using -Z:

read_fasta -i test.fasta | write_fasta_files -d Test_dir -k SEQ_LEN -p Length -Z bzip2 -x

And the resulting directory tree:

Test_dir/
|-- Length_11.fasta.bz2
`-- Length_9.fasta.bz2
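The file names in this example follow the pattern prefix, underscore, key value, then ".fasta" plus a compression suffix. The Python sketch below rebuilds that pattern; the function name output_file_name is hypothetical, the ".bz2" suffix is taken from the example above, and the ".gz" suffix for gzip is an assumption not shown on this page.

```python
def output_file_name(value, prefix=None, compress=None):
    """Hypothetical helper: build a name such as 'Length_11.fasta.bz2'."""
    # ".bz2" matches the documented example; ".gz" for gzip is an assumption.
    suffixes = {None: "", "gzip": ".gz", "bzip2": ".bz2"}
    if compress not in suffixes:
        raise ValueError(f"unsupported compression {compress!r}")
    stem = f"{prefix}_{value}" if prefix else str(value)
    return stem + ".fasta" + suffixes[compress]
```

Without a prefix or compression the name reduces to the key value plus ".fasta", as in the test1.fasta and 9.fasta files from the earlier examples.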

See also

read_fasta

write_fasta

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

[email protected]

October 2011

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

write_fasta_files is part of the Biopieces framework.

http://www.biopieces.org
