read_fastq
read_fastq reads sequence entries from FASTQ files. Each sequence entry consists of 4 lines:
- sequence name after @
- sequence
- quality score name after + (optional)
- quality scores in ASCII
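The four-line layout above can be sketched as a small parser. This is a minimal Python illustration, not the read_fastq implementation; the function name parse_fastq and the choice of dictionary keys are assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator


def parse_fastq(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per 4-line FASTQ entry (hypothetical helper)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between entries
        # The next three lines are: sequence, '+' line, quality scores.
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "").rstrip("\r\n")
        scores = next(it, "").rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ entry near {header!r}")
        if len(seq) != len(scores):
            raise ValueError(f"sequence/score length mismatch in {header!r}")
        # The name after '+' is optional, so it is not kept.
        yield {"SEQ_NAME": header[1:], "SEQ": seq, "SCORES": scores}


entries = list(parse_fastq(["@test\n", "ACGT\n", "+\n", "IIII\n"]))
```

The sketch assumes each sequence and each score string occupies exactly one line, as described above.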
It is possible to read paired-end sequence data from separate files using the -j switch, in which case the sequences become interleaved in the stream.
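Interleaving here means that each entry from the first file is followed by its mate from the second file. A minimal Python sketch of that ordering follows; the helper name interleave is hypothetical, and the sketch assumes both inputs hold the same number of entries in the same order.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def interleave(reads1: Iterable[T], reads2: Iterable[T]) -> Iterator[T]:
    """Alternate entries from two inputs: r1[0], r2[0], r1[1], r2[1], ..."""
    # zip stops at the shorter input, so unpaired trailing entries are dropped.
    for read1, read2 in zip(reads1, reads2):
        yield read1
        yield read2


mixed = list(interleave(["A/1", "B/1"], ["A/2", "B/2"]))
```

How read_fastq itself handles files of unequal length is not stated here, so the truncating behaviour of zip is only a property of this sketch.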
Quality scores are in the range of -5 to 41. Phred/Sanger scores are encoded with ASCII characters 33 to 74 (! .. J), and Solexa/Illumina (<1.8) scores with ASCII characters 59 to 104 (; .. h).
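Decoding a score string amounts to subtracting the base offset (33 or 64) from each character's ASCII code. The following Python sketch shows the arithmetic; decode_scores is a hypothetical helper, not part of Biopieces.

```python
from typing import List


def decode_scores(scores: str, base: int) -> List[int]:
    """Convert an ASCII-encoded quality string to integer scores."""
    return [ord(char) - base for char in scores]


# Base 33: '!' is ASCII 33, 'I' is ASCII 73.
sanger = decode_scores("!+5?I", 33)
# Base 64: ';' is ASCII 59, 'h' is ASCII 104.
solexa = decode_scores(";@Jh", 64)
```

Note that the character J decodes to 41 with base 33 but to 10 with base 64, which is why the encoding has to be known or guessed before the scores can be interpreted.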
If no encoding is supplied, read_fastq analyzes the first sequence entry, tries to automatically determine which encoding was used, and validates that this encoding fits the following 1000 entries.
- sanger - base 33
- solexa - base 64
- illumina1.3 - base 64
- illumina1.5 - base 64
- illumina1.8 - base 33
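One way such a guess could work is to compare the observed ASCII range of the score characters against the two ranges given above. The Python sketch below is an assumption about a plausible heuristic, not the algorithm read_fastq actually uses; the name guess_base and the limit parameter are hypothetical.

```python
from typing import Iterable, Optional

BASE_33_RANGE = (33, 74)   # ! .. J
BASE_64_RANGE = (59, 104)  # ; .. h


def guess_base(score_strings: Iterable[str], limit: int = 1000) -> Optional[int]:
    """Return 33 or 64 if the scores fit only one range, None if ambiguous."""
    low, high = None, None
    for index, scores in enumerate(score_strings):
        if index >= limit:
            break  # only inspect the first `limit` entries
        for char in scores:
            code = ord(char)
            low = code if low is None else min(low, code)
            high = code if high is None else max(high, code)
    if low is None:
        return None  # no score characters seen
    fits_33 = BASE_33_RANGE[0] <= low and high <= BASE_33_RANGE[1]
    fits_64 = BASE_64_RANGE[0] <= low and high <= BASE_64_RANGE[1]
    if fits_33 and fits_64:
        return None  # every character lies in the overlap (; .. J)
    if fits_33:
        return 33
    if fits_64:
        return 64
    raise ValueError("scores fit neither the base 33 nor the base 64 range")


sanger_guess = guess_base(['!"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHI'])
solexa_guess = guess_base(["efgh"])
ambiguous_guess = guess_base(["ABCD"])
```

Because the two ranges overlap, a short or uniformly high-quality sample can be ambiguous; in that situation supplying the encoding explicitly is the safer choice.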
The resulting records look like this:
SEQ_NAME: test
SEQ: ccccccccccccccccccccccccccccccccccccccccc
SEQ_LEN: 41
SCORES: !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHI
---
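The record layout shown above (one KEY: value pair per line, terminated by a line containing ---) can be reproduced with a few lines of Python. This is only an illustrative sketch of the text layout; make_record and format_record are hypothetical names and not Biopieces functions.

```python
from typing import Dict


def make_record(name: str, seq: str, scores: str) -> Dict[str, str]:
    """Build a record with the four keys shown in the example above."""
    return {
        "SEQ_NAME": name,
        "SEQ": seq,
        "SEQ_LEN": str(len(seq)),
        "SCORES": scores,
    }


def format_record(record: Dict[str, str]) -> str:
    """Render a record as 'KEY: value' lines followed by a '---' line."""
    body = "".join(f"{key}: {value}\n" for key, value in record.items())
    return body + "---\n"


text = format_record(make_record("test", "cccc", '!"#$'))
```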
Input files may be compressed with gzip or bzip2.
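For readers who process such files outside Biopieces, a minimal Python sketch of transparent decompression is shown below. It picks a reader from the file's leading magic bytes; the helper name open_maybe_compressed is hypothetical, and how read_fastq itself detects compression is not described here.

```python
import bz2
import gzip
import os
import tempfile
from typing import IO


def open_maybe_compressed(path: str) -> IO[str]:
    """Open a file as text, decompressing gzip or bzip2 if detected."""
    with open(path, "rb") as handle:
        magic = handle.read(3)
    if magic[:2] == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    if magic == b"BZh":           # bzip2 magic number
        return bz2.open(path, "rt")
    return open(path, "r")


# Small demonstration using temporary files.
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "test.fq.gz")
    with gzip.open(gz_path, "wt") as out:
        out.write("@test\nACGT\n+\nIIII\n")
    with open_maybe_compressed(gz_path) as handle:
        gz_first_line = handle.readline().rstrip("\n")

    bz_path = os.path.join(tmp, "test.fq.bz2")
    with bz2.open(bz_path, "wt") as out:
        out.write("@test\nACGT\n+\nIIII\n")
    with open_maybe_compressed(bz_path) as handle:
        bz_first_line = handle.readline().rstrip("\n")
```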
For more about the FASTQ format:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2847217/
read_fastq [options] -i <FASTQ file(s)>
[-? | --help] # Print full usage description.
[-i <files!> | --data_in=<files!>] # Comma separated list of files or glob expression to read.
[-j <files!> | --data_in2=<files!>] # Similar to -i but for pair-end data.
[-n <uint> | --num=<uint>] # Limit number of records to read.
[-e <string> | --encoding=<string>] # Encoding <auto|base_33|base_64> - Default=auto
[-I <file> | --stream_in=<file!>] # Read input stream from file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output stream to file - Default=STDOUT
[-v | --verbose] # Verbose output.
To read all FASTQ entries in the file test.fq do:
read_fastq -i test.fq
To read a limited number of entries use the -n switch:
read_fastq -i test.fq -n 10
To enforce the encoding use the -e switch:
read_fastq -i test.fq -e base_64
To read in paired-end sequence data:
read_fastq -i exp_A_1.fq,exp_B_1,exp_C_1 -j exp_A_2.fq,exp_B_2,exp_C_2
Martin Asser Hansen - Copyright (C) - All rights reserved.
October 2010
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
read_fastq is part of the Biopieces framework.