trim_seq
trim_seq removes subquality residues from the ends of sequences in the stream based on
the quality SCORES in a FASTQ type quality score string. Trimming proceeds until a stretch
of good quality residues is found, the length of which is specified with the --min_len
switch. This prevents premature termination of the trimming by e.g. a single good quality
residue at the end. The --trim switch indicates whether the sequence should be trimmed
from the left end, the right end, or both.
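The trimming rule described above can be sketched in Python. This is only an illustration, not the Biopieces implementation: the function name trim_ends and its parameters are hypothetical, "good" is assumed to mean a score at or above the minimum quality, and the scores are assumed to be Phred+64 encoded (offset 64), which is what the example scores on this page suggest.

```python
def trim_ends(seq, scores, min_qual=20, min_len=3, trim="both", offset=64):
    """Trim low quality ends; return the kept (seq, scores) pair.

    Hypothetical sketch: offset=64 (Phred+64) is an assumption.
    """
    quals = [ord(c) - offset for c in scores]
    n = len(quals)

    def is_good(start):
        # True if min_len consecutive residues from start all pass min_qual.
        window = quals[start:start + min_len]
        return len(window) == min_len and all(q >= min_qual for q in window)

    left, right = 0, n
    if trim in ("both", "left"):
        # Leftmost position where a good stretch begins.
        left = next((i for i in range(n - min_len + 1) if is_good(i)), n)
    if trim in ("both", "right"):
        # End of the rightmost good stretch.
        right = next(
            (i + min_len for i in range(n - min_len, -1, -1) if is_good(i)), 0
        )
    if left >= right:
        # No good stretch found: nothing is kept.
        return "", ""
    return seq[left:right], scores[left:right]
```

Applied to the test.fq entry used in the examples on this page, this sketch keeps 62 residues with the defaults and 42 residues with min_len=4, matching the SEQ_LEN values shown there.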
... | trim_seq [options]
[-? | --help] # Print full usage description.
[-m <uint> | --min_qual=<uint>] # Minimum quality - Default=20
[-l <uint> | --min_len=<uint>] # Minimum stretch length - Default=3
[-t <string> | --trim=<string>] # Trim mode <both|left|right> - Default=both
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following FASTQ entry in the file test.fq:
@test
gatcgatcgtacgagcagcatctgacgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
+
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh
To trim both ends simply do:
read_fastq -i test.fq | trim_seq
SEQ_NAME: test
SEQ: tctgacgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
SEQ_LEN: 62
SCORES: TUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh
---
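To see why the output above starts at the residue scored T, it helps to decode the score string into numbers. The sketch below assumes Phred+64 encoding (score = ASCII code minus 64), an inference from the example string starting at '@'; the helper name decode_scores is hypothetical.

```python
def decode_scores(scores, offset=64):
    """Convert a FASTQ score string to integers (offset=64 is an assumption)."""
    return [ord(c) - offset for c in scores]

# Score string of the test.fq entry.
scores = r"@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh"
quals = decode_scores(scores)

print(len(quals))   # 82
print(quals[:3])    # [0, 1, 2]
print(quals[20])    # 20
print(quals[-4:])   # [3, 40, 40, 40]
```

Under this assumption the first 20 residues score 0 to 19 and are removed with the default minimum quality of 20, while the last three residues score 40 and form a stretch of length 3, so nothing is removed from the right end.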
Use the -m switch to change the minimum quality; residues scoring below it are trimmed:
read_fastq -i test.fq | trim_seq -m 25
SEQ_NAME: test
SEQ: cgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
SEQ_LEN: 57
SCORES: YZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh
---
To trim the left end only (use -t right for the right end only), do:
read_fastq -i test.fq | trim_seq -t left
SEQ_NAME: test
SEQ: tctgacgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
SEQ_LEN: 62
SCORES: TUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh
---
To increase the length of the stretch of good quality residues required to stop
trimming, use the -l switch:
read_fastq -i test.fq | trim_seq -l 4
SEQ_NAME: test
SEQ: tctgacgtatcgatcgttgattagttgctagctatgcagtct
SEQ_LEN: 42
SCORES: TUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUT
---
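The effect of -l 4 on the right end can be illustrated by listing the runs of consecutive good residues in the example scores. This is a sketch under the same assumptions as the rest of this page suggests, not part of trim_seq: Phred+64 encoding, "good" meaning a score at or above the minimum quality, and a hypothetical helper named good_runs.

```python
from itertools import groupby

def good_runs(scores, min_qual=20, offset=64):
    """Return (start, length) for each run of residues with score >= min_qual.

    Hypothetical helper; offset=64 (Phred+64) is an assumption.
    """
    runs = []
    pos = 0
    for good, group in groupby(scores, key=lambda c: ord(c) - offset >= min_qual):
        length = sum(1 for _ in group)
        if good:
            runs.append((pos, length))
        pos += length
    return runs

scores = r"@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh"
print(good_runs(scores))  # [(20, 42), (79, 3)]
```

The trailing hhh is a run of only 3 good residues, so it satisfies the default -l 3 but not -l 4. With -l 4 the rightmost qualifying run is the 42 residue run starting at position 20, which is the stretch kept in the output above.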
Martin Asser Hansen - Copyright (C) - All rights reserved.
June 2010
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
trim_seq is part of the Biopieces framework.