Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: trim_seq

Description

trim_seq removes low-quality residues from the ends of sequences in the stream, based on the quality scores in the SCORES key (a FASTQ type quality score string). Trimming proceeds from an end until a stretch of consecutive residues that all meet the minimum quality (--min_qual) is found; the required length of that stretch is set with the --min_len switch. Requiring a stretch prevents premature termination of the trimming by e.g. a single good quality residue at the end. Use the --trim switch to choose whether the sequence is trimmed from the left end, the right end, or both.
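
The stopping rule described above can be sketched in Python. This is only an illustration under stated assumptions, not the Biopieces source: the function names (decode, first_good, trim) are hypothetical, and the score offset of 64 is an assumption inferred from the examples on this page rather than something the page states.

```python
# Illustrative sketch only; not the actual trim_seq implementation.
OFFSET = 64  # assumption: a score is the character's ASCII code minus 64


def decode(scores, offset=OFFSET):
    """Turn a FASTQ type score string into a list of integer scores."""
    return [ord(char) - offset for char in scores]


def first_good(quals, min_qual, min_len):
    """Return the index where the first run of min_len scores >= min_qual starts.

    Returns len(quals) if no such run exists.
    """
    run = 0
    for i, qual in enumerate(quals):
        run = run + 1 if qual >= min_qual else 0
        if run == min_len:
            return i - min_len + 1
    return len(quals)


def trim(seq, scores, min_qual=20, min_len=3, mode="both"):
    """Trim seq and scores from the left end, the right end, or both."""
    quals = decode(scores)
    start, end = 0, len(quals)
    if mode in ("both", "left"):
        start = first_good(quals, min_qual, min_len)
    if mode in ("both", "right"):
        # Scan the reversed scores to find the good stretch nearest the right end.
        end = len(quals) - first_good(quals[::-1], min_qual, min_len)
    end = max(start, end)  # nothing survives if no good stretch exists
    return seq[start:end], scores[start:end]


print(trim("acgtacgt", "@@TTTT@T"))  # ('gtac', 'TTTT')
```

In this toy input the lone good residue at the right end does not stop the trimming, because it is not part of a stretch of three.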

Usage

... | trim_seq [options]

Options

[-?          | --help]               #  Print full usage description.
[-m <uint>   | --min_qual=<uint>]    #  Minimum quality              -  Default=20
[-l <uint>   | --min_len=<uint>]     #  Minimum stretch length       -  Default=3
[-t <string> | --trim=<string>]      #  Trim mode <both|left|right>  -  Default=both
[-I <file!>  | --stream_in=<file!>]  #  Read input from stream file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

Consider the following FASTQ entry in the file test.fq:

@test
gatcgatcgtacgagcagcatctgacgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
+
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh

To trim both ends simply do:

read_fastq -i test.fq | trim_seq

SEQ_NAME: test
SEQ: tctgacgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
SEQ_LEN: 62
SCORES: TUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh
---
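
The 20 residues removed from the left end can be checked by hand. The short Python sketch below assumes that each score is the character's ASCII code minus 64; this offset is inferred from the output above and is not stated on this page. The helper name meets_minimum is hypothetical.

```python
OFFSET = 64    # assumption inferred from the example output
MIN_QUAL = 20  # default value of --min_qual


def meets_minimum(char, min_qual=MIN_QUAL, offset=OFFSET):
    """True if the score encoded by char is at least min_qual."""
    return ord(char) - offset >= min_qual


for char in "@ST":
    print(char, ord(char) - OFFSET, meets_minimum(char))
# @ 0 False
# S 19 False
# T 20 True
```

Under that assumption, T is the first character in the score string that meets the default minimum, which matches the trimmed SCORES value starting at T.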

Use the -m switch to change the minimum quality; residues scoring below it are treated as low quality:

read_fastq -i test.fq | trim_seq -m 25

SEQ_NAME: test
SEQ: cgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
SEQ_LEN: 57
SCORES: YZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh
---

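To trim the left end only, use -t left (or -t right for the right end only). For this entry the output is the same as with the default, because the right end already finishes with a stretch of three good quality residues: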

read_fastq -i test.fq | trim_seq -t left

SEQ_NAME: test
SEQ: tctgacgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
SEQ_LEN: 62
SCORES: TUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh
---

To increase the length of the stretch of good quality residues required to stop the trimming, use the -l switch:

read_fastq -i test.fq | trim_seq -l 4

SEQ_NAME: test
SEQ: tctgacgtatcgatcgttgattagttgctagctatgcagtct
SEQ_LEN: 42
SCORES: TUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUT
---
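
Why -l 4 removes 20 residues from the right end while the default leaves that end untouched becomes clearer when the score string is split into runs of good and low quality. The Python sketch below assumes a score offset of 64 (inferred from the examples, not stated on this page); quality_runs is a hypothetical helper, not part of Biopieces.

```python
from itertools import groupby

SCORES = r"@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh"
OFFSET = 64  # assumption inferred from the example output


def quality_runs(scores, min_qual=20, offset=OFFSET):
    """List (is_good, run_length) pairs from the left end to the right end."""
    flags = [ord(char) - offset >= min_qual for char in scores]
    return [(good, len(list(group))) for good, group in groupby(flags)]


print(quality_runs(SCORES))
# [(False, 20), (True, 42), (False, 17), (True, 3)]
```

The final run of three good residues satisfies the default -l 3, so nothing is trimmed from the right. With -l 4 that run is too short, so trimming continues through it and the 17 low quality residues before it, leaving the 42 residues in the middle.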

See also

read_fastq

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

[email protected]

June 2010

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

trim_seq is part of the Biopieces framework.

http://www.biopieces.org
