mask_seq
mask_seq masks sequences in the stream using either soft masking (default) or hard masking. Hard masking
replaces each residue whose quality score is below a specified cutoff with an N, while soft masking
replaces such residues with their lower case equivalents. The sequences are the values of SEQ keys and the
quality scores are the values of SCORES keys. The SCORES are encoded as ASCII characters from '@' to 'h',
indicating scores from 0 to 40 (the score is the character's ASCII code minus 64).
Read more here:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2847217/
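The score encoding described above can be illustrated with a short Python sketch. It is not part of Biopieces; the function name `decode_scores` and the `offset` parameter are hypothetical and only serve to show how the characters '@' to 'h' map to the scores 0 to 40.

```python
def decode_scores(scores: str, offset: int = 64) -> list[int]:
    """Convert an ASCII-encoded quality string to a list of integer scores.

    Assumes the encoding described above: '@' (ASCII 64) is score 0
    and 'h' (ASCII 104) is score 40.
    """
    return [ord(char) - offset for char in scores]


# '@' -> 0, 'T' -> 20, 'h' -> 40
print(decode_scores("@Th"))  # prints [0, 20, 40]
```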
... | mask_seq [options]
[-? | --help] # Print full usage description.
[-c <int> | --cutoff=<int>] # Cutoff used for soft masking low scoring sequence - Default=20
[-h | --hardmask] # Hard mask instead of soft mask.
[-I <file!> | --stream_in=<file!>] # Read input stream from file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output stream to file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following FASTQ entry in the file test.fq:
@HWI-EAS157_20FFGAAXX:2:1:888:434
TTGGTCGCTCGCTCCGCGACCTCAGATCAGACGTGGGCGAT
+HWI-EAS157_20FFGAAXX:2:1:888:434
@ABCDEFGHIJKLMNOPQRSTUVWhgfedcba`_^]\[ZYX
We can read in this sequence using read_fastq and then soft mask it with mask_seq like this:
read_fastq -i test.fq | mask_seq
SCORES: @ABCDEFGHIJKLMNOPQRSTUVWhgfedcba`_^]\[ZYX
SEQ: ttggtcgctcgctccgcgacCTCAGATCAGACGTGGGCGAT
SEQ_LEN: 41
SEQ_NAME: HWI-EAS157_20FFGAAXX:2:1:888:434
---
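For post-processing outside Biopieces, records printed in this form can be read back into dictionaries. The Python sketch below assumes only what the output above shows: one `KEY: value` pair per line and a `---` line closing each record. The function name `parse_records` is hypothetical.

```python
def parse_records(text: str) -> list[dict[str, str]]:
    """Parse 'KEY: value' lines into dicts, one dict per '---'-terminated record."""
    records = []
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "---":
            # End of record: store it and start a new one.
            if record:
                records.append(record)
            record = {}
        elif line:
            # Split on the first ': ' only, since values such as
            # SEQ_NAME may themselves contain colons.
            key, _, value = line.partition(": ")
            record[key] = value
    if record:
        # Keep a final record even if the closing '---' is missing.
        records.append(record)
    return records


# Raw string, because the SCORES value contains a backslash.
stream = r"""SCORES: @ABCDEFGHIJKLMNOPQRSTUVWhgfedcba`_^]\[ZYX
SEQ: ttggtcgctcgctccgcgacCTCAGATCAGACGTGGGCGAT
SEQ_LEN: 41
SEQ_NAME: HWI-EAS157_20FFGAAXX:2:1:888:434
---
"""

records = parse_records(stream)
print(records[0]["SEQ_NAME"])  # prints HWI-EAS157_20FFGAAXX:2:1:888:434
```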
Using the -c switch we can change the cutoff:
read_fastq -i test.fq | mask_seq -c 25
SCORES: @ABCDEFGHIJKLMNOPQRSTUVWhgfedcba`_^]\[ZYX
SEQ: ttggtcgctcgctccgcgacctcaGATCAGACGTGGGCGAt
SEQ_LEN: 41
SEQ_NAME: HWI-EAS157_20FFGAAXX:2:1:888:434
---
Using the -h switch for hard masking:
read_fastq -i test.fq | mask_seq -h
SEQ_NAME: HWI-EAS157_20FFGAAXX:2:1:888:434
SEQ: NNNNNNNNNNNNNNNNNNNNCTCAGATCAGACGTGGGCGAT
SEQ_LEN: 41
SCORES: @ABCDEFGHIJKLMNOPQRSTUVWhgfedcba`_^]\[ZYX
---
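The masking rule behind the three examples above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Biopieces implementation: the function `mask_residues` and its parameters are hypothetical, a residue is masked when its score is strictly below the cutoff (which matches the outputs shown), and residues at or above the cutoff are assumed to be left unchanged.

```python
def mask_residues(seq: str, scores: str, cutoff: int = 20,
                  hardmask: bool = False, offset: int = 64) -> str:
    """Mask residues whose quality score is below the cutoff.

    Soft masking lower-cases the residue; hard masking replaces it with 'N'.
    Assumes scores are ASCII-encoded with '@' (ASCII 64) as score 0.
    """
    masked = []
    for residue, char in zip(seq, scores):
        if ord(char) - offset < cutoff:
            masked.append("N" if hardmask else residue.lower())
        else:
            # Assumption: residues at or above the cutoff are kept as they are.
            masked.append(residue)
    return "".join(masked)


seq = "TTGGTCGCTCGCTCCGCGACCTCAGATCAGACGTGGGCGAT"
# Raw string, because the quality string contains a backslash.
scores = r"@ABCDEFGHIJKLMNOPQRSTUVWhgfedcba`_^]\[ZYX"

print(mask_residues(seq, scores))                 # soft mask, cutoff 20
print(mask_residues(seq, scores, cutoff=25))      # soft mask, cutoff 25
print(mask_residues(seq, scores, hardmask=True))  # hard mask, cutoff 20
```

With cutoff 25 the last residue is also masked, because its score character 'X' decodes to 24.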
Martin Asser Hansen - Copyright (C) - All rights reserved.
August 2010
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
mask_seq is part of the Biopieces framework.