find_orfs
find_orfs is a naive ORF finder that locates open reading frames longer than a specified min_size and shorter than a specified max_size using lists of given start and stop codons. find_orfs only searches the plus strand; to search the minus strand you have to reverse complement the sequence first (see examples). By default all ORFs are output, but with the non_redundant switch only the longest ORF is output when several ORFs share the same end position. find_orfs outputs records where S_BEG and S_END are the ORF positions (0-based) in the original sequence:
REC_TYPE: ORF
SEQ_NAME: contig00001 length=1076 numreads=142
SEQ: auggcgaggaaaacguugucugcgccgaugcggguuauaccggcgucgagaagcgugccgagcauga
SEQ_LEN: 67
S_BEG: 537
S_END: 604
---
NOTE: find_orfs requires that input records contain a SEQ_NAME key.
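To make the description above concrete, here is a minimal Python sketch of a naive plus-strand ORF scan. It is not the find_orfs implementation. The function name `find_orfs_sketch`, the `Orf` tuple, the in-frame codon walk, the inclusive size bounds, and the half-open end coordinate are all assumptions made for illustration; only the default codon lists and size limits are taken from the usage text.

```python
from typing import List, NamedTuple, Sequence


class Orf(NamedTuple):
    seq: str
    beg: int  # 0-based start on the input sequence
    end: int  # 0-based end, exclusive (an assumption of this sketch)


# Defaults copied from the find_orfs usage text.
DEFAULT_STARTS = ("AUG", "GUG", "ATG", "GTG")
DEFAULT_STOPS = ("UAA", "UGA", "UAG", "TAA", "TGA", "TAG")


def find_orfs_sketch(
    seq: str,
    starts: Sequence[str] = DEFAULT_STARTS,
    stops: Sequence[str] = DEFAULT_STOPS,
    min_size: int = 50,
    max_size: int = 10_000,
    non_redundant: bool = False,
) -> List[Orf]:
    """Hypothetical naive ORF scan of the plus strand only."""
    upper = seq.upper()
    start_set = {codon.upper() for codon in starts}
    stop_set = {codon.upper() for codon in stops}
    orfs: List[Orf] = []

    for beg in range(len(upper) - 2):
        if upper[beg:beg + 3] not in start_set:
            continue
        # Walk codon by codon in the start codon's frame to the first stop.
        for pos in range(beg + 3, len(upper) - 2, 3):
            if upper[pos:pos + 3] in stop_set:
                end = pos + 3  # the stop codon is included in the ORF
                if min_size <= end - beg <= max_size:  # inclusive bounds assumed
                    orfs.append(Orf(seq[beg:end], beg, end))
                break

    if non_redundant:
        # Keep only the longest ORF for each end position.
        longest = {}
        for orf in orfs:
            best = longest.get(orf.end)
            if best is None or len(orf.seq) > len(best.seq):
                longest[orf.end] = orf
        orfs = sorted(longest.values(), key=lambda o: o.beg)

    return orfs
```

In this sketch, two start codons in the same frame that run into the same stop codon yield two ORFs with the same end position, and `non_redundant=True` keeps only the longer one.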
... | find_orfs [options]
[-? | --help] # Print full usage description.
[-s <list> | --start_codons=<list>] # List of start codons - Default=AUG,GUG,ATG,GTG
[-S <list> | --stop_codons=<list>] # List of stop codons - Default=UAA,UGA,UAG,TAA,TGA,TAG
[-m <uint> | --min_size=<uint>] # Minimum ORF size - Default=50
[-M <uint> | --max_size=<uint>] # Maximum ORF size - Default=10_000
[-n | --non_redundant] # Only output non-redundant ORFs.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
To find all ORFs on the plus strand use read_fasta like this:
read_fasta -i sequence.fna | find_orfs
To find non-redundant ORFs use the -n switch:
read_fasta -i sequence.fna | find_orfs -n
To find ORFs on the minus strand you need to reverse complement the sequence first:
read_fasta -i sequence.fna | reverse_seq | complement_seq | find_orfs -n
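When the minus strand is searched this way, the reported positions presumably refer to the reverse-complemented sequence, since that is what find_orfs receives. The Python sketch below shows one way to reverse complement a sequence and to map an interval back to plus-strand coordinates. The helper names `reverse_complement` and `to_plus_coords` are hypothetical, and the half-open `[beg, end)` convention is an assumption, not something the tool documents.

```python
# Translation tables for DNA and RNA, preserving letter case.
DNA_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")
RNA_TABLE = str.maketrans("ACGUacgu", "UGCAugca")


def reverse_complement(seq: str) -> str:
    """Reverse complement a sequence; treated as RNA if it contains U."""
    table = RNA_TABLE if "U" in seq.upper() else DNA_TABLE
    return seq[::-1].translate(table)


def to_plus_coords(beg: int, end: int, seq_len: int) -> tuple:
    """Map a half-open [beg, end) interval on the reverse complement
    back to a half-open interval on the original plus strand."""
    return seq_len - end, seq_len - beg


seq = "ccATGaaaATGcccTAAgg"
rc = reverse_complement(seq)

# The first five bases of the reverse complement come from the
# last five bases of the original sequence.
beg, end = to_plus_coords(0, 5, len(seq))
assert reverse_complement(seq[beg:end]) == rc[0:5]
```

If the tool's end coordinate turns out to be inclusive, the mapping would need to be adjusted by one.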
Martin Asser Hansen - Copyright (C) - All rights reserved.
July 2012
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
find_orfs is part of the Biopieces framework.