Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: find_orfs

Description

find_orfs is a naive ORF finder that locates open reading frames longer than a specified min_size and shorter than a specified max_size using lists of given start and stop codons. find_orfs only searches the plus strand; to search the minus strand, reverse complement the sequence first (see examples). By default all ORFs are output, but with the non_redundant switch only the longest ORF is output when several ORFs share the same end position. find_orfs outputs records where S_BEG and S_END are the ORF positions (0-based) in the original sequence:

REC_TYPE: ORF
SEQ_NAME: contig00001  length=1076   numreads=142
SEQ: auggcgaggaaaacguugucugcgccgaugcggguuauaccggcgucgagaagcgugccgagcauga
SEQ_LEN: 67
S_BEG: 537
S_END: 604
---

NOTE: find_orfs requires SEQ_NAME in the input records.
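
The search described above can be sketched in Python. This is an illustration only, not the find_orfs source: the function name naive_orfs is hypothetical, and the sketch assumes that stop codons are searched in the frame set by the start codon, that the stop codon is part of the ORF, that sizes are counted in nucleotides with inclusive bounds, and that the end position is 0-based and inclusive. The actual tool may differ on any of these points.

```python
# Illustrative sketch only; not the find_orfs implementation.
DEFAULT_STARTS = ("AUG", "GUG", "ATG", "GTG")
DEFAULT_STOPS = ("UAA", "UGA", "UAG", "TAA", "TGA", "TAG")


def naive_orfs(seq, starts=DEFAULT_STARTS, stops=DEFAULT_STOPS,
               min_size=50, max_size=10_000):
    """Yield (beg, end, orf) for each ORF found on the plus strand.

    Assumptions: beg and end are 0-based and inclusive, the stop codon
    is part of the ORF, and sizes are counted in nucleotides.
    """
    upper = seq.upper()
    for beg in range(len(upper) - 2):
        if upper[beg:beg + 3] not in starts:
            continue
        # Walk codon by codon in the frame set by the start codon.
        for pos in range(beg + 3, len(upper) - 2, 3):
            if upper[pos:pos + 3] in stops:
                end = pos + 2
                size = end - beg + 1
                if min_size <= size <= max_size:
                    yield beg, end, seq[beg:end + 1]
                break  # the first in-frame stop closes this ORF


if __name__ == "__main__":
    for beg, end, orf in naive_orfs("CCATGAAATAGGG", min_size=6):
        print(beg, end, orf)
```

Every start codon is tried as a candidate, so nested ORFs that share a stop codon are all reported, which is what makes the output redundant by default.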

Usage

... | find_orfs [options]

Options

[-?          | --help]                #  Print full usage description.
[-s <list>   | --start_codons=<list>] #  List of start codons           -  Default=AUG,GUG,ATG,GTG
[-S <list>   | --stop_codons=<list>]  #  List of stop codons            -  Default=UAA,UGA,UAG,TAA,TGA,TAG
[-m <uint>   | --min_size=<uint>]     #  Minimum ORF size               -  Default=50
[-M <uint>   | --max_size=<uint>]     #  Maximum ORF size               -  Default=10_000
[-n          | --non_redundant]       #  Only output non-redundant ORFs.
[-I <file!>  | --stream_in=<file!>]   #  Read input from stream file    -  Default=STDIN
[-O <file>   | --stream_out=<file>]   #  Write output to stream file    -  Default=STDOUT
[-v          | --verbose]             #  Verbose output.

Examples

To find all ORFs on the plus strand, read the sequences with read_fasta and pipe them to find_orfs:

read_fasta -i sequence.fna | find_orfs

To find non-redundant ORFs use the -n switch:

read_fasta -i sequence.fna | find_orfs -n
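
The effect of the non_redundant switch, keeping only the longest ORF among those that share an end position, can be sketched in Python. The function name non_redundant and the (beg, end) tuple layout are hypothetical stand-ins for the ORF records, and the sketch is not taken from the find_orfs source.

```python
def non_redundant(orfs):
    """Keep the longest ORF for each end position.

    `orfs` is an iterable of (beg, end) pairs; this layout is a
    hypothetical stand-in for the ORF records.
    """
    longest = {}
    for beg, end in orfs:
        # For a fixed end, a smaller beg means a longer ORF.
        if end not in longest or beg < longest[end]:
            longest[end] = beg
    return sorted((beg, end) for end, beg in longest.items())


if __name__ == "__main__":
    print(non_redundant([(2, 10), (5, 10), (0, 20)]))
```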

To find non-redundant ORFs on the minus strand, reverse complement the sequence first:

read_fasta -i sequence.fna | reverse_seq | complement_seq | find_orfs -n
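
What the reverse_seq and complement_seq steps do to a sequence can be sketched in Python as a single function. The name reverse_complement and the rna flag are hypothetical; the sketch only handles the unambiguous bases A, C, G, T and U and leaves any other character unchanged.

```python
# Complement tables for DNA and RNA; case is preserved.
_DNA = str.maketrans("ACGTacgt", "TGCAtgca")
_RNA = str.maketrans("ACGUacgu", "UGCAugca")


def reverse_complement(seq, rna=False):
    """Return the reverse complement of seq (illustrative sketch)."""
    table = _RNA if rna else _DNA
    # Complement every base, then reverse the string.
    return seq.translate(table)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
```

Because the ORF search then runs on the reverse-complemented sequence, the reported positions refer to that sequence and not to the plus strand.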

See also

read_fasta

reverse_seq

complement_seq

find_genes

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

July 2012

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

find_orfs is part of the Biopieces framework.

http://www.biopieces.org