Added latest version of STAR-Fusion as a directory.
Showing 12 changed files with 1,553 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
Copyright (c) 2015, STAR-Fusion | ||
All rights reserved. | ||
|
||
Redistribution and use in source and binary forms, with or without | ||
modification, are permitted provided that the following conditions are met: | ||
|
||
* Redistributions of source code must retain the above copyright notice, this | ||
list of conditions and the following disclaimer. | ||
|
||
* Redistributions in binary form must reproduce the above copyright notice, | ||
this list of conditions and the following disclaimer in the documentation | ||
and/or other materials provided with the distribution. | ||
|
||
* Neither the name of the {organization} nor the names of its | ||
contributors may be used to endorse or promote products derived from | ||
this software without specific prior written permission. | ||
|
||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | ||
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | ||
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | ||
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | ||
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | ||
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | ||
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | ||
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | ||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
# STAR-Fusion | ||
|
||
STAR-Fusion further processes the output generated by the STAR aligner to map junction reads and spanning reads to a reference annotation set (a GTF file, ideally the same annotation file used to build the STAR genome index during the initial STAR setup). | ||
|
||
|
||
STAR should be run with options that are well suited to fusion read detection. An example of settings similar to those used in the landmark publication "The landscape of kinase fusions in cancer" by Stransky et al., Nat Commun 2014 (PMID: 25204415) is as follows: | ||
|
||
``` | ||
STAR --genomeDir Hg19.fa_star_index \ | ||
--readFilesIn left.fq right.fq \ | ||
--outSAMstrandField intronMotif \ | ||
--outFilterIntronMotifs RemoveNoncanonicalUnannotated \ | ||
--outReadsUnmapped None --chimSegmentMin 15 \ | ||
--chimJunctionOverhangMin 15 \ | ||
--alignMatesGapMax 200000 \ | ||
--alignIntronMax 200000 \ | ||
--outSAMtype BAM SortedByCoordinate | ||
``` | ||
|
||
Running STAR produces two primary output files that contain the junction and spanning read information (see the STAR documentation for precise details): | ||
|
||
Chimeric.out.junction : contains junction reads. | ||
Chimeric.out.sam : contains alignments for fusion-spanning reads. | ||
|
||
|
||
## Installation Requirements | ||
|
||
|
||
STAR-Fusion requires the following Perl module from CPAN : Set/IntervalTree.pm | ||
found here: | ||
http://search.cpan.org/~benbooth/Set-IntervalTree-0.02/lib/Set/IntervalTree.pm | ||
|
||
A typical perl module installation may involve: | ||
perl -MCPAN -e shell | ||
install Set::IntervalTree | ||
|
||
*This CPAN module tends to install trouble-free on Linux. If you have trouble installing Set::IntervalTree on Mac OS X (as I did), try the following: download the tarball from the link above, run 'perl Makefile.PL', then edit the generated 'Makefile' and remove all occurrences of '-arch i386'. Then run 'make', 'make test', and finally 'make install'. | ||
|
||
|
||
## Running STAR-Fusion | ||
|
||
Run STAR-Fusion as follows, using the two files above. (Note: specify -G ref_annot.gtf if you choose to use an annotation set other than the one included and used by default, gencode.v19 in the resources/ folder.) | ||
|
||
STAR-Fusion -S Chimeric.out.sam -J Chimeric.out.junction | ||
|
||
|
||
The output from STAR-Fusion is found as a tab-delimited file named 'star-fusion.fusion_candidates.txt', and has the following format: | ||
|
||
``` | ||
#fusion_name JunctionReads SpanningFrags LeftGene LeftBreakpoint LeftDistFromRefExonSplice RightGene RightBreakpoint RightDistFromRefExonSplice | ||
FIP1L1--PDGFRA 98 13 FIP1L1^ENSG00000145216.11 chr4:54292132:+ 0 PDGFRA^ENSG00000134853.7 chr4:55141092:+ 84 | ||
BRD4--NUTM1 7 2 BRD4^ENSG00000141867.13 chr19:15364963:- 0 NUTM1^ENSG00000184507.11 chr15:34640170:+ 0 | ||
EWSR1--FLI1 5 2 EWSR1^ENSG00000182944.13 chr22:29683123:+ 0 FLI1^ENSG00000151702.12 chr11:128677075:+ 0 | ||
GOPC--ROS1 82 36 GOPC^ENSG00000047932.9 chr6:117888017:- 0 ROS1^ENSG00000047936.6 chr6:117642557:- 0 | ||
ETV6--NTRK3 8 3 ETV6^ENSG00000139083.6 chr12:12022903:+ 0 NTRK3^ENSG00000140538.12 chr15:88483984:- 0 | ||
FGFR3--TACC3 221 372 FGFR3^ENSG00000068078.13 chr4:1808661:+ 0 TACC3^ENSG00000013810.14 chr4:1729704:+ 269 | ||
EWSR1--ATF1 8 3 EWSR1^ENSG00000182944.13 chr22:29683123:+ 0 ATF1^ENSG00000123268.4 chr12:51208063:+ 0 | ||
HOOK3--RET 9 2 HOOK3^ENSG00000168172.4 chr8:42823357:+ 0 RET^ENSG00000165731.13 chr10:43612032:+ 0 | ||
CD74--ROS1 5 0 CD74^ENSG00000019582.10 chr5:149784243:- 0 ROS1^ENSG00000047936.6 chr6:117645578:- 0 | ||
TMPRSS2--ETV1 10 3 TMPRSS2^ENSG00000184012.7 chr21:42866302:- 19 ETV1^ENSG00000006468.9 chr7:13975463:- 58 | ||
AKAP9--BRAF 4 4 AKAP9^ENSG00000127914.12 chr7:91632549:+ 0 BRAF^ENSG00000157764.8 chr7:140487384:- 0 | ||
... | ||
``` | ||
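
As a rough illustration of how the report above could be consumed downstream, here is a minimal Python sketch. It assumes the file is tab-delimited with the header line shown above; `parse_breakpoint` and `read_fusion_candidates` are hypothetical helper names and are not part of STAR-Fusion.

```python
import csv
import io
from typing import NamedTuple


class Breakpoint(NamedTuple):
    chrom: str
    pos: int
    strand: str


def parse_breakpoint(text: str) -> Breakpoint:
    """Split a 'chr:position:strand' string such as 'chr4:54292132:+'."""
    chrom, pos, strand = text.rsplit(":", 2)
    return Breakpoint(chrom, int(pos), strand)


def read_fusion_candidates(handle):
    """Yield one dict per fusion candidate from a tab-delimited report.

    Assumes the first non-blank line is the '#fusion_name ...' header.
    """
    lines = (line for line in handle if line.strip())
    reader = csv.reader(lines, delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    for fields in reader:
        row = dict(zip(header, fields))
        row["JunctionReads"] = int(row["JunctionReads"])
        row["SpanningFrags"] = int(row["SpanningFrags"])
        row["LeftBreakpoint"] = parse_breakpoint(row["LeftBreakpoint"])
        row["RightBreakpoint"] = parse_breakpoint(row["RightBreakpoint"])
        yield row


# One record copied from the example report, joined with tabs.
example = (
    "#fusion_name\tJunctionReads\tSpanningFrags\tLeftGene\tLeftBreakpoint\t"
    "LeftDistFromRefExonSplice\tRightGene\tRightBreakpoint\t"
    "RightDistFromRefExonSplice\n"
    "FIP1L1--PDGFRA\t98\t13\tFIP1L1^ENSG00000145216.11\tchr4:54292132:+\t0\t"
    "PDGFRA^ENSG00000134853.7\tchr4:55141092:+\t84\n"
)
rows = list(read_fusion_candidates(io.StringIO(example)))
print(rows[0]["fusion_name"], rows[0]["LeftBreakpoint"])
```

For a real run, pass an open file handle for 'star-fusion.fusion_candidates.txt' in place of the `io.StringIO` example.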
|
||
Note, these fusion candidates are derived solely from mapping the STAR outputs to the reference annotations. Paralogous genes are notorious for showing up as false-positive fusion candidates. Additional filtering tools, although not included now, will be made available soon. Even without additional filtering, STAR-Fusion provides fusion detection accuracy that is on par with the very best available fusion predictors, and is one of the most efficient. | ||
|
||
|
||
## Parameterization | ||
|
||
STAR-Fusion will report all candidates having at least 1 junction read where the breakpoints match up precisely with reference exon junctions of two different genes. | ||
|
||
For those breakpoints that do not precisely match at reference exon junctions, the breakpoint fusion read support must be at least --min_novel_junction_support (default 10 reads). | ||
|
||
In the case where multiple candidate fusion breakpoints are reported, only those breakpoints having at least --min_alt_pct_junction (default 10%) of the dominant isoform junction support will be reported. | ||
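
The reporting rules above can be sketched in Python as follows. This is an illustrative reading of the rules, not STAR-Fusion's actual implementation. It assumes that a breakpoint "matches precisely" when both distance-from-reference-splice values are 0, and that the dominant isoform is chosen after the minimum-support check. The function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def filter_candidates(candidates,
                      min_novel_junction_support=10,
                      min_alt_pct_junction=10.0):
    """Apply the junction-support rules to candidate breakpoints.

    candidates: iterable of (fusion_name, junction_reads, left_dist, right_dist)
    tuples, where the two distances are measured from the nearest reference
    exon splice site (0 means a precise match). This layout is an assumption.
    """
    by_fusion = defaultdict(list)
    for cand in candidates:
        name, junction_reads, left_dist, right_dist = cand
        at_ref_splice = left_dist == 0 and right_dist == 0
        # Precise matches need 1 junction read; novel breakpoints need more.
        needed = 1 if at_ref_splice else max(1, min_novel_junction_support)
        if junction_reads >= needed:
            by_fusion[name].append(cand)

    kept = []
    for name, group in by_fusion.items():
        # The breakpoint with the most junction reads is the dominant isoform.
        dominant = max(c[1] for c in group)
        for cand in group:
            if 100.0 * cand[1] / dominant >= min_alt_pct_junction:
                kept.append(cand)
    return kept


# Made-up gene names and counts, for illustration only.
candidates = [
    ("GENEA--GENEB", 50, 0, 0),   # dominant isoform, precise match
    ("GENEA--GENEB", 3, 0, 0),    # 6% of dominant: dropped
    ("GENEA--GENEB", 12, 0, 40),  # novel, 12 >= 10 reads, 24% of dominant
    ("GENEC--GENED", 4, 7, 0),    # novel with fewer than 10 reads: dropped
    ("GENEE--GENEF", 1, 0, 0),    # precise match with a single read
]
kept = filter_candidates(candidates)
print(kept)
```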
|
||
Finally, it is worth noting that the counts of spanning fragments are entirely non-overlapping with the counts of the breakpoint junction reads. That is, no spanning fragment (from Chimeric.out.sam) is counted if it contains a read that is reported as evidence in the breakpoint junction candidate data (from Chimeric.out.junction). | ||
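
A minimal Python sketch of this non-overlap rule, assuming fragments are identified by read name and that both mates of a pair share a name once any trailing '/1' or '/2' is removed. Both helpers are hypothetical and only illustrate the counting logic.

```python
def strip_mate_suffix(read_name: str) -> str:
    """Drop a trailing '/1' or '/2' so both mates map to one fragment name."""
    if read_name.endswith(("/1", "/2")):
        return read_name[:-2]
    return read_name


def count_spanning_fragments(spanning_read_names, junction_read_names):
    """Count spanning fragments that contain no junction-supporting read."""
    junction = {strip_mate_suffix(name) for name in junction_read_names}
    spanning = {strip_mate_suffix(name) for name in spanning_read_names}
    # A fragment already used as junction evidence is not counted again.
    return len(spanning - junction)


# Made-up read names: fragment 'r3' also supports the junction.
n_spanning = count_spanning_fragments(
    ["r1", "r2/1", "r2/2", "r3"],
    ["r3/1"],
)
print(n_spanning)
```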
|
||
|
||
|
||
## Example data and execution: | ||
|
||
In the included test/ directory, you'll find a 'runMe.sh' script along with a data/ subdirectory. The data/ subdirectory contains example fusion and spanning data generated from running STAR, in addition to a reference annotation file for gencode v19. Note, the reference GTF file contains only the 'exon' records instead of all lines from the original gencode annotation file; this speeds up parsing of the file and keeps the file size relatively small for including in this package. | ||
|
||
In this test/ directory, run the sample execution like so: | ||
|
||
./runMe.sh | ||
|
||
which simply runs: | ||
|
||
../STAR-Fusion -S Chimeric.out.sam.gz -J Chimeric.out.junction.gz | ||
|
||
and you'll find the output file 'star-fusion.fusion_candidates.txt' containing the fusion candidates in the format described above. | ||
|
||
|
||
|
||
###################### | ||
## Acknowledgements ## | ||
###################### | ||
|
||
This effort was largely inspired by earlier work done by Nicolas Stransky and discussions with Daniel Nicorici. | ||
|
||
STAR-Fusion is contributed by Brian Haas, Broad Institute, 2015 | ||
|