Added latest version of STAR-Fusion as a directory.
alexdobin committed Apr 24, 2015
1 parent 6416af7 commit 32a0920
Showing 12 changed files with 1,553 additions and 0 deletions.
5 changes: 5 additions & 0 deletions README
@@ -17,6 +17,11 @@ bin: pre-compiled executables for Linux and Mac OS X
doc: documentation
extras: miscellaneous files and scripts

STAR-Fusion: fusion detection developed by Brian Haas, see https://github.com/STAR-Fusion/STAR-Fusion for details.
To populate this submodule, clone STAR with `git clone --recursive https://github.com/alexdobin/STAR`
STAR-Fusion-x.x.x: latest release of STAR-Fusion


COMPILING FROM SOURCE:
Unzip and "cd" into source/ subdirectory.
Linux: run "make STAR".
28 changes: 28 additions & 0 deletions STAR-Fusion-0.1.1/LICENSE
@@ -0,0 +1,28 @@
Copyright (c) 2015, STAR-Fusion
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* Neither the name of the {organization} nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

103 changes: 103 additions & 0 deletions STAR-Fusion-0.1.1/README.md
@@ -0,0 +1,103 @@
# STAR-Fusion

STAR-Fusion further processes the output generated by the STAR aligner, mapping junction reads and spanning reads to a reference annotation set supplied as a GTF file (ideally the same annotation file used to build the STAR genome index during the initial STAR setup).


STAR should be run with options well suited to fusion read detection. The following example uses settings similar to those in the landmark publication "The landscape of kinase fusions in cancer" by Stransky et al., Nat Commun 2014 (PMID: 25204415):

```
STAR --genomeDir Hg19.fa_star_index \
--readFilesIn left.fq right.fq \
--outSAMstrandField intronMotif \
--outFilterIntronMotifs RemoveNoncanonicalUnannotated \
--outReadsUnmapped None --chimSegmentMin 15 \
--chimJunctionOverhangMin 15 \
--alignMatesGapMax 200000 \
--alignIntronMax 200000 \
--outSAMtype BAM SortedByCoordinate
```

Running STAR produces two primary output files that contain the junction and spanning read information (see the STAR documentation for precise details):

* `Chimeric.out.junction`: contains junction reads.
* `Chimeric.out.sam`: contains alignments for fusion-spanning reads.


## Installation Requirements


STAR-Fusion requires the Perl module Set::IntervalTree (Set/IntervalTree.pm) from CPAN, found here:
http://search.cpan.org/~benbooth/Set-IntervalTree-0.02/lib/Set/IntervalTree.pm

A typical Perl module installation may involve starting the CPAN shell with `perl -MCPAN -e shell` and then running `install Set::IntervalTree` at its prompt.

Note: this CPAN module tends to install trouble-free on Linux. If you have trouble installing Set::IntervalTree on Mac OS X (as I did), try the following: download the tarball from the link above, run `perl Makefile.PL`, then edit the generated `Makefile` and remove all occurrences of `-arch i386`. Then run `make`, `make test`, and finally `make install`.
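The Mac OS X workaround above can be scripted rather than done by hand. Below is a minimal Python sketch, assuming it is run inside the unpacked Set-IntervalTree directory after `perl Makefile.PL` has generated `Makefile`; `strip_arch_i386` and `patch_makefile` are hypothetical helper names, not part of any package.

```python
from pathlib import Path

FLAG = "-arch i386"


def strip_arch_i386(makefile_text: str) -> str:
    # Drop every occurrence of the 32-bit architecture flag. Surrounding
    # whitespace is left untouched so tab-indented recipe lines stay valid.
    return makefile_text.replace(FLAG, "")


def patch_makefile(path: str = "Makefile") -> int:
    # Hypothetical helper: rewrite the generated Makefile in place and
    # report how many flags were removed.
    mk = Path(path)
    text = mk.read_text()
    removed = text.count(FLAG)
    mk.write_text(strip_arch_i386(text))
    return removed
```

Calling `patch_makefile()` returns the number of flags removed; afterwards continue with `make`, `make test`, and `make install` as described above.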


## Running STAR-Fusion

Run STAR-Fusion on the two files above as follows. (Note: specify `-G ref_annot.gtf` if you want to use an annotation set other than the one included and used by default, gencode v19 in the resources/ folder.)

`STAR-Fusion -S Chimeric.out.sam -J Chimeric.out.junction`
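If you drive STAR-Fusion from a Python pipeline, the invocation above can be assembled programmatically. The sketch below is a hypothetical wrapper (`star_fusion_command` and `run_star_fusion` are illustrative names, not part of STAR-Fusion); it uses only the `-S`, `-J`, and `-G` options described here and assumes the `STAR-Fusion` script is on your PATH.

```python
import subprocess
from typing import List, Optional


def star_fusion_command(sam: str, junction: str, gtf: Optional[str] = None,
                        exe: str = "STAR-Fusion") -> List[str]:
    # -S takes Chimeric.out.sam and -J takes Chimeric.out.junction (both from STAR)
    cmd = [exe, "-S", sam, "-J", junction]
    # -G is only needed when overriding the bundled gencode v19 annotation
    if gtf is not None:
        cmd += ["-G", gtf]
    return cmd


def run_star_fusion(sam: str, junction: str, gtf: Optional[str] = None) -> None:
    # Assumes STAR-Fusion is on PATH; check=True raises CalledProcessError
    # if it exits with a non-zero status.
    subprocess.run(star_fusion_command(sam, junction, gtf), check=True)
```

Building the argument list separately from running it keeps the command easy to log or test without invoking STAR-Fusion.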


STAR-Fusion writes its output to a tab-delimited file named 'star-fusion.fusion_candidates.txt', which has the following format:

```
#fusion_name JunctionReads SpanningFrags LeftGene LeftBreakpoint LeftDistFromRefExonSplice RightGene RightBreakpoint RightDistFromRefExonSplice
FIP1L1--PDGFRA 98 13 FIP1L1^ENSG00000145216.11 chr4:54292132:+ 0 PDGFRA^ENSG00000134853.7 chr4:55141092:+ 84
BRD4--NUTM1 7 2 BRD4^ENSG00000141867.13 chr19:15364963:- 0 NUTM1^ENSG00000184507.11 chr15:34640170:+ 0
EWSR1--FLI1 5 2 EWSR1^ENSG00000182944.13 chr22:29683123:+ 0 FLI1^ENSG00000151702.12 chr11:128677075:+ 0
GOPC--ROS1 82 36 GOPC^ENSG00000047932.9 chr6:117888017:- 0 ROS1^ENSG00000047936.6 chr6:117642557:- 0
ETV6--NTRK3 8 3 ETV6^ENSG00000139083.6 chr12:12022903:+ 0 NTRK3^ENSG00000140538.12 chr15:88483984:- 0
FGFR3--TACC3 221 372 FGFR3^ENSG00000068078.13 chr4:1808661:+ 0 TACC3^ENSG00000013810.14 chr4:1729704:+ 269
EWSR1--ATF1 8 3 EWSR1^ENSG00000182944.13 chr22:29683123:+ 0 ATF1^ENSG00000123268.4 chr12:51208063:+ 0
HOOK3--RET 9 2 HOOK3^ENSG00000168172.4 chr8:42823357:+ 0 RET^ENSG00000165731.13 chr10:43612032:+ 0
CD74--ROS1 5 0 CD74^ENSG00000019582.10 chr5:149784243:- 0 ROS1^ENSG00000047936.6 chr6:117645578:- 0
TMPRSS2--ETV1 10 3 TMPRSS2^ENSG00000184012.7 chr21:42866302:- 19 ETV1^ENSG00000006468.9 chr7:13975463:- 58
AKAP9--BRAF 4 4 AKAP9^ENSG00000127914.12 chr7:91632549:+ 0 BRAF^ENSG00000157764.8 chr7:140487384:- 0
...
```
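To post-process the candidates file, a small parser is handy. The sketch below assumes the columns are tab-separated in the order of the header shown above (the display here may show spaces instead of tabs), and infers from the example rows that gene fields have the form `symbol^gene_id` and breakpoints the form `chrom:position:strand`. `FusionCandidate` and `parse_candidates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class FusionCandidate:
    name: str                              # e.g. "FIP1L1--PDGFRA"
    junction_reads: int
    spanning_frags: int
    left_gene: Tuple[str, str]             # (symbol, gene id)
    left_breakpoint: Tuple[str, int, str]  # (chrom, position, strand)
    left_dist: int                         # LeftDistFromRefExonSplice
    right_gene: Tuple[str, str]
    right_breakpoint: Tuple[str, int, str]
    right_dist: int                        # RightDistFromRefExonSplice


def _gene(field: str) -> Tuple[str, str]:
    # Assumed layout "SYMBOL^GENE_ID", as in the example rows
    symbol, gene_id = field.split("^", 1)
    return symbol, gene_id


def _breakpoint(field: str) -> Tuple[str, int, str]:
    # rsplit keeps any ':' inside the chromosome name intact
    chrom, pos, strand = field.rsplit(":", 2)
    return chrom, int(pos), strand


def parse_candidates(lines: Iterable[str]) -> List[FusionCandidate]:
    out = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#fusion_name ...' header
        row = line.split("\t")
        out.append(FusionCandidate(
            name=row[0],
            junction_reads=int(row[1]),
            spanning_frags=int(row[2]),
            left_gene=_gene(row[3]),
            left_breakpoint=_breakpoint(row[4]),
            left_dist=int(row[5]),
            right_gene=_gene(row[6]),
            right_breakpoint=_breakpoint(row[7]),
            right_dist=int(row[8]),
        ))
    return out
```

For example, passing an open handle on `star-fusion.fusion_candidates.txt` to `parse_candidates` returns one `FusionCandidate` per row, which you can then sort or filter by `junction_reads` and `spanning_frags`.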

Note that these fusion candidates are derived solely from mapping the STAR outputs to the reference annotations. Paralogous genes are notorious for showing up as false-positive fusion candidates. Additional filtering tools are not yet included but will be made available soon. Even without additional filtering, STAR-Fusion provides fusion detection accuracy on par with the very best available fusion predictors, and it is among the most efficient.


## Parameterization

STAR-Fusion reports every candidate that has at least 1 junction read whose breakpoints precisely match reference exon junctions of two different genes.

For breakpoints that do not precisely match reference exon junctions, the junction read support for the breakpoint must be at least `--min_novel_junction_support` (default: 10 reads).

When multiple candidate breakpoints are found for the same fusion, only those with at least `--min_alt_pct_junction` (default: 10%) of the junction read support of the dominant isoform are reported.
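The three reporting rules above can be illustrated with a short filter. This is an illustrative restatement of the documented rules in Python, not STAR-Fusion's actual Perl implementation. Two assumptions are flagged in the comments: that a breakpoint "precisely matches" reference exon junctions when both LeftDistFromRefExonSplice and RightDistFromRefExonSplice are 0, and that the percentage rule is applied after the minimum-support rule. `Breakpoint` and `filter_breakpoints` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Breakpoint(NamedTuple):
    fusion_name: str     # e.g. "FIP1L1--PDGFRA"
    junction_reads: int
    left_dist: int       # LeftDistFromRefExonSplice
    right_dist: int      # RightDistFromRefExonSplice


def filter_breakpoints(cands: List[Breakpoint],
                       min_novel_junction_support: int = 10,
                       min_alt_pct_junction: float = 10.0) -> List[Breakpoint]:
    # Rule 1 and 2: minimum junction read support.
    kept = []
    for c in cands:
        # ASSUMPTION: "matches reference exon junctions" means distance 0 on both sides
        at_reference = c.left_dist == 0 and c.right_dist == 0
        min_reads = 1 if at_reference else min_novel_junction_support
        if c.junction_reads >= min_reads:
            kept.append(c)

    # Rule 3: within each fusion, keep breakpoints with at least
    # min_alt_pct_junction percent of the dominant breakpoint's support.
    # ASSUMPTION: applied only to breakpoints that passed rules 1 and 2.
    by_fusion: Dict[str, List[Breakpoint]] = defaultdict(list)
    for c in kept:
        by_fusion[c.fusion_name].append(c)
    result = []
    for group in by_fusion.values():
        dominant = max(b.junction_reads for b in group)
        threshold = dominant * min_alt_pct_junction / 100.0
        result.extend(b for b in group if b.junction_reads >= threshold)
    return result
```

With the defaults, a reference-matched breakpoint with 5 reads is still dropped if another breakpoint of the same fusion has 100 reads, since 5 is below 10% of 100.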

Finally, note that spanning fragment counts and breakpoint junction read counts never overlap: no spanning fragment (from Chimeric.out.sam) is counted if it contains a read that is reported as evidence for a breakpoint junction candidate (from Chimeric.out.junction).
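The non-overlap rule amounts to a set difference on read names. Below is a minimal sketch, assuming each fragment is identified by its read name (mates share the same QNAME in SAM) and that spanning fragments have already been assigned to fusions; the function name and input layout are hypothetical, not STAR-Fusion's internals.

```python
from typing import Dict, Iterable, Set


def count_spanning_fragments(spanning: Dict[str, Iterable[str]],
                             junction_read_names: Set[str]) -> Dict[str, int]:
    # spanning maps fusion name -> read names of fragments supporting it.
    # Building a set collapses the two mates of a pair into one fragment.
    counts = {}
    for fusion, frag_names in spanning.items():
        # A fragment already reported as junction evidence is not counted again.
        counts[fusion] = len({f for f in frag_names if f not in junction_read_names})
    return counts
```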



## Example data and execution

The included test/ directory contains a 'runMe.sh' script and a data/ subdirectory. The data/ subdirectory contains example fusion junction and spanning data generated by running STAR, along with a reference annotation file for gencode v19. Note that this reference GTF file contains only the 'exon' records rather than all lines of the original gencode annotation file; this speeds up parsing and keeps the file small enough to include in this package.
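If you want to trim your own annotation in the same way, a filter like the one below keeps only exon records. This is a sketch of one possible approach, not necessarily how the bundled gencode v19 file was produced; it relies only on the standard GTF layout (nine tab-separated columns, feature type in column 3), and dropping comment lines is an assumption about what the reduced file should contain. `exon_records` and `write_exon_only` are hypothetical names.

```python
from typing import Iterable, Iterator


def exon_records(gtf_lines: Iterable[str]) -> Iterator[str]:
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # ASSUMPTION: comment/header lines are not needed
        fields = line.rstrip("\r\n").split("\t")
        # GTF column 3 (index 2) holds the feature type: gene, transcript, exon, CDS, ...
        if len(fields) >= 9 and fields[2] == "exon":
            yield line


def write_exon_only(src: str, dst: str) -> int:
    # Copy exon lines unchanged from src to dst; return how many were written.
    n = 0
    with open(src) as fin, open(dst, "w") as fout:
        for rec in exon_records(fin):
            fout.write(rec)
            n += 1
    return n
```

For a gzipped annotation, open the input with `gzip.open(src, "rt")` instead of `open(src)`.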

From within the test/ directory, run the example like so:

`./runMe.sh`

which simply runs:

`../STAR-Fusion -S Chimeric.out.sam.gz -J Chimeric.out.junction.gz`

and you'll find the output file 'star-fusion.fusion_candidates.txt' containing the fusion candidates in the format described above.



## Acknowledgements

This effort was largely inspired by earlier work done by Nicolas Stransky and discussions with Daniel Nicorici.

STAR-Fusion is contributed by Brian Haas, Broad Institute, 2015
