Background
Gene fusion events are a significant source of somatic variation across adult and pediatric cancers and have provided some of the most effective clinically relevant therapeutic targets, yet computational algorithms for fusion detection from RNA sequencing data show low overlap of predictions across methods. In addition, events such as polymerase read-throughs, mis-mapping due to gene homology, and fusions occurring in healthy normal tissue require stringent filtering, making it difficult for researchers and clinicians to discern gene fusions that might be true underlying oncogenic drivers of a tumor and, in some cases, appropriate targets for therapy.
Results
Here, we present annoFuse, an R package developed to annotate and identify biologically relevant expressed gene fusions and to highlight recurrent novel fusions in a given cohort. We applied annoFuse to STAR-Fusion and Arriba results for 1028 pediatric brain tumor samples provided as part of the Open Pediatric Brain Tumor Atlas (OpenPBTA) Project. First, we used FusionAnnotator to identify and filter "red flag" fusions found in healthy tissues or in gene homology databases. Using annoFuse, we filtered out fusions known to be artifactual and retained high-quality fusion calls supported by at least one junction read. To reduce false positives from background noise, we also removed calls with disproportionate spanning fragment support, defined as more than 10 reads above the junction read count. Second, we prioritized and captured known and putative oncogenic driver fusions: those previously reported in TCGA, or those containing gene partners that are known oncogenes, tumor suppressor genes, or COSMIC genes. Finally, using annoFuse, we determined recurrent fusions across the cohort and recurrently fused genes within each histology.
Conclusions
annoFuse provides a standardized filtering and annotation method for gene fusion calls from STAR-Fusion and Arriba by merging, filtering, and prioritizing putative oncogenic fusions across large cancer datasets, as demonstrated here with the OpenPBTA dataset. We are expanding the package to be widely applicable to other fusion algorithms and adding functionality, and we expect annoFuse to provide researchers with a method for quickly evaluating and prioritizing fusions in patient tumors.
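The read-support filter and the recurrence step summarized in the Results can be illustrated with a minimal sketch. This is not annoFuse code (annoFuse is an R package); the `FusionCall` record, the function names, and the `min_samples` parameter are hypothetical and introduced only for illustration. The sketch assumes that "disproportionate spanning fragment support" means the spanning fragment count exceeds the junction read count by more than 10, and that a fusion counts as recurrent when it is seen in at least two samples.

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: mirrors the "more than 10 reads" wording in the abstract.
MAX_SPANNING_EXCESS = 10


@dataclass(frozen=True)
class FusionCall:
    """Hypothetical record for one fusion call from one caller."""
    sample_id: str
    fusion_name: str         # e.g. "GENE_A--GENE_B"
    junction_reads: int      # reads crossing the fusion breakpoint
    spanning_fragments: int  # read pairs with one mate on each partner
    red_flag: bool = False   # flagged as a healthy-tissue or homology artifact


def passes_read_support(call: FusionCall) -> bool:
    """Return True if the call survives the artifact and read-support filters."""
    if call.red_flag:
        return False
    if call.junction_reads < 1:
        return False
    # Drop calls whose spanning support is disproportionate to junction support.
    excess = call.spanning_fragments - call.junction_reads
    return excess <= MAX_SPANNING_EXCESS


def recurrent_fusions(calls, min_samples=2):
    """Count distinct samples per fusion and keep those meeting min_samples."""
    samples_per_fusion = Counter()
    seen = set()
    for call in calls:
        key = (call.sample_id, call.fusion_name)
        if key not in seen:  # count each sample once per fusion
            seen.add(key)
            samples_per_fusion[call.fusion_name] += 1
    return {name: n for name, n in samples_per_fusion.items() if n >= min_samples}
```

In use, one would first keep only the calls for which `passes_read_support` returns `True` and then pass the survivors to `recurrent_fusions`, so that recurrence is computed on filtered calls only.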
Background
Gene fusions arise in cancer as a result of aberrant chromosomal rearrangements or defective splicing, which bring together two unrelated genes that are then expressed as a novel fusion transcript. Detection of therapeutically targetable fusions is of clinical importance, and computational methods are constantly being developed to detect these events in real time. Recent comparative studies show low concordance of fusion predictions across methods (1), suggesting that many predictions may not represent true events. Additionally, transcriptional readthroughs (2), in which the polymerase machinery skips a stop codon and reads through a neighbouring gene, as well as fusions that involve non-canonical transcripts or ...