SUMMARY Hotspot mutations in splicing factor genes have been recently reported at high frequency in hematological malignancies, suggesting the importance of RNA splicing in cancer. We analyzed whole-exome sequencing data across 33 tumor types in The Cancer Genome Atlas (TCGA), and we identified 119 splicing factor genes with significant non-silent mutation patterns, including mutation over-representation, recurrent loss of function (tumor suppressor-like), or hotspot mutation profile (oncogene-like). Furthermore, RNA sequencing analysis revealed altered splicing events associated with selected splicing factor mutations. In addition, we were able to identify common gene pathway profiles associated with the presence of these mutations. Our analysis suggests that somatic alteration of genes involved in the RNA-splicing process is common in cancer and may represent an underappreciated hallmark of tumorigenesis.
TCGA's RNASeq data represent one of the largest collections of cancer transcriptomes ever assembled. RNASeq technology, combined with computational tools like our SpliceSeq package, provides a comprehensive, detailed view of alternative mRNA splicing. Aberrant splicing patterns in cancers have been implicated in such processes as carcinogenesis, de-differentiation and metastasis. TCGA SpliceSeq (http://bioinformatics.mdanderson.org/TCGASpliceSeq) is a web-based resource that provides a quick, user-friendly, highly visual interface for exploring the alternative splicing patterns of TCGA tumors. Percent Spliced In (PSI) values for splice events on samples from 33 different tumor types, including available adjacent normal samples, have been loaded into TCGA SpliceSeq. Investigators can interrogate genes of interest, search for the genes that show the strongest variation between or among selected tumor types, or explore splicing pattern changes between tumor and adjacent normal samples. The interface presents intuitive graphical representations of splicing patterns, read counts and various statistical summaries, including percent spliced in. Splicing data can also be downloaded for inclusion in integrative analyses. TCGA SpliceSeq is freely available for academic, government or commercial use.
This report investigates the mechanisms by which mammalian cells coordinate DNA replication with transcription and chromatin assembly. In yeast, DNA replication initiates within nucleosome-free regions, but studies in mammalian cells have not revealed a similar relationship. Here, we have used genome-wide massively parallel sequencing to map replication initiation events, thereby creating a database of all replication initiation sites within nonrepetitive DNA in two human cell lines. Mining this database revealed that genomic regions transcribed at moderate levels were generally associated with high replication initiation frequency. In genomic regions with high rates of transcription, very few replication initiation events were detected. High-resolution mapping of replication initiation sites showed that replication initiation events were absent from transcription start sites but were highly enriched in adjacent, downstream sequences. Methylation of CpG sequences strongly affected the location of replication initiation events, whereas histone modifications had minimal effects. These observations suggest that high levels of transcription interfere with formation of pre-replication protein complexes. Data presented here identify replication initiation sites throughout the genome, providing a foundation for further analyses of DNA-replication dynamics and cell-cycle progression.
SUMMARYFor the past decade, cancer genomic studies have focused on mutations leading to splice-site disruption, overlooking those having splice-creating potential. Here, we applied a bioinformatic tool, MiSplice, for the large-scale discovery of splice-site-creating mutations (SCMs) across 8,656 TCGA tumors. We report 1,964 originally mis-annotated mutations having clear evidence of creating alternative splice junctions. TP53 and GATA3 have 26 and 18 SCMs, respectively, and ATRX has 5 from lower-grade gliomas. Mutations in 11 genes, including PARP1, BRCA1, and BAP1, were experimentally validated for splice-site-creating function. Notably, we found that neoantigens induced by SCMs are likely several folds more immunogenic compared to missense mutations, exemplified by the recurrent GATA3 SCM. Further, high expression of PD-1 and PD-L1 was observed in tumors with SCMs, suggesting candidates for immune blockade therapy. Our work highlights the importance of integrating DNA and RNA data for understanding the functional and the clinical implications of mutations in human diseases.
Summary: SpliceSeq is a resource for RNA-Seq data that provides a clear view of alternative splicing and identifies potential functional changes that result from splice variation. It displays intuitive visualizations and prioritized lists of results that highlight splicing events and their biological consequences. SpliceSeq unambiguously aligns reads to gene splice graphs, facilitating accurate analysis of large, complex transcript variants that cannot be adequately represented in other formats.Availability and implementation: SpliceSeq is freely available at http://bioinformatics.mdanderson.org/main/SpliceSeq:Overview. The application is a Java program that can be launched via a browser or installed locally. Local installation requires MySQL and Bowtie.Contact: mryan@insilico.us.comSupplementary Information: Supplementary data are available at Bioinformatics online.
The impact of somatic missense mutation on cancer etiology and progression is often difficult to interpret. One common approach for assessing the contribution of missense mutations in carcinogenesis is to identify genes mutated with statistically nonrandom frequencies. Even given the large number of sequenced cancer samples currently available, this approach remains underpowered to detect drivers, particularly in less studied cancer types. Alternative statistical and bioinformatic approaches are needed. One approach to increase power is to focus on localized regions of increased missense mutation density or hotspot regions, rather than a whole gene or protein domain. Detecting missense mutation hotspot regions in three dimensional protein structure may also be beneficial, because linear sequence alone does not fully describe the biologically relevant organization of codons. Here, we present a novel and statistically rigorous algorithm for detecting missense mutation hotspot regions in 3D protein structures. We analyze ~3×105 mutations from The Cancer Genome Atlas (TCGA) and identify 216 tumor-type-specific hotspot regions. In addition to experimentally determined protein structures we consider high-quality structural models, which increases genomic coverage from ~5,000 to more than 15,000 genes. We provide new evidence that 3D mutation analysis has unique advantages. It enables discovery of hotspot regions in many more genes than previously shown and increases sensitivity to hotspot regions in tumor suppressor genes. While hotspot regions have long been known to exist in both tumor suppressor genes and oncogenes, we provide the first report that they have different characteristic properties in the two types of driver genes. We show how cancer researchers can use our results to link 3D protein structure and the biological functions of missense mutations in cancer, and to generate testable hypotheses about driver mechanisms. Our results are included in a new interactive website for visualizing protein structures with TCGA mutations and associated hotspot regions. Users can submit new sequence data, facilitating the visualization of mutations in a biologically relevant context.
Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a challenge to bioinformatics classifiers. Commonly used features-DNA and protein sequence conservation, indel length, and occurrence in repeat regions-are useful for inference of protein damage. However, these features can cause false positives when predicting the impact of indels on disease. Existing methods for indel classification suffer from low specificities, severely limiting clinical utility. Here, we further develop our variant effect scoring tool (VEST) to include the classification of in-frame and frameshift indels (VEST-indel) as pathogenic or benign. We apply 24 features, including a new "PubMed" feature, to estimate a gene's importance in human disease. When compared with four existing indel classifiers, our method achieves a drastically reduced false-positive rate, improving specificity by as much as 90%. This approach of estimating gene importance might be generally applicable to missense and other bioinformatics pathogenicity predictors, which often fail to achieve high specificity. Finally, we tested all possible meta-predictors that can be obtained from combining the four different indel classifiers using Boolean conjunctions and disjunctions, and derived a meta-predictor with improved performance over any individual method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.