All cancers are caused by somatic mutations. However, understanding of the biological processes generating these mutations is limited. The catalogue of somatic mutations from a cancer genome bears the signatures of the mutational processes that have been operative. Here, we analysed 4,938,362 mutations from 7,042 cancers and extracted more than 20 distinct mutational signatures. Some are present in many cancer types, notably a signature attributed to the APOBEC family of cytidine deaminases, whereas others are confined to a single class. Certain signatures are associated with age of the patient at cancer diagnosis, known mutagenic exposures or defects in DNA maintenance, but many are of cryptic origin. In addition to these genome-wide mutational signatures, hypermutation localized to small genomic regions, kataegis, is found in many cancer types. The results reveal the diversity of mutational processes underlying the development of cancer with potential implications for understanding of cancer etiology, prevention and therapy.
SummaryEmbryonic stem cell (ESC) culture conditions are important for maintaining long-term self-renewal, and they influence cellular pluripotency state. Here, we report single cell RNA-sequencing of mESCs cultured in three different conditions: serum, 2i, and the alternative ground state a2i. We find that the cellular transcriptomes of cells grown in these conditions are distinct, with 2i being the most similar to blastocyst cells and including a subpopulation resembling the two-cell embryo state. Overall levels of intercellular gene expression heterogeneity are comparable across the three conditions. However, this masks variable expression of pluripotency genes in serum cells and homogeneous expression in 2i and a2i cells. Additionally, genes related to the cell cycle are more variably expressed in the 2i and a2i conditions. Mining of our dataset for correlations in gene expression allowed us to identify additional components of the pluripotency network, including Ptma and Zfp640, illustrating its value as a resource for future discovery.
Single-cell RNA sequencing (scRNA-seq) has broad applications across biomedical research. One of the key challenges is to ensure that only single, live cells are included in downstream analysis, as the inclusion of compromised cells inevitably affects data interpretation. Here, we present a generic approach for processing scRNA-seq data and detecting low quality cells, using a curated set of over 20 biological and technical features. Our approach improves classification accuracy by over 30 % compared to traditional methods when tested on over 5,000 cells, including CD4+ T cells, bone marrow dendritic cells, and mouse embryonic stem cells.Electronic supplementary materialThe online version of this article (doi:10.1186/s13059-016-0888-1) contains supplementary material, which is available to authorized users.
SummarySingle-cell RNA-sequencing (scRNA-seq) facilitates identification of new cell types and gene regulatory networks as well as dissection of the kinetics of gene expression and patterns of allele-specific expression. However, to facilitate such analyses, separating biological variability from the high level of technical noise that affects scRNA-seq protocols is vital. Here we describe and validate a generative statistical model that accurately quantifies technical noise with the help of external RNA spike-ins. Applying our approach to investigate stochastic allele-specific expression in individual cells we demonstrate that a large fraction of stochastic allele-specific expression can be explained by technical noise, especially for low and moderately expressed genes: we predict that only 17.8% of stochastic allele-specific expression patterns are attributable to biological noise with the remainder due to technical noise.
The crucial capability of T cells for discrimination between self and non-self peptides is based on negative selection of developing thymocytes by medullary thymic epithelial cells (mTECs). The mTECs purge autoreactive T cells by expression of cell-type specific genes referred to as tissue-restricted antigens (TRAs). Although the autoimmune regulator (AIRE) protein is known to promote the expression of a subset of TRAs, its mechanism of action is still not fully understood. The expression of TRAs that are not under the control of AIRE also needs further characterization. Furthermore, expression patterns of TRA genes have been suggested to change over the course of mTEC development. Herein we have used single-cell RNA-sequencing to resolve patterns of TRA expression during mTEC development. Our data indicated that mTEC development consists of three distinct stages, correlating with previously described jTEC, mTEChi and mTEClo phenotypes. For each subpopulation, we have identified marker genes useful in future studies. Aire-induced TRAs were switched on during jTEC-mTEC transition and were expressed in genomic clusters, while otherwise the subsets expressed largely overlapping sets of TRAs. Moreover, population-level analysis of TRA expression frequencies suggested that such differences might not be necessary to achieve efficient thymocyte selection.
In the author list of this Article, the surname of Marcin Imielinski was misspelled as ''Imielinsk''. This has been corrected in the HTML and PDF of the original Article online.
Nature Communications 6: Article number: 8687 (2015); Published: 22 October 2015; Updated: 11 January 2016. The original version of this Article contained an error in the spelling of the author Tomislav Ilicic, which was incorrectly given as Tomislav Illicic. This has now been corrected in both the PDF and HTML versions of the Article.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.