MotivationSingle-cell RNA sequencing (scRNA-seq) is increasingly used to study gene expression at the level of individual cells. However, preparing raw sequence data for further analysis is not a straightforward process. Biases, artifacts and other sources of unwanted variation are present in the data, requiring substantial time and effort to be spent on pre-processing, quality control (QC) and normalization.ResultsWe have developed the R/Bioconductor package scater to facilitate rigorous pre-processing, quality control, normalization and visualization of scRNA-seq data. The package provides a convenient, flexible workflow to process raw sequencing reads into a high-quality expression dataset ready for downstream analysis. scater provides a rich suite of plotting tools for single-cell data and a flexible data structure that is compatible with existing tools and can be used as infrastructure for future software development.Availability and ImplementationThe open-source code, along with installation instructions, vignettes and case studies, is available through Bioconductor at http://bioconductor.org/packages/scater.Supplementary information
Supplementary data are available at Bioinformatics online.
The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands-or even millions-of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
Background
Single-cell RNA sequencing (scRNA-seq) is a powerful tool for studying complex biological systems, such as tumor heterogeneity and tissue microenvironments. However, the sources of technical and biological variation in primary solid tumor tissues and patient-derived mouse xenografts for scRNA-seq are not well understood.
Results
We use low temperature (6 °C) protease and collagenase (37 °C) to identify the transcriptional signatures associated with tissue dissociation across a diverse scRNA-seq dataset comprising 155,165 cells from patient cancer tissues, patient-derived breast cancer xenografts, and cancer cell lines. We observe substantial variation in standard quality control metrics of cell viability across conditions and tissues. From the contrast between tissue protease dissociation at 37 °C or 6 °C, we observe that collagenase digestion results in a stress response. We derive a core gene set of 512 heat shock and stress response genes, including FOS and JUN, induced by collagenase (37 °C), which are minimized by dissociation with a cold active protease (6 °C). While induction of these genes was highly conserved across all cell types, cell type-specific responses to collagenase digestion were observed in patient tissues.
Conclusions
The method and conditions of tumor dissociation influence cell yield and transcriptome state and are both tissue- and cell-type dependent. Interpretation of stress pathway expression differences in cancer single-cell studies, including components of surface immune recognition such as MHC class I, may be especially confounded. We define a core set of 512 genes that can assist with the identification of such effects in dissociated scRNA-seq experiments.
Measuring gene expression of tumor clones at single-cell resolution links functional consequences to somatic alterations. Without scalable methods to simultaneously assay DNA and RNA from the same single cell, parallel single-cell DNA and RNA measurements from independent cell populations must be mapped for genome-transcriptome association. We present clonealign, which assigns gene expression states to cancer clones using single-cell RNA and DNA sequencing independently sampled from a heterogeneous population. We apply clonealign to triple-negative breast cancer patient-derived xenografts and high-grade serous ovarian cancer cell lines and discover clone-specific dysregulated biological pathways not visible using either sequencing method alone.Electronic supplementary materialThe online version of this article (10.1186/s13059-019-1645-z) contains supplementary material, which is available to authorized users.
SummaryInduced pluripotent stem cell (iPSC)-derived dopamine neurons provide an opportunity to model Parkinson’s disease (PD), but neuronal cultures are confounded by asynchronous and heterogeneous appearance of disease phenotypes in vitro. Using high-resolution, single-cell transcriptomic analyses of iPSC-derived dopamine neurons carrying the GBA-N370S PD risk variant, we identified a progressive axis of gene expression variation leading to endoplasmic reticulum stress. Pseudotime analysis of genes differentially expressed (DE) along this axis identified the transcriptional repressor histone deacetylase 4 (HDAC4) as an upstream regulator of disease progression. HDAC4 was mislocalized to the nucleus in PD iPSC-derived dopamine neurons and repressed genes early in the disease axis, leading to late deficits in protein homeostasis. Treatment of iPSC-derived dopamine neurons with HDAC4-modulating compounds upregulated genes early in the DE axis and corrected PD-related cellular phenotypes. Our study demonstrates how single-cell transcriptomics can exploit cellular heterogeneity to reveal disease mechanisms and identify therapeutic targets.
Progress in defining genomic fitness landscapes in cancer, especially those defined by copy number alterations (CNA), has been impeded by lack of time series single cell sampling of polyclonal populations and temporal statistical models [1][2][3][4][5][6][7] . Here, we generated 42,000 genomes from multi-year time series single cell whole genome sequencing (scWGS) of breast epithelium †
Pseudotime algorithms can be employed to extract latent temporal information from cross-sectional data sets allowing dynamic biological processes to be studied in situations where the collection of time series data is challenging or prohibitive. Computational techniques have arisen from single-cell ‘omics and cancer modelling where pseudotime can be used to learn about cellular differentiation or tumour progression. However, methods to date typically implicitly assume homogeneous genetic, phenotypic or environmental backgrounds, which becomes limiting as data sets grow in size and complexity. We describe a novel statistical framework that learns how pseudotime trajectories can be modulated through covariates that encode such factors. We apply this model to both single-cell and bulk gene expression data sets and show that the approach can recover known and novel covariate-pseudotime interaction effects. This hybrid regression-latent variable model framework extends pseudotemporal modelling from its most prevalent area of single cell genomics to wider applications.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.