Abstract: Single cell RNA-seq (scRNA-seq) experiments suffer from a range of characteristic technical biases, such as dropouts (zero or near zero counts) and high variance. Current analysis methods rely on imputing missing values by various means of local averaging or regression, often amplifying biases inherent in the data. We present netSmooth, a network-diffusion based method that uses priors for the covariance structure of gene expression profiles on scRNA-seq experiments in order to smooth expression values. We dem…
“…Many methods have been developed for computational doublet detection (DePasquale et al, 2018; Kang et al, 2018; McGinnis et al, 2018; Wolock et al, 2018), which can be applied to the sketch to remove these potential sources of confounding variation. We also note that more advanced quality control methods, including those for normalization (Bacher et al, 2017; Lun et al, 2016b; Vallejos et al, 2017), highly variable gene filtering (Yip et al, 2018), and imputation (Van Dijk et al, 2018; Li and Li, 2018; Ronen and Akalin, 2018) can naturally be applied to a geometric sketch before further analysis.…”
Large-scale single-cell RNA-sequencing (scRNA-seq) studies that profile hundreds of thousands of cells are becoming increasingly common, overwhelming existing analysis pipelines. Here, we describe how to enhance and accelerate single-cell data analysis by summarizing the transcriptomic heterogeneity within a data set using a small subset of cells, which we refer to as a geometric sketch. Our sketches provide more comprehensive visualization of transcriptional diversity, capture rare cell types with high sensitivity, and accurately reveal biological cell types via clustering. Our sketch of umbilical cord blood cells uncovers a rare subpopulation of inflammatory macrophages, which we experimentally validated in vitro. The construction of our sketches is extremely fast, which enabled us to accelerate other crucial resource-intensive tasks such as scRNA-seq data integration. We anticipate that our algorithm will become an […] in a matter of minutes and with an asymptotic runtime that is close to linear in the size of the data set. We empirically demonstrate that our algorithm produces sketches that more evenly represent the transcriptional space covered by the data. We further show that our sketches enhance and accelerate downstream analyses by preserving rare cell types, producing visualizations that broadly capture transcriptomic heterogeneity, facilitating the identification of cell types via […] transcriptional variability within a data set, allowing researchers to more easily gain insight into rarer transcriptional states.
Preprint. Work in progress.
Rare Cell Types Are Better Preserved Within Geometric Sketches
As suggested by the above results, one of the key advantages of our algorithm is that it naturally increases the representation of rare cell types with sufficient transcriptomic heterogeneity in the subsampled data. Using the four data sets mentioned above, which include cell type labels […] clustering algorithm (Blondel et al., 2008). Then, we transferred cluster labels to the rest of the data set via k-nearest-neighbor classification and assessed the agreement between our unsupervised cluster labels and the biological cell type labels provided by the original studies […]
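The label-transfer step described in this excerpt can be illustrated with a minimal sketch. It assumes Euclidean distance, a simple majority vote, and cluster purity as the agreement measure; the excerpt does not state which distance or agreement metric the authors used, and the function names `knn_transfer` and `purity` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_transfer(sketch_points, sketch_labels, query_points, k=3):
    """Give each query cell the majority cluster label of its k nearest sketch cells."""
    transferred = []
    for q in query_points:
        # Rank sketch cells by Euclidean distance to the query cell (an assumption).
        order = sorted(range(len(sketch_points)),
                       key=lambda i: math.dist(q, sketch_points[i]))
        votes = Counter(sketch_labels[i] for i in order[:k])
        transferred.append(votes.most_common(1)[0][0])
    return transferred


def purity(cluster_labels, true_labels):
    """Fraction of cells that carry the majority true label of their cluster.

    Purity is a stand-in agreement measure chosen for illustration only.
    """
    by_cluster = {}
    for c, t in zip(cluster_labels, true_labels):
        by_cluster.setdefault(c, Counter())[t] += 1
    matched = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return matched / len(true_labels)


# Toy 2-D "expression" coordinates: four clustered sketch cells, four remaining cells.
sketch_points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
sketch_labels = [0, 0, 1, 1]          # cluster ids found on the sketch
rest_points = [(0.2, 0.1), (4.8, 5.1), (0.0, 0.3), (5.1, 5.3)]
rest_true = ["T cell", "B cell", "T cell", "B cell"]  # made-up reference labels

transferred = knn_transfer(sketch_points, sketch_labels, rest_points, k=3)
score = purity(transferred, rest_true)
print(transferred, score)
```

On real data the brute-force neighbor search would be replaced by an indexed or approximate search, since it scales with the product of sketch size and data set size.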
“…DrImpute (Kwak et al, 2017) is a clustering-based method and uses a consensus strategy: it estimates a value with several cluster priors or distance matrices and then imputes by aggregation. As the low quality of the scRNA-seq datasets continues to be a bottleneck while the measurable cell counts keep increasing, the demand for faster and scalable imputation methods also keeps increasing (Eraslan et al, 2018; Lin et al, 2017; Ronen and Akalin, 2018). While some of these earlier algorithms do improve the quality of original datasets and preserve the underlying biological variance (Zhang and Zhang, 2017), often these methods demand extensive running time, impeding their adoption in the ever increasing scRNA-seq data space.…”
Background: Single-cell RNA sequencing (scRNA-seq) offers new opportunities to study gene expression of tens of thousands of single cells simultaneously. However, a significant problem of current scRNA-seq data is the large fractions of missing values or “dropouts” in gene counts. Incorrect handling of dropouts may affect downstream bioinformatics analysis. As the number of scRNA-seq datasets grows drastically, it is crucial to have accurate and efficient imputation methods to handle these dropouts. Methods: We present DeepImpute, a deep neural network based imputation algorithm. The architecture of DeepImpute efficiently uses dropout layers and loss functions to learn patterns in the data, allowing for accurate imputation. Results: Overall DeepImpute yields better accuracy than other publicly available scRNA-Seq imputation methods on experimental data, as measured by mean squared error or Pearson's correlation coefficient. Moreover, its efficient implementation provides significantly higher performance over the other methods as dataset size increases. Additionally, as a machine learning method, DeepImpute allows to use a subset of data to train the model and save even more computing time, without much sacrifice on the prediction accuracy. Conclusions: DeepImpute is an accurate, fast and scalable imputation tool that is suited to handle the ever increasing volume of scRNA-seq data. The package is freely available at https://github.com/lanagarmire/DeepImpute
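The two accuracy measures named in this abstract, mean squared error and Pearson's correlation coefficient, can be computed as in the sketch below. It assumes a common evaluation setup in which some observed values are hidden, imputed, and then compared with the held-out truth; the abstract does not describe the exact protocol, and the numbers here are invented toy values, not results from the paper. This is not the DeepImpute API.

```python
import math


def mse(truth, imputed):
    """Mean squared error between held-out true values and imputed values."""
    return sum((t - p) ** 2 for t, p in zip(truth, imputed)) / len(truth)


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy values: expression entries hidden before imputation, and what a
# hypothetical imputation method returned for those same entries.
held_out_truth = [2.0, 4.0, 6.0, 8.0]
imputed_values = [2.5, 3.5, 6.5, 7.5]

error = mse(held_out_truth, imputed_values)
r = pearson(held_out_truth, imputed_values)
print(f"MSE = {error:.3f}, Pearson r = {r:.3f}")
```

Lower MSE and higher correlation both indicate imputed values closer to the held-out truth; the two can disagree when an imputation method is well correlated but systematically shifted in scale.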
“…The abundance of dropouts (or sparsity) is a relevant feature of single cell RNA-seq data, and can be alleviated by imputing missing values using the information from co-expressed genes, or from similar cells, with several tools developed to recover the “true” expression signal (110, 111). Interestingly, a recent comparison of several imputation methods (112) concludes that no imputation method outperforms all the others in every situation.…”
Section: Experimental and Computational Approaches For Single Cell Ge…
The recent development of single cell gene expression technologies, and especially single cell transcriptomics, have revolutionized the way biologists and clinicians investigate organs and organisms, allowing an unprecedented level of resolution to the description of cell demographics in both healthy and diseased states. Single cell transcriptomics provide information on prevalence, heterogeneity, and gene co-expression at the individual cell level. This enables a cell-centric outlook to define intracellular gene regulatory networks and to bridge toward the definition of intercellular pathways otherwise masked in bulk analysis. The technologies have developed at a fast pace producing a multitude of different approaches, with several alternatives to choose from at any step, including single cell isolation and capturing, lysis, RNA reverse transcription and cDNA amplification, library preparation, sequencing, and computational analyses. Here, we provide guidelines for the experimental design of single cell RNA sequencing experiments, exploring the current options for the crucial steps. Furthermore, we provide a complete overview of the typical data analysis workflow, from handling the raw sequencing data to making biological inferences. Significantly, advancements in single cell transcriptomics have already contributed to outstanding exploratory and functional studies of cardiac development and disease models, as summarized in this review. In conclusion, we discuss achievable outcomes of single cell transcriptomics' applications in addressing unanswered questions and influencing future cardiac clinical applications.