netSmooth: Network-smoothing based imputation for single cell RNA-seq

Ronen, Jonathan; Akalin, Altuna

doi:10.12688/f1000research.13511.3

Cited by 60 publications (39 citation statements)

References 24 publications


“…Many methods have been developed for computational doublet detection (DePasquale et al, 2018; Kang et al, 2018; McGinnis et al, 2018; Wolock et al, 2018), which can be applied to the sketch to remove these potential sources of confounding variation. We also note that more advanced quality control methods, including those for normalization (Bacher et al, 2017; Lun et al, 2016b; Vallejos et al, 2017), highly variable gene filtering (Yip et al, 2018), and imputation (Van Dijk et al, 2018; Li and Li, 2018; Ronen and Akalin, 2018) can naturally be applied to a geometric sketch before further analysis.…”

Section: Discussion (mentioning)

confidence: 99%

Geometric Sketching Compactly Summarizes the Single-Cell Transcriptomic Landscape

Hie, Cho, DeMeo, et al. 2019

Preprint


Large-scale single-cell RNA-sequencing (scRNA-seq) studies that profile hundreds of thousands of cells are becoming increasingly common, overwhelming existing analysis pipelines. Here, we describe how to enhance and accelerate single-cell data analysis by summarizing the transcriptomic heterogeneity within a data set using a small subset of cells, which we refer to as a geometric sketch. Our sketches provide more comprehensive visualization of transcriptional diversity, capture rare cell types with high sensitivity, and accurately reveal biological cell types via clustering. Our sketch of umbilical cord blood cells uncovers a rare subpopulation of inflammatory macrophages, which we experimentally validated in vitro. The construction of our sketches is extremely fast, which enabled us to accelerate other crucial resource-intensive tasks such as scRNA-seq data integration. We anticipate that our algorithm will become an … in a matter of minutes and with an asymptotic runtime that is close to linear in the size of the data set. We empirically demonstrate that our algorithm produces sketches that more evenly represent the transcriptional space covered by the data. We further show that our sketches enhance and accelerate downstream analyses by preserving rare cell types, producing visualizations that broadly capture transcriptomic heterogeneity, facilitating the identification of cell types via … transcriptional variability within a data set, allowing researchers to more easily gain insight into rarer transcriptional states.

Rare Cell Types Are Better Preserved Within Geometric Sketches

As suggested by the above results, one of the key advantages of our algorithm is that it naturally increases the representation of rare cell types with sufficient transcriptomic heterogeneity in the subsampled data. Using the four data sets mentioned above, which include cell type labels … clustering algorithm (Blondel et al., 2008). Then, we transferred cluster labels to the rest of the data set via k-nearest-neighbor classification and assessed the agreement between our unsupervised cluster labels and the biological cell type labels provided by the original studies …



“…DrImpute (Kwak et al, 2017) is a clustering-based method and uses a consensus strategy: it estimates a value with several cluster priors or distance matrices and then imputes by aggregation. As the low quality of the scRNA-seq datasets continues to be a bottleneck while the measurable cell counts keep increasing, the demand for faster and scalable imputation methods also keeps increasing (Eraslan et al, 2018; Lin et al, 2017; Ronen and Akalin, 2018). While some of these earlier algorithms do improve the quality of original datasets and preserve the underlying biological variance (Zhang and Zhang, 2017), often these methods demand extensive running time, impeding their adoption in the ever increasing scRNA-seq data space.…”

Section: Introduction (mentioning)

confidence: 99%

DeepImpute: an accurate, fast and scalable deep neural network method to impute single-cell RNA-Seq data

Arisdakessian, Poirion, Yunits, et al. 2018

Preprint


Background: Single-cell RNA sequencing (scRNA-seq) offers new opportunities to study gene expression of tens of thousands of single cells simultaneously. However, a significant problem of current scRNA-seq data is the large fraction of missing values, or "dropouts", in gene counts. Incorrect handling of dropouts may affect downstream bioinformatics analysis. As the number of scRNA-seq datasets grows drastically, it is crucial to have accurate and efficient imputation methods to handle these dropouts.

Methods: We present DeepImpute, a deep neural network based imputation algorithm. The architecture of DeepImpute efficiently uses dropout layers and loss functions to learn patterns in the data, allowing for accurate imputation.

Results: Overall, DeepImpute yields better accuracy than other publicly available scRNA-seq imputation methods on experimental data, as measured by mean squared error or Pearson's correlation coefficient. Moreover, its efficient implementation provides significantly higher performance than the other methods as dataset size increases. Additionally, as a machine learning method, DeepImpute allows a subset of the data to be used to train the model, saving even more computing time without much sacrifice in prediction accuracy.

Conclusions: DeepImpute is an accurate, fast and scalable imputation tool that is suited to handle the ever increasing volume of scRNA-seq data. The package is freely available at https://github.com/lanagarmire/DeepImpute


“…The abundance of dropouts (or sparsity) is a relevant feature of single cell RNA-seq data, and can be alleviated by imputing missing values using the information from co-expressed genes, or from similar cells, with several tools developed to recover the “true” expression signal (110, 111). Interestingly, a recent comparison of several imputation methods (112) concludes that no imputation method outperforms all the others in every situation.…”

Section: Experimental and Computational Approaches For Single Cell Ge… (mentioning)

confidence: 99%

Single Cell Gene Expression to Understand the Dynamic Architecture of the Heart

Massaia, Chaves, Samari, et al. 2018

Front. Cardiovasc. Med.


The recent development of single cell gene expression technologies, and especially single cell transcriptomics, has revolutionized the way biologists and clinicians investigate organs and organisms, allowing an unprecedented level of resolution in the description of cell demographics in both healthy and diseased states. Single cell transcriptomics provides information on prevalence, heterogeneity, and gene co-expression at the individual cell level. This enables a cell-centric outlook to define intracellular gene regulatory networks and to bridge toward the definition of intercellular pathways otherwise masked in bulk analysis. The technologies have developed at a fast pace, producing a multitude of different approaches, with several alternatives to choose from at every step, including single cell isolation and capture, lysis, RNA reverse transcription and cDNA amplification, library preparation, sequencing, and computational analysis. Here, we provide guidelines for the experimental design of single cell RNA sequencing experiments, exploring the current options for the crucial steps. Furthermore, we provide a complete overview of the typical data analysis workflow, from handling the raw sequencing data to making biological inferences. Significantly, advancements in single cell transcriptomics have already contributed to outstanding exploratory and functional studies of cardiac development and disease models, as summarized in this review. In conclusion, we discuss achievable outcomes of single cell transcriptomics applications in addressing unanswered questions and influencing future cardiac clinical applications.

