2019
DOI: 10.1016/j.cels.2019.05.003

Geometric Sketching Compactly Summarizes the Single-Cell Transcriptomic Landscape

Abstract (Highlights):
- Method to subsample massive scRNA-seq datasets while preserving rare cell states
- Resulting "sketch" accelerates clustering, visualization, and integration analyses
- Highlighting rare cells helps uncover a rare subtype of inflammatory macrophages
- Sketches can boost the utility of single-cell data for labs with limited resources

Cited by 115 publications (107 citation statements)
References 54 publications
“…Therefore, the computational complexity is O(kn²). scMC can analyze large-scale datasets, e.g., 100 K cells within 2 h. To reduce the run time, users can use fewer confident cells by adjusting the quantile cutoff parameter (e.g., from 0.75 to 0.9) or perform subsampling of single-cell datasets using a "geometric sketching" approach [40], which allows to maintain the transcriptomic heterogeneity within a dataset with a smaller subset of cells. More efficient methods and packages for calculating eigenvalues will significantly enhance the computational speed of scMC.…”
Section: Discussion (mentioning)
confidence: 99%
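
To make the subsampling idea in the statement above concrete, here is a minimal sketch of grid-based subsampling in the spirit of geometric sketching: cells are binned into equal-sized boxes, and samples are drawn by visiting occupied boxes rather than individual cells, so a sparse region is not swamped by a dense one. The function name `grid_sketch`, the fixed `box_size` parameter, and the toy data are assumptions made for illustration. This is not the authors' implementation, and the published method chooses its covering differently.

```python
import random
from collections import defaultdict


def grid_sketch(points, n_samples, box_size, seed=0):
    """Return indices of a subsample that covers occupied grid boxes evenly.

    Hypothetical helper: bins each point into an axis-aligned box of side
    `box_size`, then repeatedly visits the occupied boxes in random order,
    taking one not-yet-chosen point from each, until `n_samples` are drawn.
    """
    if n_samples > len(points):
        raise ValueError("n_samples cannot exceed the number of points")
    rng = random.Random(seed)

    # Group point indices by the integer coordinates of their grid box.
    boxes = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(int(coord // box_size) for coord in point)
        boxes[key].append(idx)

    # Shuffle within each box so that popping yields a uniform random member.
    pools = list(boxes.values())
    for pool in pools:
        rng.shuffle(pool)

    chosen = []
    while len(chosen) < n_samples:
        rng.shuffle(pools)
        for pool in pools:
            if pool and len(chosen) < n_samples:
                chosen.append(pool.pop())
    return sorted(chosen)


# Toy data (assumed): 1000 cells in one dense cluster plus 5 rare-state cells.
rng = random.Random(42)
cells = [(rng.random(), rng.random()) for _ in range(1000)]
cells += [(10 + 0.5 * rng.random(), 10 + 0.5 * rng.random()) for _ in range(5)]
rare_ids = set(range(1000, 1005))

sketch = grid_sketch(cells, n_samples=10, box_size=1.0)
print(len(rare_ids & set(sketch)), "of 10 sampled cells are from the rare state")
```

In this toy setup only two boxes are occupied, so the sampler alternates between them and every rare cell ends up in the sketch. A uniform random draw of 10 cells out of 1005 would usually contain none of them.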
“…In the field of flow cytometry, this has resulted in the generation of algorithms such as flowSOM (26), which implements self-organizing maps to reduce the number of observations and effectively decrease computational costs, and PhenoGraph (25), which approximates the data using KNN-graphs and subsequently clusters using the Louvain algorithm allowing the analysis of thousands of individual cells. Recently, single cell RNAseq has become increasingly popular, spurring the development of even more efficient clustering methods such as the Leiden clustering (21), dimensionality reduction techniques such as UMAP (22) and subsampling methods that preserve the topology of the original data, such as geometric sketching (20). Here, we combined these methods to perform an initial subsampling of the flow cytometric data, effectively reducing the computational cost, followed by the generation of a KNN-graph and Leiden clustering.…”
Section: Discussion (mentioning)
confidence: 99%
“…In addition, we applied a computational approach, based on established single cell RNAseq pipelines (19), which consists of two steps. First, in order to improve computational throughput, compensated and biexponentially transformed data is subsampled and pooled, either in a random manner or by using geometric sketching (20) which allows for enrichment of rare subpopulations, or these subsampling methods can be combined as was done in all the following analyses. Subsequently, a k-nearest neighbor (KNN) graph was constructed and Leiden clustering was performed (21) to allow identification of populations within the data, which in turn were visualized using an UMAP manifold (22) (Figure 1C).…”
Section: The Human Postnatal Thymus Is Characterized By An Increased (mentioning)
confidence: 99%
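
The KNN-graph step in the pipeline quoted above can be illustrated with a brute-force sketch. The function name `knn_graph` and the toy points are assumptions for illustration. The cited study does not state which implementation it used, and this quadratic-time version is only practical on an already subsampled dataset. Leiden clustering and UMAP are omitted because they need third-party libraries.

```python
import math


def knn_graph(points, k):
    """Brute-force k-nearest-neighbor graph under Euclidean distance.

    Returns a dict mapping each point index to the list of its k nearest
    neighbor indices (excluding itself), closest first.
    """
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    graph = {}
    for i, p in enumerate(points):
        # Sort all other points by distance to p; ties are broken by index.
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph[i] = [j for _, j in others[:k]]
    return graph


# Toy 1-D example (assumed data): two well-separated groups of points.
toy = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
neighbors = knn_graph(toy, k=2)
```

Each point's neighbors stay within its own group here, which is the property a community-detection method such as Leiden clustering would then exploit on the graph.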
“…In a future study, to resolve the limitation of D-EE on huge-scale data computation, we can accelerate D-EE by adopting either the fast Fourier transform as used in FIt-SNE, or adopting the state-of-the-art neural network framework used by net-SNE [28]. On the other hand, because a huge-scale single-cell dataset can be highly redundant, we can also select a subset of informative samples using an advanced geometric sketching tool [29] prior to application of D-EE.…”
Section: Discussion (mentioning)
confidence: 99%