Preprint. Work in progress.

Large-scale single-cell RNA-sequencing (scRNA-seq) studies that profile hundreds of thousands of cells are becoming increasingly common, overwhelming existing analysis pipelines. Here, we describe how to enhance and accelerate single-cell data analysis by summarizing the transcriptomic heterogeneity within a data set using a small subset of cells, which we refer to as a geometric sketch. Our sketches provide more comprehensive visualization of transcriptional diversity, capture rare cell types with high sensitivity, and accurately reveal biological cell types via clustering. Our sketch of umbilical cord blood cells uncovers a rare subpopulation of inflammatory macrophages, which we experimentally validated in vitro. The construction of our sketches is extremely fast, which enabled us to accelerate other crucial resource-intensive tasks such as scRNA-seq data integration. We anticipate that our algorithm will become an [...]

[...] in a matter of minutes and with an asymptotic runtime that is close to linear in the size of the data set. We empirically demonstrate that our algorithm produces sketches that more evenly represent the transcriptional space covered by the data. We further show that our sketches enhance and accelerate downstream analyses by preserving rare cell types, producing visualizations that broadly capture transcriptomic heterogeneity, facilitating the identification of cell types via [...]

[...] transcriptional variability within a data set, allowing researchers to more easily gain insight into rarer transcriptional states.

Rare Cell Types Are Better Preserved Within Geometric Sketches

As suggested by the above results, one of the key advantages of our algorithm is that it naturally increases the representation of rare cell types with sufficient transcriptomic heterogeneity in the subsampled data.
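
To illustrate why sampling evenly across the occupied transcriptional space can raise the share of rare cells, the following minimal sketch may help. It is a simplified grid-based sampler assumed here for illustration only, not the algorithm as specified in this work; the function name `grid_sketch`, the `cell_size` parameter, and the toy two-dimensional data are all hypothetical.

```python
import random
from collections import defaultdict


def grid_sketch(points, cell_size, k, seed=0):
    """Pick up to k point indices by visiting occupied grid cells in turn.

    Illustrative stand-in (an assumption, not the published method):
    points are binned into axis-aligned cells of side `cell_size`, and
    one point is drawn from each occupied cell per pass.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(int(x // cell_size) for x in p)
        cells[key].append(i)
    for members in cells.values():
        rng.shuffle(members)  # random pick order inside each cell
    order = list(cells.values())
    rng.shuffle(order)  # random visiting order of the cells
    chosen = []
    while len(chosen) < k and order:
        for members in order:
            if len(chosen) < k:
                chosen.append(members.pop())
        order = [m for m in order if m]  # drop exhausted cells
    return chosen


# Toy data: 990 "common" points packed into one grid cell and
# 10 "rare" points spread over a wider region.
data_rng = random.Random(1)
common = [(data_rng.random(), data_rng.random()) for _ in range(990)]
rare = [(5 + 4 * data_rng.random(), 5 + 4 * data_rng.random()) for _ in range(10)]
points = common + rare
labels = ["common"] * 990 + ["rare"] * 10

sketch = grid_sketch(points, cell_size=1.0, k=20)
rare_in_sketch = sum(labels[i] == "rare" for i in sketch)

# Uniform subsampling of the same size, for comparison.
uniform = random.Random(0).sample(range(len(points)), 20)
rare_in_uniform = sum(labels[i] == "rare" for i in uniform)

print(rare_in_sketch, rare_in_uniform)
```

In this toy setting the grid-based sample contains all 10 rare points, because the common points compete for a single cell while the rare points occupy several cells. Uniform sampling would be expected to pick rare points at roughly their 1% frequency. The effect depends on the rare population spanning enough of the space, which matches the qualification above about sufficient transcriptomic heterogeneity.
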
Using the four data sets mentioned above, which include cell type labels [...]

[...] clustering algorithm (Blondel et al., 2008). Then, we transferred cluster labels to the rest of the data set via k-nearest-neighbor classification and assessed the agreement between our unsupervised cluster labels and the biological cell type labels provided by the original studies
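
The label-transfer step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the cluster ids on the sketch are taken as given (for example, from a community-detection step), the value of `k`, the toy coordinates, and the labels "type A" and "type B" are hypothetical, and the Rand index is used only as a simple placeholder because the agreement measure is not named in this excerpt.

```python
import math
from collections import Counter


def knn_transfer(sketch_points, sketch_labels, other_points, k=3):
    """Give each remaining point the majority label of its k nearest sketch points."""
    transferred = []
    for p in other_points:
        nearest = sorted(
            range(len(sketch_points)),
            key=lambda i: math.dist(p, sketch_points[i]),
        )[:k]
        votes = Counter(sketch_labels[i] for i in nearest)
        transferred.append(votes.most_common(1)[0][0])
    return transferred


def rand_index(a, b):
    """Fraction of point pairs that two labelings treat the same way
    (both group the pair together, or both separate it)."""
    n = len(a)
    agree = sum(
        (a[i] == a[j]) == (b[i] == b[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return agree / (n * (n - 1) / 2)


# Hypothetical sketch with cluster ids already assigned.
sketch_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                 (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
sketch_clusters = [0, 0, 0, 1, 1, 1]

# Remaining cells, with hypothetical reference cell type labels.
other_points = [(0.3, 0.3), (4.8, 5.1), (0.05, 0.15), (5.3, 5.0)]
reference_types = ["type A", "type B", "type A", "type B"]

transferred = knn_transfer(sketch_points, sketch_clusters, other_points, k=3)
agreement = rand_index(transferred, reference_types)
print(transferred, agreement)
```

The brute-force neighbor search here is quadratic and only suitable for a toy example; at the scale of hundreds of thousands of cells, an indexed or approximate nearest-neighbor search would be the more practical choice.
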