“…While it is useful for de novo discovery of new cell types and subtypes, unsupervised learning depends on many user-specified inputs, including which clustering algorithm to use (e.g., K-means clustering, hierarchical clustering, density-based clustering, or graph-based clustering), the type of similarity or distance metric between two cells, and the number of clusters, which is a key parameter needed for many clustering algorithms. Taking into account the distinct features of scRNA-seq data, multiple cell clustering algorithms have been developed, including SNN-Cliq, which does not use conventional similarity measures but leverages the ranking of cells to construct a cell-cell graph for identifying cell clusters [244]; BiSNN-Walk, which extends SNN-Cliq and uses an iterative biclustering approach to return a ranked list of cell clusters, each associated with a set of ranked genes based on their levels of affiliation with the cluster [192]; CIDR, the first clustering method that incorporates imputation of dropout gene expression levels [125]; SC3, a widely used ensemble method that combines multiple clustering algorithms [106]; and Seurat, which identifies cell clusters based on a shared nearest neighbor (SNN) clustering algorithm [184]. In addition to commonly used similarity metrics, including the Pearson correlation, Spearman correlation, and Euclidean distance, other cell similarity measures can be found in, for example, [91,186].…”
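To make the three commonly used metrics named above concrete, the following is a minimal sketch in plain Python of how the Pearson correlation, Spearman correlation, and Euclidean distance between two cells might be computed from their expression vectors. The function names and the toy expression values (`cell_a`, `cell_b`, `cell_c`) are illustrative assumptions, not taken from any of the cited methods, and the sketch does not reproduce how SNN-Cliq, SC3, or Seurat compute similarities internally.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def euclidean(x, y):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical expression levels of four genes in three cells (toy data).
cell_a = [0, 1, 2, 4]
cell_b = [0, 2, 4, 8]   # exactly twice cell_a
cell_c = [4, 2, 1, 0]   # reversed gene ordering relative to cell_a

if __name__ == "__main__":
    print("Pearson(a, b):  ", pearson(cell_a, cell_b))
    print("Spearman(a, c): ", spearman(cell_a, cell_c))
    print("Euclidean(a, b):", euclidean(cell_a, cell_b))
```

In this toy example, `cell_a` and `cell_b` are perfectly correlated yet separated by a nonzero Euclidean distance, which illustrates why the choice of metric can change which cells a clustering algorithm treats as similar.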