Distinguishing cell types and cell states is a fundamental problem in single-cell studies. Equally important are exploring the lineage relations among cells and finding the paths and critical points of cell fate transitions.
Existing unsupervised clustering methods and lineage trajectory reconstruction methods face several challenges, such as clustering data of arbitrary shapes, tracking precise trajectories, and identifying critical points. Adaptive landscape approaches, which construct a pseudo-energy landscape of the dynamical system, can be used to explore such problems. However, algorithms based on rigorous metastability theory for constructing the landscape of individual cells are still lacking. We therefore propose the Markov hierarchical clustering algorithm (MarkovHC), which reconstructs a multi-scale pseudo-energy landscape by exploiting the underlying metastability structure in an exponentially perturbed Markov chain. In this framework, a Markov process describes the random walk of a hypothetically traveling cell over possible gene expression states in the corresponding pseudo-energy landscape. Technically, MarkovHC integrates the tasks of cell classification, trajectory reconstruction, and critical point identification in a single theoretical framework consistent with topological data analysis (TDA).
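As an illustration of what an exponentially perturbed Markov chain on a pseudo-energy landscape looks like, the following is a minimal toy sketch and not the MarkovHC algorithm itself. It assumes a hypothetical one-dimensional double-well energy profile and a nearest-neighbour Metropolis-style walk; the names `perturbed_chain`, `boltzmann`, and the noise parameter `eps` are illustrative choices. As `eps` shrinks, uphill moves become exponentially rare and the stationary mass near the barrier state vanishes, which is the kind of metastable basin structure the text refers to.

```python
import numpy as np

def perturbed_chain(energy, eps):
    """Nearest-neighbour Metropolis-style chain on a 1D energy profile (toy model)."""
    n = len(energy)
    P = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                # Uphill moves are exponentially suppressed as eps -> 0.
                P[i, j] = 0.5 * np.exp(-max(energy[j] - energy[i], 0.0) / eps)
        # Remaining probability mass: the walker stays where it is.
        P[i, i] = 1.0 - P[i].sum()
    return P

def boltzmann(energy, eps):
    """Stationary distribution of the chain above: pi_i proportional to exp(-U_i / eps)."""
    w = np.exp(-(energy - energy.min()) / eps)
    return w / w.sum()

# Hypothetical landscape: two basins near x = -1 and x = +1, barrier near x = 0.
x = np.linspace(-1.5, 1.5, 31)
energy = (x**2 - 1.0) ** 2
barrier = len(x) // 2  # index of the grid point at the barrier top

barrier_mass = {}
for eps in (1.0, 0.2, 0.1):
    P = perturbed_chain(energy, eps)
    pi = boltzmann(energy, eps)
    # The chain satisfies detailed balance, so pi is stationary.
    assert np.allclose(pi @ P, pi)
    barrier_mass[eps] = pi[barrier]
    print(f"eps={eps}: stationary mass at barrier = {pi[barrier]:.3e}")
```

In this toy setting, the basins are the states around the two minima and the barrier state plays the role of a critical point between them. Real single-cell data would require building the chain from a cell-cell similarity graph, which this sketch does not attempt.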
In addition to algorithm development and simulation tests, we applied MarkovHC to diverse types of real biological data: single-cell RNA-Seq data, cytometry data, and single-cell ATAC-Seq data. Remarkably, when applied to single-cell RNA-Seq data of human ESC-derived progenitor cells, MarkovHC not only identified known cell types but also discovered new cell types and stages. When used to analyze single-cell RNA-Seq data of human preimplantation embryos in early development, MarkovHC faithfully reconstructed the hierarchical structure of the lineage trajectories. Furthermore, MarkovHC identified the critical points representing important stage transitions in early gastric cancer data.
In summary, these results demonstrate that MarkovHC is a powerful tool, based on rigorous metastability theory, for exploring hierarchical structures in biological data, identifying cell populations (basins) and critical points (stage transitions), and tracking lineage trajectories (differentiation paths).