Background: Recent advances in single-cell gene expression profiling technology have revolutionized the understanding of molecular processes underlying developmental cell and tissue differentiation, enabling the discovery of novel cell-types and molecular markers that characterize developmental trajectories. Common approaches for identifying marker genes are based on pairwise statistical testing for differential gene expression between cell-types in heterogeneous cell populations, which is challenging because unequal sample sizes and variances between groups result in low statistical power and inflated type I error rates.
Results: We developed an alternative feature extraction method, Marker gene Identification for Cell-type Identity (MICTI), that encodes cell-type-specific expression information for each gene in every single cell. This approach identifies features (genes) that are specific to a given cell-type in a heterogeneous cell population. To validate this approach, we used (i) simulated single-cell RNA-seq data, (ii) human pancreatic islet single-cell RNA-seq data, and (iii) a simulated mixture of human single-cell RNA-seq data related to immune cells, particularly B cells, CD4+ memory cells, CD8+ memory cells, dendritic cells, fibroblast cells, and lymphoblast cells. In all cases, we were able to identify established cell-type-specific markers.
Conclusions: Our approach offers a fast and efficient alternative to differential expression analysis for molecular marker identification in heterogeneous single-cell RNA-seq data.