Abstract: Single-cell measurement techniques can now probe gene expression in heterogeneous cell populations from the human body across a range of environmental and physiological conditions. However, new mathematical and computational methods are required to represent and analyze gene-expression changes that occur in complex mixtures of single cells as they respond to signals, drugs, or disease states. Here, we introduce a mathematical modeling platform, PopAlign, that automatically identifies subpopulations of cells wi…
“…We provide a review of existing single-cell omics perturbation models separated into categories based on commonly established ways to categorize ML models (Table 1). All methods except CellOracle (Kamimoto et al, 2020) can be used for perturbations as defined by a before-and-after effect, which could include healthy versus diseased phenotypes (Buschur et al, 2020), or cross-species translation (Lotfollahi et al, 2019; Chen et al, 2020a, 2020b). CellOracle is specific to genetic/single-target perturbations as it infers effect through propagating signal through a gene regulation network (GRN).…”
Section: Current Approaches For Perturbation Modeling In Single-cell Omics (mentioning)
confidence: 99%
“…While also applicable to bulk transcriptomics (Umarov and Arner, 2020; Rampášek et al, 2019), distribution modeling gained popularity in the single-cell field as a way to describe population shifts and is especially tractable given the number of cells. PopAlign (Chen et al, 2020a, 2020b) fits a Gaussian and matches perturbed and unperturbed cell populations after factor decomposition into a latent space (with orthogonal non-negative matrix factorization, such that PopAlign is also in part a factor decomposition model). PhEMD [table note: The "data" column describes the data modality the model is based on as described in the original paper.]…”
Section: Nonlinear Distribution Modeling (mentioning)
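The quoted passage says only that PopAlign fits a Gaussian in a latent space and matches perturbed to unperturbed populations. As a rough illustration of what such matching can look like, the block below writes out a Gaussian mixture over latent coordinates and a symmetric (Jeffreys) divergence between two Gaussian components. The mixture form, the symbols ($\mathbf{c}$, $K$, $w_i$, $k$), and the choice of divergence are assumptions made for this sketch; the passage does not state which matching criterion PopAlign uses.

```latex
% Assumed notation: \mathbf{c} is a cell's coordinate vector in the
% k-dimensional latent (factor) space; K is the number of subpopulations.
P(\mathbf{c}) = \sum_{i=1}^{K} w_i \,
  \mathcal{N}\!\left(\mathbf{c};\, \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i\right),
\qquad \sum_{i=1}^{K} w_i = 1

% KL divergence between two Gaussian components
% \mathcal{N}_0 = \mathcal{N}(\mu_0, \Sigma_0) and \mathcal{N}_1 = \mathcal{N}(\mu_1, \Sigma_1):
D_{\mathrm{KL}}(\mathcal{N}_0 \,\|\, \mathcal{N}_1) = \tfrac{1}{2}\left[
  \operatorname{tr}\!\left(\Sigma_1^{-1}\Sigma_0\right)
  + (\mu_1-\mu_0)^{\top}\Sigma_1^{-1}(\mu_1-\mu_0)
  - k + \ln\frac{\det\Sigma_1}{\det\Sigma_0}
\right]

% Symmetrized (Jeffreys) form; the log-determinant terms cancel:
D_{J}(\mathcal{N}_0, \mathcal{N}_1)
  = D_{\mathrm{KL}}(\mathcal{N}_0 \,\|\, \mathcal{N}_1)
  + D_{\mathrm{KL}}(\mathcal{N}_1 \,\|\, \mathcal{N}_0)
  = \tfrac{1}{2}\left[
  \operatorname{tr}\!\left(\Sigma_1^{-1}\Sigma_0\right)
  + \operatorname{tr}\!\left(\Sigma_0^{-1}\Sigma_1\right) - 2k
  + (\mu_1-\mu_0)^{\top}\!\left(\Sigma_0^{-1}+\Sigma_1^{-1}\right)(\mu_1-\mu_0)
\right]
```

Under these assumptions, a perturbed subpopulation would be paired with the unperturbed component that minimizes $D_J$, and the shift in mean and mixture weight between the pair would summarize the perturbation effect.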
“…We test the ASFS method on three biology datasets: a dataset of peripheral blood mononuclear cells (PBMCs) [13], the Tabula Muris dataset [14], and MM datasets [15]. We test with both the min-cell strategy and the min-complexity strategy introduced in the Cell Selection part.…”
Section: Results (mentioning)
confidence: 99%
“…Having demonstrated the method on identifying cell-type specific markers at both small- and large-scale, we next turned to applying ASFS to discovering disease-specific markers. We used single-cell data from peripheral blood immune cells collected from both healthy donors and patients who have been diagnosed with multiple myeloma [15]. Multiple myeloma (MM) is an incurable cancer of plasma cells, known as myeloma cells, that overproliferate in the bone marrow.…”
Section: Minimal Gene Sets For Classification Of Disease State In Peripheral Blood Cells From Multiple Myeloma Patient Samples (mentioning)
confidence: 99%
“…We apply our active feature selection method to three test cases: a dataset of peripheral blood mononuclear cells (PBMCs) [13], the Tabula Muris mouse tissue survey [14], and a data set of multiple myeloma patient PBMCs [15]. We also systematically compare the performance of the method to six existing conventional feature selection methods, showing that our method outperforms other methods in terms of classification accuracy.…”
Sequencing costs currently prohibit the application of single-cell mRNA-seq for many biological and clinical tasks of interest. Here, we introduce an active learning framework that constructs compressed gene sets that enable high-accuracy classification of cell types and physiological states while analyzing a minimal number of gene transcripts. Our active feature selection procedure constructs gene sets through an iterative cell-type classification task where misclassified cells are examined at each round to identify maximally informative genes through an 'active' support vector machine (SVM) classifier. Our active SVM procedure automatically identifies gene sets that enable >90% cell-type classification accuracy in the Tabula Muris mouse tissue survey as well as a ~40 gene set that enables classification of multiple myeloma patient samples with >95% accuracy. Broadly, the discovery of compact but highly informative gene sets might enable drastic reductions in sequencing requirements for applications of single-cell mRNA-seq.
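The iterative loop described in the abstract (classify, inspect misclassified cells, add the most informative gene, repeat) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a hinge-loss linear classifier trained by plain subgradient descent stands in for the SVM, the gene-scoring rule (size of the hinge-loss gradient over misclassified cells for a not-yet-used gene) is an assumed choice, and every function name and the toy data are hypothetical.

```python
import random

def make_toy_data(n_cells=40, n_genes=6, informative=2, seed=0):
    """Hypothetical toy data: one gene separates two classes, the rest is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_cells):
        label = 1 if i % 2 == 0 else -1
        row = [rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
        row[informative] = 2.0 * label + rng.uniform(-0.5, 0.5)
        X.append(row)
        y.append(label)
    return X, y

def center(X):
    """Subtract each gene's mean so scores reflect class differences, not abundance."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]

def train_linear_svm(X, y, genes, epochs=200, lr=0.1, lam=0.01):
    """Hinge-loss linear classifier restricted to the selected gene columns."""
    w = {g: 0.0 for g in genes}
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(w[g] * xi[g] for g in genes) + b)
            violated = margin < 1.0
            for g in genes:
                grad = lam * w[g] - (yi * xi[g] if violated else 0.0)
                w[g] -= lr * grad
            if violated:
                b += lr * yi
    return w, b

def predict(xi, w, b):
    score = sum(wg * xi[g] for g, wg in w.items()) + b
    return 1 if score >= 0 else -1

def accuracy(X, y, w, b):
    return sum(predict(xi, w, b) == yi for xi, yi in zip(X, y)) / len(y)

def active_feature_selection(X, y, max_genes):
    """Grow a gene set one gene per round, guided by misclassified cells."""
    Xc = center(X)
    selected = []
    for _ in range(max_genes):
        w, b = train_linear_svm(Xc, y, selected)
        wrong = [i for i, xi in enumerate(Xc) if predict(xi, w, b) != y[i]]
        if not wrong:
            break  # current gene set already classifies every cell
        candidates = [g for g in range(len(Xc[0])) if g not in selected]
        if not candidates:
            break
        # Assumed scoring rule: gradient magnitude of the hinge loss with
        # respect to a new gene's weight, summed over misclassified cells.
        best = max(candidates,
                   key=lambda g: abs(sum(y[i] * Xc[i][g] for i in wrong)))
        selected.append(best)
    return selected
```

Calling `active_feature_selection(*make_toy_data(), max_genes=3)` returns the indices of the chosen genes in the order they were added. On real expression data one would swap in a proper SVM solver and subsample the misclassified cells each round, which this sketch omits.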