Single-cell RNA sequencing (scRNA-seq) has transformed biomedical research, enabling decomposition of complex tissues into disaggregated, functionally distinct cell types. For many applications, investigators wish to identify cell types with known marker genes. Typically, such cell type assignments are performed through unsupervised clustering followed by manual annotation based on these marker genes, or via "mapping" procedures to existing data. However, the manual interpretation required in the former case scales poorly to large datasets, which are also often prone to batch effects, while existing data for purified cell types must be available for the latter. Furthermore, unsupervised clustering can be error-prone, leading to under-and overclustering of the cell types of interest. To overcome these issues we present CellAssign, a probabilistic model that leverages prior knowledge of cell type marker genes to annotate scRNA-seq data into pre-defined and de novo cell types. CellAssign automates the process of assigning cells in a highly scalable manner across large datasets while simultaneously controlling for batch and patient effects. We demonstrate the analytical advantages of CellAssign through extensive simulations and exemplify realworld utility to profile the spatial dynamics of high-grade serous ovarian cancer and the temporal dynamics of follicular lymphoma. Our analysis reveals subclonal malignant phenotypes and points towards an evolutionary interplay between immune and cancer cell populations with cancer cells escaping immune recognition.
Gene co-expression networks yield critical insights into biological processes, and single-cell RNA sequencing provides an opportunity to target inquiries at the cellular level. However, due to the sparsity and heterogeneity of transcript counts, it is challenging to construct accurate gene networks. We develop an approach that estimates cell-specific networks (CSN) for each cell using a method inspired by Dai et al. Although individual CSNs are estimated with considerable noise, average CSNs provide stable estimates of network structure, which provide better estimates of gene block structures than traditional measures. The method, called locCSN, is based on a non-parametric investigation of the joint distribution of gene expression, hence it can readily detect nonlinear correlations, and it is more robust to distributional challenges. The individual networks preserve information about the heterogeneity of the cells and having repeated estimates of network structure facilitates testing for difference in network structure between groups of cells. The original CSN algorithm showed promise; however, it had shortcomings which locCSN overcomes. Additionally, we propose new downstream analysis methods using CSNs, to utilize more fully the information contained within them. Finally, to further our understanding of autism spectrum disorder we examined the evolution of gene networks in fetal brain cells and compared the CSNs of cells sampled from case and control subjects to reveal intriguing patterns in gene co-expression changes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.