Identifying Cell Subpopulations and Their Genetic Drivers from Single-Cell RNA-Seq Data Using a Biclustering Approach

Shi, Funan; Huang, Haiyan

doi:10.1089/cmb.2017.0049

Cited by 13 publications

(6 citation statements)

References 21 publications


“…[40] Recent tools such as DivBiclust, PanoView, and scziDesk have been developed in order to make such analysis possible on larger datasets through biclustering or iterative clustering that can scale with dataset size reducing the need for dimensionality reduction. [41][42][43][44] An even newer unsupervised clustering approach called scGAC also seeks to analyze high dimensional and sparse datasets. The method utilizes latent relationship information across cells to graphically obtain cell clusters.…”

Section: Dimensionality Reduction (mentioning)

confidence: 99%

Bioinformatic Methods for Identifying Differentially Abundant Subpopulations in scRNA-seq Data

Kleinberg, Shinde, Batchu et al. 2022

Preprint


Single-cell RNA sequencing data facilitates investigation of cell heterogeneity and subpopulations as well as differentially abundant states however modern single-cell RNA sequencing datasets are growing in size and complexity requiring advances in the bioinformatic methods that analyze them. Many methods exist for each step of analysis including read alignment, normalization, quality control, batch effect correction, imputation and dimensionality reduction. With so many options to choose from at each step of the analysis, benchmarking and a synthesis of the literature on the methods available is necessary to inform biological researchers on the most optimal workflow for their data. Here, recent key methods of analysis are highlighted with a focus on methods that facilitate identification of cell subpopulations and differentially abundant cell states. With a constantly expanding toolset for each step in single-cell RNA sequencing dataset analysis, biological researchers should stay informed to utilize the most applicable methods for their own analyses.



“…While it is useful for de novo discovery of new cell types and subtypes, unsupervised learning depends on many user-specific inputs, including which clustering algorithm to use (e.g., K-means clustering, hierarchical clustering, density-based clustering or graph-based clustering), the type of similarity or distance metric between two cells, and the number of clusters, which is a key parameter needed for many clustering algorithms. Taking into account the distinct features of scRNA-seq data, multiple cell clustering algorithms have been developed, including SNN-Cliq, which does not use conventional similarity measures but leverages the ranking of cells to construct a cell-cell graph for identifying cell clusters [244]; BiSNN-Walk, which extends SNN-Cliq and uses an iterative biclustering approach to return a ranked list of cell clusters, each associated with a set of ranked genes based on their levels of affiliation with the cluster [192]; CIDR, the first clustering method that incorporates imputation of dropout gene expression levels [125]; SC3, a widely used ensemble method that combines multiple clustering algorithms [106]; and Seurat, which identifies cell clusters based on a shared nearest neighbor (SNN) clustering algorithm [184]. In addition to commonly used similarity metric including the Pearson correlation, Spearman correlation, Euclidean distance, other cell similarity measures can be found in, for example, [91,186].…”

Section: Identification Of Cell Types (mentioning)

confidence: 99%

Network Modeling in Biology: Statistical Methods for Gene and Brain Networks

Wang et al. 2021

Statist. Sci.

Self Cite


The rise of network data in many different domains has offered researchers new insights into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks where network edges can be directly observed, both gene and brain networks require careful estimation of edges using measured data as a first step. We provide a discussion on existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.


“…The recent advent of scRNA-seq technology has enabled researchers to study heterogeneity between individual cells and define a cell type based solely on its transcriptome [132]. Using biclustering, researchers can not only group cells into subpopulations but also identify biologically important gene signatures for each class simultaneously [95,139]. For example, Zeisel et al [95] recently classified single cells from the brain through biclustering, which identified numerous marker genes and highly restricted expression patterns of transcription factors for cell types.…”

Section: Biomarker and Gene Signatures Detection (mentioning)

confidence: 99%

It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data

Xie, Fennell et al. 2018

Briefings in Bioinformatics


Biclustering is a powerful data mining technique that allows clustering of rows and columns, simultaneously, in a matrix-format data set. It was first applied to gene expression data in 2000, aiming to identify co-expressed genes under a subset of all the conditions/samples. During the past 17 years, tens of biclustering algorithms and tools have been developed to enhance the ability to make sense out of large data sets generated in the wake of high-throughput omics technologies. These algorithms and tools have been applied to a wide variety of data types, including but not limited to, genomes, transcriptomes, exomes, epigenomes, phenomes and pharmacogenomes. However, there is still a considerable gap between biclustering methodology development and comprehensive data interpretation, mainly because of the lack of knowledge for the selection of appropriate biclustering tools and further supporting computational techniques in specific studies. Here, we first deliver a brief introduction to the existing biclustering algorithms and tools in public domain, and then systematically summarize the basic applications of biclustering for biological data and more advanced applications of biclustering for biomedical data. This review will assist researchers to effectively analyze their big data and generate valuable biological knowledge and novel insights with higher efficiency.
