There exist numerous programs and packages that perform validation for a given clustering solution; however, clustering algorithms fare differently as judged by different validation measures. If more than one performance measure is used to evaluate multiple clustering partitions, an optimal result is often difficult to determine by visual inspection alone. This paper introduces optCluster, an R package that uses a single function to simultaneously compare numerous clustering partitions (created by different algorithms and/or numbers of clusters) and obtain a “best” option for a given dataset. The method of weighted rank aggregation is utilized by this package to objectively aggregate various performance measure scores, thereby taking away the guesswork that often follows a visual inspection of cluster results. The optCluster package contains biological validation measures as well as clustering algorithms developed specifically for RNA sequencing data, making it a useful tool for clustering genomic data.Availability:This package is available for free through the Comprehensive R Archive Network (CRAN) at http://cran.rproject.org/web/packages/optCluster/
Sekula, Michael N., "OptCluster : an R package for determining the optimal clustering algorithm and optimal number of clusters." (2015). Electronic Theses and Dissertations. Paper 2147.
Single-cell RNA sequencing (scRNA-seq) technologies are revolutionary tools allowing researchers to examine gene expression at the level of a single cell.Traditionally, transcriptomic data have been analyzed from bulk samples, masking the heterogeneity now seen across individual cells. Even within the same cellular population, genes can be highly expressed in some cells but not expressed (or lowly expressed) in others. Therefore, the computational approaches used to analyze bulk RNA sequencing data are not appropriate for the analysis of scRNA-seq data. Here, we present a novel statistical model for high dimensional and zero-inflated scRNA-seq count data to identify differentially expressed (DE) genes across cell types. Correlated random effects are employed based on an initial clustering of cells to capture the cell-to-cell variability within treatment groups. Moreover, this model is flexible and can be easily adapted to an independent random effect structure if needed. We apply our proposed methodology to both simulated and real data and compare results to other popular methods designed for detecting DE genes. Due to the hurdle model's ability to detect differences in the proportion of cells expressed and the average expression level (among the expressed cells), our methods naturally identify some genes as DE that other methods do not, and we demonstrate with real data that these uniquely detected genes are associated with similar biological processes and functions.
Background: Gene co-expression networks (GCNs) are powerful tools that enable biologists to examine associations between genes during different biological processes. With the advancement of new technologies, such as single-cell RNA sequencing (scRNA-seq), there is a need for developing novel network methods appropriate for new types of data. Results: We present a novel sparse Bayesian factor model to explore the network structure associated with genes in scRNA-seq data. Latent factors impact the gene expression values for each cell and provide flexibility to account for common features of scRNA-seq: high proportions of zero values, increased cell-to-cell variability, and overdispersion due to abnormally large expression counts. From our model, we construct a GCN by analyzing the positive and negative associations of the factors that are shared between each pair of genes. Conclusions: Simulation studies demonstrate that our methodology has high power in identifying gene-gene associations while maintaining a nominal false discovery rate. In real data analyses, our model identifies more known and predicted protein-protein interactions than other competing network models.
has been reviewed by the Editorial Board and by special expert referees. Although it is judged not acceptable for publication in Obstetrics & Gynecology in its present form, we would be willing to give further consideration to a revised version. If you wish to consider revising your manuscript, you will first need to study carefully the enclosed reports submitted by the referees and editors. Each point raised requires a response, by either revising your manuscript or making a clear and convincing argument as to why no revision is needed. To facilitate our review, we prefer that the cover letter include the comments made by the reviewers and the editor followed by your response. The revised manuscript should indicate the position of all changes made. We suggest that you use the "track changes" feature in your word processing software to do so (rather than strikethrough or underline formatting).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.