Silhouette Scores for Arbitrary Defined Groups in Gene Expression Data and Insights into Differential Expression Results

Zhao, Shitao; Sun, Jianqiang; Shimizu, Kentaro; Kadota, Koji

doi:10.1186/s12575-018-0067-8

Cited by 34 publications

(37 citation statements)

References 60 publications


“…In a comparison of the performance of each method using different numbers of replicates (Fig. 1 and Additional file 1), we observed that AUC values tend to increase as the number of replicates increases and this trend is consistent with a previous report [20].…”

Section: Analysis Of Simulated Data For a Two-group Comparison

supporting

confidence: 91%

“…To date, several methods to enable the analysis of RNA-seq data have been developed, including normalization [5][6][7][8][9][10], various R packages [11][12][13][14][15][16], and graphical user interfaces (GUI) [17][18][19]. Research on more efficient and accurate methods to identify DEGs continues, and new findings continue to be reported by researchers [20][21][22][23][24][25].…”

Section: Introduction

mentioning

confidence: 99%


Differential Expression Analysis Using A Model-Based Gene Clustering Algorithm for RNA-Seq Data

Osabe

Shimizu

Kadota

2020

Preprint

Self Cite


Background: RNA-seq is a tool for measuring gene expression and is commonly used to identify differentially expressed genes (DEGs). Gene clustering is used to classify DEGs with similar expression patterns for the subsequent analyses of data from experiments such as time-courses or multi-group comparisons. However, gene clustering has rarely been used for analyzing simple two-group data or differential expression (DE). In this study, we report a model-based clustering algorithm, MBCluster.Seq, that can be implemented using an R package for DE analysis.

Results: The input data originally used by MBCluster.Seq is DEGs, and the proposed method (called MBCdeg) uses all genes for the analysis. The method uses posterior probabilities of genes assigned to a cluster displaying non-DEG pattern for overall gene ranking. We compared the performance of MBCdeg with conventional R packages such as edgeR, DESeq2, and TCC that are specialized for DE analysis using simulated and real data. Our results showed that MBCdeg outperformed other methods when the proportion of DEG was less than 50%. However, the DEG identification using MBCdeg was less consistent than with conventional methods. We compared the effects of different normalization algorithms using MBCdeg, and performed an analysis using MBCdeg in combination with a robust normalization algorithm (called DEGES) that was not implemented in MBCluster.Seq. The new analysis method showed greater stability than using the original MBCdeg with the default normalization algorithm.

Conclusions: MBCdeg with DEGES normalization can be used in the identification of DEGs when the PDEG is relatively low. As the method is based on gene clustering, the DE result includes information on which expression pattern the gene belongs to. The new method may be useful for the analysis of time-course and multi-group data, where the classification of expression patterns is often required.


“…The AgglomerativeClustering function from Scikit-Learn with euclidean affinity and Ward linkage was used to perform hierarchical grouping (Pedregosa et al, 2011). We calculated the number of groups (synaptic subtypes) that best describe our data based on maximization of the silhouette score, a measure of similarity within a group and dissimilarity between different groups (Rousseeuw, 1987; Zhao et al, 2018).…”

Section: Methodsmentioning

confidence: 99%

Activity-Dependent Remodeling of Synaptic Protein Organization Revealed by High Throughput Analysis of STED Nanoscopy Images

Wiesner

Bilodeau

Bernatchez

et al.

2020

Front. Neural Circuits
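
The citation statement above describes the silhouette score as a measure of similarity within a group and dissimilarity between groups. A minimal sketch of that idea in plain Python follows. It assumes Euclidean distance and the usual per-sample definition s(i) = (b - a) / max(a, b), where a is the mean distance to the other members of the sample's own group and b is the smallest mean distance to any other group. The toy `points` and the two labelings `good` and `bad` are hypothetical and are not data from either paper, and this is not the code the authors used.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def silhouette_values(points, labels):
    """Per-sample silhouette values for an arbitrary, predefined grouping."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two groups")

    values = []
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            # Common convention: a singleton group gets a silhouette of 0.
            values.append(0.0)
            continue
        # a: mean distance to the other members of the same group.
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other group.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in groups.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def average_silhouette(points, labels):
    """Mean of the per-sample silhouette values."""
    vals = silhouette_values(points, labels)
    return sum(vals) / len(vals)


# Hypothetical toy data: two tight pairs that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = ["A", "A", "B", "B"]  # grouping that matches the geometry
bad = ["A", "B", "A", "B"]   # grouping that cuts across it

score_good = average_silhouette(points, good)
score_bad = average_silhouette(points, bad)
print(score_good > score_bad)
```

Choosing the number of groups by maximizing the silhouette score, as the quoted passage describes, amounts to computing this average for each candidate grouping and keeping the one with the highest value. In scikit-learn, `sklearn.metrics.silhouette_score` computes the same mean for a given set of labels.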


“…However, even the tendency to obtain a large number of DE genes between cell types cannot distinguish these. For example, a bulk RNA-seq dataset exists (Schurch et al, 2016) that can produce nearly 70% DE genes (Zhao et al, 2018). A common feature of these data sets is a high number of replicates (>40 replicates per group).…”

mentioning

confidence: 99%

Commentary: A Systematic Evaluation of Single Cell RNA-Seq Analysis Pipelines

Kadota

Shimizu

2020

Front. Genet.

Self Cite

