On Potts Model Clustering, Kernel K-Means and Density Estimation

Murua, Alejandro; Stanberry, Larissa; Stuetzle, Werner

doi:10.1198/106186008x318855

Cited by 20 publications

(28 citation statements)

References 54 publications


“…The Potts clustering approach, also known as super-paramagnetic clustering, is based on the physical behavior of an inhomogeneous ferromagnet [37]. No assumptions are made about the underlying distribution of the data.…”

Section: Methods (mentioning)


EvoluCode: Evolutionary Barcodes as a Unifying Framework for Multilevel Evolutionary Data

Linard, Nguyen, Prosdocimi, et al. 2011. Evol Bioinform Online.


Evolutionary systems biology aims to uncover the general trends and principles governing the evolution of biological networks. An essential part of this process is the reconstruction and analysis of the evolutionary histories of these complex, dynamic networks. Unfortunately, the methodologies for representing and exploiting such complex evolutionary histories in large scale studies are currently limited. Here, we propose a new formalism, called EvoluCode (Evolutionary barCode), which allows the integration of different evolutionary parameters (e.g., sequence conservation, orthology, synteny …) in a unifying format and facilitates the multilevel analysis and visualization of complex evolutionary histories at the genome scale. The advantages of the approach are demonstrated by constructing barcodes representing the evolution of the complete human proteome. Two large-scale studies are then described: (i) the mapping and visualization of the barcodes on the human chromosomes and (ii) automatic clustering of the barcodes to highlight protein subsets sharing similar evolutionary histories and their functional analysis. The methodologies developed here open the way to the efficient application of other data mining and knowledge extraction techniques in evolutionary systems biology studies. A database containing all EvoluCode data is available at: http://lbgi.igbmc.fr/barcodes.

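The first citation statement above describes Potts (super-paramagnetic) clustering as modeling the data as an inhomogeneous ferromagnet. A minimal sketch of that idea, assuming the usual Potts energy H(σ) = Σ_{i<j} k_ij [1 − δ(σ_i, σ_j)] with Gaussian interaction strengths k_ij. The Gaussian kernel, the use of all pairs rather than a neighbor graph, and the names `gaussian_interactions` and `potts_energy` are illustrative assumptions, not taken from the paper. Labelings that separate strongly interacting (nearby) points pay a high energy, so low-energy labelings keep nearby points together.

```python
import numpy as np

def gaussian_interactions(X, bandwidth):
    """Interaction strengths k_ij = exp(-||x_i - x_j||^2 / (2 * bandwidth^2)).

    Assumption: Gaussian kernel over all pairs (no neighbor graph).
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    np.fill_diagonal(K, 0.0)  # no self-interaction
    return K

def potts_energy(labels, K):
    """Potts energy H = sum over pairs i < j of k_ij * [labels of i and j differ]."""
    labels = np.asarray(labels)
    differs = labels[:, None] != labels[None, :]
    # K is symmetric with a zero diagonal, so half the full sum counts each pair once.
    return float(0.5 * np.sum(K * differs))

# Two tight, well-separated pairs of points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
K = gaussian_interactions(X, bandwidth=1.0)
good = potts_energy([0, 0, 1, 1], K)  # keeps each tight pair together
bad = potts_energy([0, 1, 0, 1], K)   # splits both tight pairs
print(good, bad)
```

The temperature, which scales this energy in the Potts density, is left out. The sketch only scores a fixed labeling and does not sample labelings.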


“…MAP + PMC and iPrior + PMC stand for the procedures with clustering evidence drawn from the MAP (σ_M, T_M) and the data-driven prior maximizer (σ_p, T_p) […] these scores, the reader may get an idea of how difficult it is to cluster some datasets into the groups selected by some experts (see, e.g., the Yeast cycle data below). The artificial datasets were (a) a 5-clump-3-arc dataset (Murua, Stanberry, and Stuetzle 2008) whose clusters present high variation in shape and distribution and are not very well separated; (b) a three-ring version of the Bull's eye data (Blatt, Domany, and Wiseman 1997), which are a real challenge for most clustering methods; and (c) a 50-Gaussian mixture dataset whose differences in cluster volume may produce difficulties when choosing the appropriate temperature-bandwidth parameters. The data are plotted in Figure 3.…”

Section: Performance On Real and Simulated Data (mentioning)


“…Its impact has reached the medical (Stanberry, Murua, and Cordes 2008), bioinformatics (Getz et al. 2000; Einav et al. 2005), and the computer science and machine learning communities as well (Domany et al. 1999; Quiroga, Nadasdy, and Ben-Shaul 2004). It has also been mentioned in the statistical literature, but as Potts model clustering (Murua, Stanberry, and Stuetzle 2008), where its link with other kernel-based methods and nonparametric density estimation was presented. A similar, simpler model has also been used as a probabilistic framework for K-nearest-neighbor classification (Cucala et al. 2009).…”

Section: Introduction (mentioning)


“…It is usually set to the mean of the data point similarities (… Wiseman 1996, 1997). However, Murua, Stanberry, and Stuetzle (2008) showed that treating it as a variable may give better clustering results. Although they suggest the use of a data-driven adaptive bandwidth, we believe that its incorporation as a parameter of the model is more appropriate.…”

Section: Introduction (mentioning)

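The statement above concerns the default value of the kernel bandwidth. In the super-paramagnetic clustering literature, a common default takes the length scale as the average distance between neighboring points. The sketch below follows that convention under stated assumptions: "neighbors" means each point's k nearest neighbors, the kernel is Gaussian, and the helper names `default_bandwidth` and `gaussian_weights` are hypothetical. Treating the bandwidth as a variable, as the statement advocates, then amounts to evaluating the same weights at several bandwidth values instead of fixing the default.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def default_bandwidth(X, k=5):
    """Mean distance from each point to its k nearest neighbors (assumed convention)."""
    d = pairwise_distances(X)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    k = min(k, X.shape[0] - 1)
    nearest = np.sort(d, axis=1)[:, :k]  # k smallest distances in each row
    return float(nearest.mean())

def gaussian_weights(X, bandwidth):
    """Interaction weights exp(-d_ij^2 / (2 * bandwidth^2)), zero on the diagonal."""
    d = pairwise_distances(X)
    w = np.exp(-d ** 2 / (2.0 * bandwidth ** 2))
    np.fill_diagonal(w, 0.0)
    return w

X = np.array([[0.0], [1.0], [3.0]])
a = default_bandwidth(X, k=1)  # nearest-neighbor distances are 1, 1 and 2
# Treating the bandwidth as a variable: weights at several multiples of the default.
weights = {scale: gaussian_weights(X, scale * a) for scale in (0.5, 1.0, 2.0)}
print(a)
```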

“…This procedure is heavily based on a data-driven estimate of a very informative prior, which is derived from random graph theory and the connection between kernel-based methods and kernel density estimation (Murua, Stanberry, and Stuetzle 2008). We refer to this latter procedure as the informed conditional-Potts clustering.…”

Section: Introduction (mentioning)



The Conditional-Potts Clustering Model

Murua, Wicker. 2014. Journal of Computational and Graphical Statistics. Self-citation.


This article presents a Bayesian kernel-based clustering method. The associated model arises as an embedding of the Potts density for class membership probabilities into an extended Bayesian model for joint data and class membership probabilities. The method may be seen as a principled extension of super-paramagnetic clustering. The model depends on two parameters: the temperature and the kernel bandwidth. The clustering is obtained from the posterior marginal adjacency membership probabilities and does not depend on any particular value of the parameters. We elicit an informative prior based on random graph theory and kernel density estimation. A stochastic population Monte Carlo algorithm, based on parallel runs of the Wang-Landau algorithm, is developed to estimate the posterior adjacency membership probabilities and the parameter posterior. The convergence of the algorithm is also established. The method is applied to the whole human proteome to uncover human genes that share common evolutionary history. Our experiments and application show that good clustering results are obtained at many different values of the temperature and bandwidth parameters. Hence, instead of focusing on finding adequate values of the parameters, we advocate making clustering inference based on the study of the distribution of the posterior adjacency membership probabilities. This article has online supplementary material.

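The abstract above states that the clustering is read off the posterior adjacency membership probabilities, that is, the probability that two points share a label. Given labelings already drawn from the posterior, these probabilities can be estimated as the fraction of draws in which each pair shares a label. The population Wang-Landau sampler described in the abstract is not reproduced here. The sketch below estimates the matrix and then, as one simple and assumed way to turn it into a partition, links pairs whose probability exceeds a threshold and takes connected components. The threshold of 0.5, the toy draws, and the function names are illustrative.

```python
import numpy as np

def adjacency_probabilities(samples):
    """Estimate P[i, j] = Pr(points i and j share a label) from posterior draws.

    samples: (M, n) integer array; row m is one labeling drawn from the posterior.
    """
    samples = np.asarray(samples)
    same = samples[:, :, None] == samples[:, None, :]  # (M, n, n) co-membership
    return same.mean(axis=0)

def threshold_clusters(P, threshold=0.5):
    """Connected components of the graph linking i and j when P[i, j] > threshold."""
    n = P.shape[0]
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue  # already assigned to a component
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over strongly linked pairs
            i = stack.pop()
            for j in np.flatnonzero(P[i] > threshold):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Four hypothetical posterior draws for five points; label values are arbitrary per draw.
samples = [[0, 0, 1, 1, 2],
           [3, 3, 1, 1, 1],
           [0, 0, 2, 2, 5],
           [1, 1, 0, 0, 0]]
P = adjacency_probabilities(samples)
print(threshold_clusters(P).tolist())  # [0, 0, 1, 1, 2]
```

Because only co-membership is counted, the estimate is unaffected by label switching: label 3 in one draw and label 1 in another can denote the same group.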

Building cancer prognosis systems with survival function clusters

Muñoz, Murua. 2018. Statistical Analysis. Self-citation.


In oncology, risk groups are usually constructed by dividing the population into blocks of patients with similar health conditions and demographic levels. Even for a handful of factors, the number of risk groups may be large, which complicates the analyses. There is a need to group homogeneous blocks of patients into larger entities with similar survival characteristics. We develop and compare several techniques to detect these patient meta-blocks. Our prognosis systems are based on the integrated absolute distance between the survival functions associated with patient blocks. We propose the use of vectorization of survival curves and of principled ensemble algorithms for clustering. We test these methods on different complexity scenarios. The best performing methods are then used to create prognosis systems for the NCIC lung cancer database, a longitudinal study of U.S. lung cancer patients who were followed from 1988 to 2009.

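The prognosis systems in the abstract above compare patient blocks through the integrated absolute distance between their survival functions. For right-continuous step functions such as Kaplan-Meier estimates, the integral of |S1(t) − S2(t)| over [0, τ] can be computed exactly by splitting the time axis at every jump of either curve. The sketch below assumes that each curve is given as jump times plus the value held after each jump, that both curves equal 1 before their first jump, and that the horizon τ is chosen by the analyst. The function names are hypothetical, and the paper's vectorization and ensemble steps are not shown.

```python
import numpy as np

def step_value(times, values, t):
    """Evaluate a right-continuous survival step function at the points t.

    times: increasing jump times; values[k] is the value on [times[k], times[k+1]).
    Assumption: the curve equals 1 before its first jump.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    t = np.asarray(t, dtype=float)
    if times.size == 0:
        return np.ones_like(t)  # no jumps: survival stays at 1
    idx = np.searchsorted(times, t, side="right") - 1
    return np.where(idx >= 0, values[np.maximum(idx, 0)], 1.0)

def integrated_abs_distance(times1, values1, times2, values2, horizon):
    """Exact value of the integral of |S1(t) - S2(t)| over [0, horizon]."""
    # Both curves are constant between consecutive points of the merged grid.
    grid = np.unique(np.concatenate([np.asarray(times1, dtype=float),
                                     np.asarray(times2, dtype=float),
                                     [0.0, horizon]]))
    grid = grid[(grid >= 0.0) & (grid <= horizon)]
    left = grid[:-1]          # each interval's value is its left-endpoint value
    widths = np.diff(grid)
    gaps = np.abs(step_value(times1, values1, left) - step_value(times2, values2, left))
    return float(np.sum(gaps * widths))

# S1 drops from 1 to 0 at t = 1; S2 drops from 1 to 0 at t = 2.
d = integrated_abs_distance([1.0], [0.0], [2.0], [0.0], horizon=3.0)
print(d)  # 1.0
```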
