How many clusters?

McCullagh, Peter; Yang, Jie

doi:10.1214/08-ba304

Cited by 68 publications

(58 citation statements)

References 22 publications

“…See [1] for a relatively recent overview of this literature. Well-behaved mathematically tractable models of random partitions are of interest to probabilists as well as statisticians and scientists; see [10], [12], [13], and [15]. Ewens [10] first introduced the Ewens' sampling formula in the context of theoretical population biology.…”

Section: Introduction (mentioning)

confidence: 99%

A Consistent Markov Partition Process Generated from the Paintbox Process

Crane

2011

J. Appl. Probab.

We study a family of Markov processes on $\mathcal{P}^{(k)}$, the space of partitions of the natural numbers with at most $k$ blocks. The process can be constructed from a Poisson point process on $\mathbb{R}^+ \times \prod_{i=1}^{k} \mathcal{P}^{(k)}$ with intensity $dt \otimes \varrho_\nu^{(k)}$, where $\varrho_\nu$ is the distribution of the paintbox based on the probability measure $\nu$ on $\mathcal{P}_m$, the set of ranked-mass partitions of 1, and $\varrho_\nu^{(k)}$ is the product measure on $\prod_{i=1}^{k} \mathcal{P}^{(k)}$. We show that these processes possess a unique stationary measure, and we discuss a particular set of reversible processes for which transition probabilities can be written down explicitly.

“…We used 6 servers as slave machines for both the proposed framework and Hadoop: 4 servers with a 4-core 2.8 GHz CPU and 4 GB memory, and 2 servers with two 4-core 2.53 GHz CPUs and 2 GB memory. Table 4 shows execution times of one iteration of three machine learning algorithms: K-Means [2], Dirichlet process clustering [12] and IPM perceptron [13,14]. The values are the mean and standard deviation over 10 runs.…”

Section: Discussion (mentioning)

confidence: 99%

“…Table 4. Comparison of the parallel machine learning framework and Mahout on K-Means [2], Dirichlet process clustering [12] and IPM perceptron [13,14]. We also applied the framework in order to parallelize a learning algorithm of an acoustic model for speech recognition.…”

Section: Discussion (mentioning)

confidence: 99%

Analysis and Learning Frameworks for Large-Scale Data Mining

Yanai, Yanase

2012

Advances in Data Mining Knowledge Discovery and Applications

“…Somewhat unfortunately, this heuristic method tends to fail in actual applications when the number of dimensions increases. More formal inspiration for clusterability analysis is provided by considerations on the number of clusters (main references include [3][4][5][6]) and component overlap analysis for mixtures of normal distributions (see [7][8][9][10]). However, both approaches originally assume the underlying partition to be already determined.…”

Section: State-of-the-art (mentioning)

confidence: 99%