Classification of Binary Vectors by Stochastic Complexity

Gyllenberg, Mats; Koski, Timo; Verlaan, Martin

doi:10.1006/jmva.1997.1687

Cited by 31 publications

(18 citation statements)

References 30 publications

“…Binary data clustering has been widely studied in literature [25,29,33,42]. A unified view of binary data clustering has been provided by examining the connections among various methods including entropy-based methods, distance-based methods (e.g., K-means), mixture models, and matrix decomposition [38,39].…”

Section: Methods (mentioning)

confidence: 99%

On combining multiple clusterings: an overview and a new perspective

Ogihara

2009

Appl Intell

Many problems can be reduced to the problem of combining multiple clusterings. In this paper, we first summarize different application scenarios of combining multiple clusterings and provide a new perspective of viewing the problem as a categorical clustering problem. We then show the connections between various consensus and clustering criteria and discuss the complexity results of the problem. Finally we propose a new method to determine the final clustering. Experiments on kinship terms and clustering popular music from heterogeneous feature sets show the effectiveness of combining multiple clusterings.

“…The first term in equation (3) describes the complexity of the classification and the second term the complexity of the strains with respect to the classification. Gyllenberg et al (1994b) also showed that minimizing the SC with respect to the model (2) amounts to maximizing the information content of the classification.…”

Section: Description Of Classes (mentioning)

confidence: 99%

“…Gyllenberg et al (1994b) showed that minimizing SC amounts to maximizing the information content of the classification. Thus increasing SC implies loss of information whereas decreasing SC indicates gain in information content.…”

Section: A Good Classification Should Have An Information (mentioning)

confidence: 99%

“…It was shown by Gyllenberg et al (1994b) that the stochastic complexity SC of a set of t strains with respect to the above model is […] where tj is the number of strains in class j and tij is the number of strains in class j with the i-th feature equal to 1 (log denotes the logarithm to the base 2). The first term in equation (3) describes the complexity of the classification and the second term the complexity of the strains with respect to the classification.…”

Section: Description Of Classes (mentioning)

confidence: 99%
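
The two-term reading quoted above (one term for the classification, one for the strains given the classification) can be illustrated with a simplified code-length calculation. The sketch below is not equation (3) of the cited paper, which the excerpt does not reproduce. It assumes a plug-in two-part code in which class labels cost -log2(tj/t) bits each and every binary feature within a class costs its empirical binary entropy. The function names `binary_entropy` and `two_part_code_length` are hypothetical, chosen only for this illustration.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def two_part_code_length(vectors, labels):
    """Return (class_bits, data_bits) for binary vectors under a labelling.

    Assumption: this is a simplified plug-in approximation, not the
    stochastic complexity formula of Gyllenberg et al.
    """
    t = len(vectors)          # total number of strains
    d = len(vectors[0])       # number of binary features
    classes = {}
    for x, c in zip(vectors, labels):
        classes.setdefault(c, []).append(x)

    class_bits = 0.0
    data_bits = 0.0
    for members in classes.values():
        t_j = len(members)    # strains in class j
        # Cost of stating the class of each of the t_j strains.
        class_bits += -t_j * math.log2(t_j / t)
        for i in range(d):
            # t_ij: strains in class j whose i-th feature equals 1.
            t_ij = sum(x[i] for x in members)
            # Cost of feature i for all strains in class j.
            data_bits += t_j * binary_entropy(t_ij / t_j)
    return class_bits, data_bits


vectors = [(1, 1, 0), (1, 1, 0), (0, 0, 1), (0, 0, 1)]
good = two_part_code_length(vectors, [0, 0, 1, 1])
mixed = two_part_code_length(vectors, [0, 1, 0, 1])
single = two_part_code_length(vectors, [0, 0, 0, 0])
print(sum(good), sum(mixed), sum(single))
```

Under these assumptions, the labelling that groups identical vectors gives the shortest total code, which matches the quoted idea that a lower complexity corresponds to a more informative classification.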

“…Gyllenberg et al (1994b) The procedure is repeated starting from step 2 using the HMOs found in step 4 at the previous iteration until the HMOs do not change.…”

Section: Description Of Classes (mentioning)

confidence: 99%

Classification of Enterobacteriaceae by minimization of stochastic complexity

Gyllenberg

Koski

et al. 1997

Microbiology

A new method for classifying bacteria is presented and applied to a large set of biochemical data for the Enterobacteriaceae. The method minimizes the bits needed to encode the classes and the items or, equivalently, maximizes the information content of the classification. The resulting taxonomy of Enterobacteriaceae corresponds well to the general structure of earlier classifications. Minimization of stochastic complexity can be considered as a useful tool to create bacterial classifications that are optimal from the point of view of information theory.

Bayesian Predictive Identification and Cumulative Classification of Bacteria

Gyllenberg

1999

Bulletin of Mathematical Biology

In this paper we give a mathematically precise formulation of an old idea in bacterial taxonomy, namely cumulative classification, where the taxonomy is continuously updated and possibly augmented as new strains are identified. Our formulation is based on Bayesian predictive probability distributions. The criterion for founding a new taxon is given a firm theoretical foundation based on prediction and it is given a clear-cut interpretation. We formulate an algorithm for cumulative classification and apply it to a large database of bacteria belonging to the family Enterobacteriaceae. The resulting taxonomy makes microbiological sense.
