This paper proposes an information theoretic criterion for comparing two partitions, or clusterings, of the same data set. The criterion, called variation of information (VI), measures the amount of information lost and gained in changing from clustering C to clustering C′. The basic properties of VI are presented and discussed. We focus on two kinds of properties: (1) those that help one build intuition about the new criterion (in particular, it is shown that VI is a true metric on the space of clusterings), and (2) those that pertain to the comparability of VI values over different experimental conditions. As the latter properties have rarely been discussed explicitly before, other existing comparison criteria are also examined in their light. Finally, we present the VI from an axiomatic point of view, showing that it is the only "sensible" criterion for comparing partitions that is both aligned to the lattice and convexly additive. As a consequence, we prove an impossibility result for comparing partitions: there is no criterion for comparing partitions that simultaneously satisfies the above two desirable properties and is bounded.
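As a concrete illustration (not part of the original abstract), VI can be written as VI(C, C′) = H(C|C′) + H(C′|C) = H(C) + H(C′) − 2I(C, C′) and computed directly from the joint distribution of cluster labels. The following minimal Python sketch is our own; the function name and the use of natural logarithms (the paper's choice of log base only changes the units) are assumptions of this illustration.

```python
import numpy as np
from collections import Counter

def variation_of_information(a, b):
    """VI(A, B) = H(A|B) + H(B|A), accumulated term by term
    over the empirical joint distribution of the two labelings."""
    n = len(a)
    p_ab = Counter(zip(a, b))   # joint counts over label pairs
    p_a = Counter(a)            # marginal counts, clustering A
    p_b = Counter(b)            # marginal counts, clustering B
    vi = 0.0
    for (i, j), c in p_ab.items():
        p = c / n               # p(i, j)
        # term-wise contribution: p * [log(p_A(i)/p) + log(p_B(j)/p)]
        vi += p * (np.log((p_a[i] / n) / p) + np.log((p_b[j] / n) / p))
    return vi

# Identical clusterings yield VI = 0; VI grows as the two disagree.
print(variation_of_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 0.0
print(variation_of_information([0, 0, 1, 1], [0, 1, 1, 1]))  # > 0
```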
In this paper we present decomposable priors, a family of priors over the structure and parameters of tree belief nets for which Bayesian learning with complete observations is tractable, in the sense that the posterior is also decomposable and can be completely determined analytically in polynomial time. This follows from two main results: first, we show that factored distributions over spanning trees in a graph can be integrated in closed form; second, we examine priors over tree parameters and show that a set of assumptions similar to those of [Heckerman et al., 1995] constrains the tree parameter priors to be a compactly parametrized product of Dirichlet distributions. Besides allowing for exact Bayesian learning, these results permit us to formulate a new class of tractable latent variable models in which the likelihood of a data point is computed through an ensemble average over tree structures.

Introduction

In the framework of graphical models, tree distributions stand out by their special computational advantages. Inference and sampling from a tree are linear in the number of variables n. While it is known that for many classes of graphical models, such as junction trees with clique width > 2, the problem of learning the optimal structure is NP-hard, for trees this problem is solvable in only quadratic time. The latter result is due to [Chow and Liu, 1968], who present an algorithm for finding the structure and parameters of the tree that best fits a given distribution in the Maximum Likelihood (ML) framework. This algorithm was generalized to Maximum A-Posteriori (MAP) learning [Meilă-Predoviciu, 1999; Heckerman et al., 1995].

In this paper we present another remarkable property of tree graphical models: the fact that Bayesian learning for a certain class of priors, called decomposable¹ priors, is also tractable. Essentially, decomposable priors are priors that can be represented as a product of factors corresponding to the edges of the tree. We show that if the prior is decomposable and we have a data set consisting of N complete i.i.d. observations, then the posterior distribution over all tree structures and parameters is also decomposable and is expressible with a quadratic number of parameters that can be computed exactly from the data in O(n³ + n²N) operations. Evaluating the posterior for a given tree then takes O(n) time. The first two results come from the fact that, with the standard assumptions of likelihood equivalence, parameter independence and parameter modularity, the prior for tree parameters is constrained to be a product of Dirichlet distributions whose parameters satisfy a set of consistency relations. The last result, i.e. the possibility of computing the posterior exactly, is a consequence of the fact

¹ The term decomposable prior will refer here to a prior over a family of graphical models. It should not be confused with a decomposable model, which is a distribution over V.
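The quadratic-time ML result referenced above is the classical Chow-Liu procedure: weight each pair of variables by its empirical mutual information and take a maximum-weight spanning tree. The sketch below is our own illustration of that procedure, not the paper's Bayesian posterior computation; all function names are ours. Computing the pairwise weights costs O(n²N), consistent with "quadratic" in the number of variables.

```python
import numpy as np
from collections import Counter
from itertools import combinations

def empirical_mi(x, y):
    """Empirical mutual information between two discrete columns."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # I(X;Y) = sum_{a,b} p(a,b) log[ p(a,b) / (p(a) p(b)) ]
    return sum((c / n) * np.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def chow_liu_tree(data):
    """Edges of the ML tree for an (N, n) integer data array:
    the maximum-weight spanning tree under pairwise empirical
    mutual information (Kruskal's algorithm with union-find)."""
    N, n = data.shape
    edges = sorted(((empirical_mi(data[:, i], data[:, j]), i, j)
                    for i, j in combinations(range(n), 2)),
                   reverse=True)
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # keep the edge if it joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree
```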