Graphical models such as Markov random fields (MRFs), which are associated with undirected graphs, and Bayesian networks (BNs), which are associated with directed acyclic graphs (DAGs), have proven to be very popular approaches for reasoning under uncertainty, prediction problems, and causal inference.

Parametric MRF likelihoods are well studied for Gaussian and categorical data. However, in more complicated parametric and semi-parametric settings, likelihoods specified via clique potential functions are generally not known to be congenial or non-redundant. Congenial and non-redundant DAG likelihoods are far simpler to specify in both parametric and semi-parametric settings, by modeling the Markov factors in the DAG factorization. However, DAG likelihoods specified in this way are not guaranteed to coincide across distinct DAGs within the same Markov equivalence class. This complicates likelihood-based model selection procedures for DAGs by "sneaking in" potentially unwarranted assumptions about edge orientations.

In this paper we link a density function decomposition due to Chen with the clique factorization of MRFs described by Lauritzen to provide a general likelihood for MRF models. The proposed likelihood is composed of variationally independent, non-redundant, closed-form functionals of the observed data distribution, and is sufficiently general to apply to arbitrary parametric and semi-parametric models. We use an extension of these developments to give a general likelihood for DAG models that is guaranteed to coincide for all members of a Markov equivalence class. Our results have direct applications to model selection and semi-parametric inference.
Preliminaries

We first introduce the necessary graphical modeling preliminaries. Graphs are assumed to have a vertex set $\mathbf{V}$, and we restrict attention to positive distributions. Given any graph $G$ and $S \subseteq \mathbf{V}$, the induced subgraph $G_S$ of $G$ is the graph with vertex set $S$ and all edges in $G$ connecting elements of $S$.

Given an undirected graph (UG) $G$, a clique $C$ is a (possibly empty) subset of vertices in $\mathbf{V}$ that are pairwise connected in $G$. The set of all cliques in $G$ is denoted by $\mathcal{C}(G)$, while the set of all maximal cliques is denoted by $\overline{\mathcal{C}}(G)$. Note that, in general, neither … A distribution $p(\mathbf{v})$ is in the MRF model of $G$ if

$$p(\mathbf{v}) \propto \prod_{C \in \mathcal{C}(G)} \phi_C(\mathbf{v}_C),$$

where the $\phi_C$ are potential functions that map values of $C$ to real numbers. Potential functions are not necessarily normalized probabilities. Equivalently,

$$p(\mathbf{v}) = \frac{1}{Z} \prod_{C \in \mathcal{C}(G)} \phi_C(\mathbf{v}_C),$$

where $Z$ is a normalizing constant. If we restrict attention to positive distributions, an MRF model may be equivalently defined as the set of distributions $p(\mathbf{v})$ that satisfy either the global or the pairwise Markov property for $G$. The global Markov property for $p(\mathbf{v})$ and a UG $G$ states that for any disjoint subsets $A, B, C$ of $\mathbf{V}$, whenever all paths from $A$ to $B$ in $G$ are intercepted by $C$, then $A \perp\!\!\!\perp B \mid C$ in $p(\mathbf{v})$. The pairwise Markov property for $p(\mathbf{v})$ and $G$ states that for any pair of vertices $A, B$ that are non-adjacent in $G$, $A \perp\!\!\!\perp B \mid \mathbf{V} \setminus \{A, B\}$ in $p(\mathbf{v})$.

A joint distribution $p(\mathbf{v})$ is in the Bayesian network (BN) model of a directed acyclic graph (DAG) $G$ if $p(\mathbf{v}) = \prod_{V \in \mathbf{V}} p(V \mid \mathrm{pa}_G(V))$, where $\mathrm{pa}_G(V)$ are...
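The undirected-graph definitions above (cliques, maximal cliques, induced subgraphs, separation, the pairwise Markov statements, and the normalized clique factorization) can be made concrete on a small example. The sketch below is an illustration only, not the paper's method. It assumes a hypothetical four-vertex UG with edges A-B, A-C, B-C, C-D and binary variables. All function names (`all_cliques`, `separated`, `mrf_joint`, `cond_indep`, and so on) and the potentials in `POTENTIALS` are made up for this example. It enumerates subsets by brute force, which is exponential in the number of vertices, so it suits only toy graphs. The last part builds $p(\mathbf{v}) = Z^{-1} \prod_C \phi_C(\mathbf{v}_C)$ from positive potentials and checks numerically that a separation in $G$ shows up as the corresponding conditional independence.

```python
from itertools import combinations, product
import math

# Hypothetical toy UG used for illustration: edges A-B, A-C, B-C, C-D.
VERTICES = {"A", "B", "C", "D"}
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
ADJ = {v: set() for v in VERTICES}
for x, y in EDGES:
    ADJ[x].add(y)
    ADJ[y].add(x)

def is_clique(S, adj):
    # Every pair in S must be adjacent; the empty set and singletons qualify.
    return all(b in adj[a] for a, b in combinations(S, 2))

def all_cliques(vertices, adj):
    # Brute-force enumeration of C(G) over all subsets (toy graphs only).
    vs = sorted(vertices)
    return [frozenset(S) for k in range(len(vs) + 1)
            for S in combinations(vs, k) if is_clique(S, adj)]

def maximal_cliques(vertices, adj):
    # A clique is maximal if no other clique strictly contains it.
    cl = all_cliques(vertices, adj)
    return [C for C in cl if not any(C < D for D in cl)]

def induced_subgraph(S, adj):
    # G_S keeps the vertices in S and every edge of G between them.
    return {a: adj[a] & set(S) for a in S}

def separated(A, B, C, adj):
    # True if every path from A to B is intercepted by C: search from A
    # in the graph with C deleted and check whether B is reachable.
    A, B, C = set(A), set(B), set(C)
    stack, seen = list(A - C), set(A - C)
    while stack:
        x = stack.pop()
        if x in B:
            return False
        for y in adj[x] - C - seen:
            seen.add(y)
            stack.append(y)
    return True

def pairwise_statements(vertices, adj):
    # Pairwise Markov property: a _||_ b | V \ {a, b} for each non-adjacent pair.
    return [(a, b, set(vertices) - {a, b})
            for a, b in combinations(sorted(vertices), 2) if b not in adj[a]]

def mrf_joint(vertices, potentials):
    # p(v) = Z^{-1} * prod_C phi_C(v_C), enumerated over binary configurations.
    vs = sorted(vertices)
    unnorm = {}
    for vals in product((0, 1), repeat=len(vs)):
        x = dict(zip(vs, vals))
        w = 1.0
        for C, phi in potentials.items():
            w *= phi(*(x[c] for c in sorted(C)))
        unnorm[vals] = w
    Z = sum(unnorm.values())
    return vs, {k: w / Z for k, w in unnorm.items()}

def marginal(vs, joint, S):
    # Sum the joint table over all variables not in the ordered list S.
    idx = [vs.index(s) for s in S]
    out = {}
    for k, p in joint.items():
        key = tuple(k[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_indep(vs, joint, a, b, C, tol=1e-12):
    # a _||_ b | C iff p(a, b, C) p(C) = p(a, C) p(b, C) for every configuration.
    C = sorted(C)
    pabc = marginal(vs, joint, [a, b] + C)
    pac = marginal(vs, joint, [a] + C)
    pbc = marginal(vs, joint, [b] + C)
    pc = marginal(vs, joint, C)
    return all(abs(p * pc[tuple(xc)] - pac[(xa, *xc)] * pbc[(xb, *xc)]) <= tol
               for (xa, xb, *xc), p in pabc.items())

# Hypothetical positive potentials on the maximal cliques {A, B, C} and {C, D}.
POTENTIALS = {
    frozenset("ABC"): lambda a, b, c: math.exp(0.8 * a * b + 0.5 * b * c - 0.3 * a * c),
    frozenset("CD"): lambda c, d: 2.0 if c == d else 0.5,
}
```

On this graph, `maximal_cliques(VERTICES, ADJ)` yields {A, B, C} and {C, D}, and `separated({"A"}, {"D"}, {"C"}, ADJ)` is True. With `vs, joint = mrf_joint(VERTICES, POTENTIALS)`, the call `cond_indep(vs, joint, "A", "D", ["C"])` holds, as the global Markov property requires. The marginal statement `cond_indep(vs, joint, "A", "D", [])` does not hold. Potentials are placed only on maximal cliques here, because any potential on a sub-clique can be absorbed into a potential on a maximal clique containing it.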