2008 Information Theory and Applications Workshop
DOI: 10.1109/ita.2008.4601061

Bayesian network structure learning using factorized NML universal models

Abstract: Universal codes/models can be used for data compression and model selection by the minimum description length (MDL) principle. For many interesting model classes, such as Bayesian networks, the minimax regret optimal normalized maximum likelihood (NML) universal model is computationally very demanding. We suggest a computationally feasible alternative to NML for Bayesian networks, the factorized NML universal model, where the normalization is done locally for each variable. This can be seen as an approximate s…
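The abstract describes normalizing the NML model locally, for each variable in the network. The sketch below is not code from the paper; it illustrates one plausible reading of such a per-variable score for a discrete Bayesian network: the maximized log-likelihood of a child variable given its parents, penalized by a local multinomial NML normalizer per parent configuration. The function names, the column-based data layout, and the brute-force normalizer are illustrative assumptions; an efficient implementation would instead use the linear-time normalizer computation of Kontkanen & Myllymäki (2007).

```python
import math
from itertools import product
from collections import Counter

def multinomial_regret(K, n):
    """Normalizing constant C_K(n) of the multinomial NML distribution,
    computed by brute force over all count vectors (exponential in K;
    only meant for tiny examples)."""
    if n == 0:
        return 1.0
    total = 0.0
    for counts in product(range(n + 1), repeat=K):
        if sum(counts) != n:
            continue
        coef = math.factorial(n)
        for h in counts:
            coef //= math.factorial(h)
        lik = 1.0
        for h in counts:
            if h > 0:
                lik *= (h / n) ** h
        total += coef * lik
    return total

def local_fnml_score(child_column, parent_columns, r_child):
    """Locally normalized (factorized) NML log-score for one variable:
    maximized log-likelihood of the child given its parents, minus the
    log multinomial regret accumulated per parent configuration."""
    n = len(child_column)
    parent_configs = [tuple(col[t] for col in parent_columns) for t in range(n)]
    counts_parent = Counter(parent_configs)                    # N_ij
    counts_joint = Counter(zip(parent_configs, child_column))  # N_ijk
    score = 0.0
    for (j, _k), n_jk in counts_joint.items():
        score += n_jk * math.log(n_jk / counts_parent[j])      # max log-likelihood
    for j, n_j in counts_parent.items():
        score -= math.log(multinomial_regret(r_child, n_j))    # local normalization
    return score
```

Summing such a local score over all variables under a candidate structure would give a factorized-NML criterion for the structure search to maximize, which is the kind of locally normalized scoring the abstract refers to.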

Cited by 24 publications (11 citation statements)
References 19 publications
“…Note that this pNML probability assignment was essentially proposed earlier, see [7], [8], with a different motivation as one of the possible variations of the Normalized Maximum Likelihood (NML) method of [6] for universal prediction.…”
Section: Introduction
Mentioning confidence: 99%
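The quoted statement refers to the predictive/sequential NML assignment: the probability of the next outcome is taken proportional to the maximized likelihood of the data extended by that outcome. A minimal sketch for a Bernoulli/multinomial observation model is given below; the names and the toy usage are illustrative, not taken from the cited works.

```python
def pnml_next_symbol_probs(x, alphabet=(0, 1)):
    """Predictive/sequential NML: the probability assigned to the next
    symbol a is proportional to the maximized likelihood of the observed
    sequence extended by a, normalized over the alphabet."""
    def max_likelihood(seq):
        # maximized multinomial likelihood: prod_a (count_a / len(seq)) ** count_a
        m = len(seq)
        lik = 1.0
        for a in set(seq):
            c = seq.count(a)
            lik *= (c / m) ** c
        return lik
    weights = {a: max_likelihood(list(x) + [a]) for a in alphabet}
    z = sum(weights.values())  # local normalization over next-symbol candidates
    return {a: w / z for a, w in weights.items()}

# Toy usage: after observing three ones and one zero,
# pnml_next_symbol_probs([1, 1, 1, 0]) favours 1 over 0, but not certainly so.
```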
“…Even though it is strongly principled, NML computation is restricted to certain classes of models, e.g., the multinomial distribution (Kontkanen & Myllymäki, 2007) and naive Bayes (Mononen & Myllymäki, 2007), which prevents its use in score-based structure learning. In Bayesian networks, efficient approximations were proposed and shown to perform better in model selection (Roos et al., 2008; Silander et al., 2018).…”
Section: NML Regret Estimation
Mentioning confidence: 99%
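The quote notes that exact NML computation is feasible for the multinomial model. Below is a sketch of computing the multinomial NML normalizing constant in linear time, following the recurrence attributed to Kontkanen & Myllymäki (2007); the recurrence is reproduced from memory and should be read as an assumption rather than a verified reimplementation of the cited algorithm.

```python
import math

def multinomial_regret_linear(K, n):
    """Multinomial NML normalizing constant C_K(n), via the recurrence
    C_K(n) = C_{K-1}(n) + n/(K-2) * C_{K-2}(n) for K >= 3,
    with C_1(n) = 1 and C_2(n) computed from its defining sum."""
    if n == 0:
        return 1.0
    c1 = 1.0
    # C_2(n) = sum_h binom(n, h) (h/n)^h ((n-h)/n)^(n-h)
    c2 = sum(
        math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )
    if K == 1:
        return c1
    if K == 2:
        return c2
    prev, curr = c1, c2
    for k in range(3, K + 1):
        prev, curr = curr, curr + n / (k - 2) * prev
    return curr
```

The regret term log C_K(n) is what would be subtracted from the maximized log-likelihood in the corresponding NML score.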
“…where $k'_{P,Q}(N)$ corresponds to a complexity term introduced in [14,15] to discriminate between variable dependence (for $I'_N([X]_P;[Y]_Q) > 0$) and variable independence (for $I'_N([X]_P;[Y]_Q) \leq 0$) given a finite dataset of size $N$. In the present context of finding an optimum discretization for continuous variables, this complexity term introduces a penalty which eventually outweighs the information gain from refining bin partitions further, when there is not enough data to support such a refined model, as depicted in Fig 1. For discrete variables, typical complexity terms correspond to the Bayesian Information Criterion (BIC), $k^{\mathrm{BIC}}_{P,Q}(N) = \tfrac{1}{2}(r_x - 1)(r_y - 1)\log N$, where $r_x$ and $r_y$ are the numbers of bins for $X$ and $Y$, or the X- and Y-Normalized Maximum Likelihood (NML) criteria [14-16], defined as,…”
Section: Assessing Information In Continuous or Mixed-type Data
Mentioning confidence: 99%
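The quoted complexity term is straightforward to evaluate. The sketch below computes the stated BIC penalty and a plug-in mutual information estimate between two binned variables; the acceptance rule N·I > k_BIC is an assumed simplification of the dependence criterion in the quote (which only states that $I'_N > 0$ signals dependence), and all names are illustrative.

```python
import math
from collections import Counter

def bic_complexity(r_x, r_y, N):
    """BIC complexity term from the quote: k_BIC = 1/2 (r_x - 1)(r_y - 1) log N."""
    return 0.5 * (r_x - 1) * (r_y - 1) * math.log(N)

def plug_in_mi(x_bins, y_bins):
    """Plug-in mutual information (in nats) between two discretized columns."""
    N = len(x_bins)
    pxy = Counter(zip(x_bins, y_bins))
    px, py = Counter(x_bins), Counter(y_bins)
    return sum(
        (n_ab / N) * math.log(n_ab * N / (px[a] * py[b]))
        for (a, b), n_ab in pxy.items()
    )

def partition_supported(x_bins, y_bins):
    """Assumed dependence check: keep the bin partition only if the
    information gain outweighs the complexity penalty, N * I > k_BIC."""
    N = len(x_bins)
    r_x, r_y = len(set(x_bins)), len(set(y_bins))
    return N * plug_in_mi(x_bins, y_bins) > bic_complexity(r_x, r_y, N)
```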