2005
DOI: 10.1109/tit.2005.856956

Asymptotics of Discrete MDL for Online Prediction

Abstract: Minimum description length (MDL) is an important principle for induction and prediction, with strong relations to optimal Bayesian learning. This paper deals with learning processes which are independent and identically distributed (i.i.d.) by means of two-part MDL, where the underlying model class is countable. We consider the online learning framework, i.e., observations come in one by one, and the predictor is allowed to update its state of mind after each time step. We identify two ways of predict…

Cited by 20 publications (14 citation statements)
References 33 publications (86 reference statements)
“…Now, if the AIC–BIC dilemma is interpreted as a conflict between consistency and optimal sequential prediction, then cumulative risk is a natural and often‐considered performance criterion (Haussler and Opper, 1997; Rissanen et al., 1992; Barron, 1998a; Yang and Barron, 1999; Poland and Hutter, 2005), and we can reasonably claim that our results solve the dilemma. However, it can also be interpreted as a dichotomy between model selection for truth finding and model selection‐based (non‐sequential) estimation.…”
Section: Discussion
confidence: 63%
“…Minimax cumulative risk has previously been studied by, among others, Haussler and Opper (1997), Rissanen et al. (1992), Barron (1998a), Yang and Barron (1999) and Poland and Hutter (2005).…”
Section: Risk Bounds: Preliminaries and Parametric Case
confidence: 99%
“…Several estimation procedures do not only provide $q_n$ on $\mathcal{X}^n$, but measures on $\mathcal{X}^\infty$, or equivalently for each $n$ separately a TC $q_n: \mathcal{X}^* \to [0,1]$ (see Bayes and crude MDL below). While this opens further options for $q$, e.g. $q(x_{n+1}|x_{1:n}) := q_n(x_{1:n+1})/q_n(x_{1:n})$ with some (weak) results for MDL [PH05], it does not solve our main problem.…”
Section: Conversion Methods
confidence: 96%
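The conversion in the quotation above, $q(x_{n+1}|x_{1:n}) := q_n(x_{1:n+1})/q_n(x_{1:n})$, can be illustrated with a minimal Python sketch. The Bernoulli measure and the function names are illustrative assumptions, not taken from the cited papers.

def bernoulli_measure(theta):
    """Measure q_n on sequences: i.i.d. Bernoulli(theta) over the alphabet {0, 1}."""
    def q(seq):
        ones = sum(seq)
        return theta ** ones * (1.0 - theta) ** (len(seq) - ones)
    return q

def predictive(q_n, history, symbol):
    """One-step predictor q(symbol | history) = q_n(history + symbol) / q_n(history)."""
    denom = q_n(history)
    return q_n(list(history) + [symbol]) / denom if denom > 0 else 0.0

q_n = bernoulli_measure(0.7)            # the estimator's current measure at time n
print(predictive(q_n, [1, 1, 0], 1))    # 0.7, as expected for an i.i.d. model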
“…Crude MDL simply selects $q_n := \arg\max_{\nu \in \mathcal{M}} \{\nu(x_{1:n})\, w(\nu)\}$ at time $n$, which is a probability measure on $\mathcal{X}^\infty$. While this opens additional options for defining $q$, they can also perform poorly in the worst case [PH05]. Note that most versions of MDL often perform very well in practice, comparable to Bayes; robustness and proving guarantees are the open problems.…”
Section: Examples
confidence: 99%
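A minimal Python sketch of the crude MDL selection rule quoted above, $q_n := \arg\max_{\nu\in\mathcal{M}} \{\nu(x_{1:n})\, w(\nu)\}$. The countable class is truncated to a finite Bernoulli grid with a uniform prior $w$; both choices are illustrative assumptions, not the setup of the cited papers.

import math

def bernoulli_loglik(theta, seq):
    """log nu(x_{1:n}) for an i.i.d. Bernoulli(theta) model, theta strictly inside (0, 1)."""
    ones = sum(seq)
    return ones * math.log(theta) + (len(seq) - ones) * math.log(1.0 - theta)

models = [i / 10 for i in range(1, 10)]                       # stand-in for a countable class M
log_w = {theta: -math.log(len(models)) for theta in models}   # prior weights w(nu), uniform here

def crude_mdl_select(seq):
    """Return the argmax over nu in M of nu(x_{1:n}) * w(nu), i.e. the two-part code minimizer."""
    return max(models, key=lambda theta: bernoulli_loglik(theta, seq) + log_w[theta])

history = [1, 1, 0, 1, 1, 1, 0, 1]
theta_hat = crude_mdl_select(history)
print(theta_hat, "-> P(next symbol = 1) =", theta_hat)        # predict i.i.d. with the selected model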
“…where (13) follows from Equations (7) and (8). Our $\rho_i$, $\rho_i^{\mathrm{norm}}$, and $\rho_i^{\mathrm{stat}}$ are closely inspired by Poland and Hutter (2005), who constructed (in our notation) $\rho_1$, $\rho_1^{\mathrm{norm}}$, and $\rho_1^{\mathrm{stat}}$. Our first lemma bounds the deviation of $\rho_i$ from being a measure.…”
Section: General Sequence Prediction
confidence: 99%