2005
DOI: 10.1007/11503415_44

Asymptotic Log-Loss of Prequential Maximum Likelihood Codes

Abstract: We analyze the Dawid-Rissanen prequential maximum likelihood codes relative to one-parameter exponential family models M. If data are i.i.d. according to an (essentially) arbitrary P, then the redundancy grows at rate (1/2) c ln n. We show that c = σ₁²/σ₂², where σ₁² is the variance of P, and σ₂² is the variance of the distribution M* ∈ M that is closest to P in KL divergence. This shows that prequential codes behave quite differently from other important universal codes such as the 2-part MDL, Shtar…
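The result quoted in the abstract lends itself to a quick numerical check. The sketch below is not from the paper: it simulates a prequential (plug-in) ML code for the Poisson family on data drawn from a geometric distribution, so the model is misspecified, and compares the empirically estimated growth coefficient c with σ₁²/σ₂². The smoothed start-up for the estimator and the particular distributions are illustrative assumptions.

```python
# Minimal simulation of the prequential (plug-in) maximum likelihood code for
# the one-parameter Poisson family when the data actually come from a geometric
# distribution, i.e. the model is misspecified.  Hypothetical sketch: the
# "+1" start-up for the estimator and all constants are my own choices.
import numpy as np
from math import lgamma, log

def poisson_logpmf(k, lam):
    """log P(X = k) under Poisson(lam)."""
    return k * log(lam) - lam - lgamma(k + 1.0)

def preq_redundancy(x, lam_star):
    """Prequential ML codelength minus the codelength of the single best
    Poisson (rate lam_star), i.e. the redundancy on the sequence x."""
    preq, best, running_sum = 0.0, 0.0, 0.0
    for t, xt in enumerate(x):
        lam_hat = (running_sum + 1.0) / (t + 1.0)  # smoothed ML estimate from the outcomes before x_t
        preq += -poisson_logpmf(xt, lam_hat)
        best += -poisson_logpmf(xt, lam_star)
        running_sum += xt
    return preq - best

rng = np.random.default_rng(0)
p = 0.25                      # geometric parameter, support {1, 2, ...}
var_P = (1.0 - p) / p**2      # sigma_1^2: variance of the generating distribution
lam_star = 1.0 / p            # KL-closest Poisson matches the mean of P
var_M = lam_star              # sigma_2^2: a Poisson's variance equals its mean

# Estimate the coefficient c in redundancy ~ (1/2) c ln n by differencing the
# redundancy at two sample sizes (this cancels the O(1) term), averaged over runs.
n0, n1, runs = 1_000, 20_000, 100
diffs = []
for _ in range(runs):
    x = rng.geometric(p, size=n1).astype(float)
    diffs.append(preq_redundancy(x, lam_star) - preq_redundancy(x[:n0], lam_star))
c_hat = np.mean(diffs) / (0.5 * (log(n1) - log(n0)))
print(f"empirical c ~ {c_hat:.2f}   theory sigma_1^2/sigma_2^2 = {var_P / var_M:.2f}")
```

With p = 0.25 the predicted ratio is (1−p)/p = 3; differencing over two sample sizes removes the O(1) part of the redundancy, so the printed empirical value should land near 3 up to sampling noise.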

Cited by 10 publications (21 citation statements). References 24 publications.

“…As a consequence its error rate decreases more slowly in the sample size if we put a prior on the generating distribution that assigns nonzero probability to both models. This result was surprising to us and has led to a theoretical analysis of the codelength of the plug-in code in [Grünwald and de Rooij 2005]. It turns out that the regret of the plug-in code does not necessarily grow with (k/2) ln n like the NML and Bayesian codes do, if the sample is not distributed according to any element of the model.…”
Section: Discussion

“…We prove in [Grünwald and de Rooij 2005] that for single parameter exponential families, the regret for the plug-in code grows with (1/2) ln(n) · Var_P(X)/Var_M(X), where n is the sample size, P is the generating distribution and M is the best element of the model (the element of M for which the Kullback-Leibler divergence D(P‖M) is minimised). The plug-in model has the same regret (to O(1)) as the NML model if and only if the variance of the generating distribution is the same as the variance of the best element of the model.…”
Section: Poor Performance of the Plug-in Criterion

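Stated compactly, and instantiated on one concrete misspecified case (a worked instance of my own, not taken from the paper or the citing text), the regret result quoted above reads:

```latex
% Regret of the plug-in (prequential ML) code relative to the best element M*
% of a one-parameter exponential family, for i.i.d. data from P:
\[
  \operatorname{regret}(x^n)
    \;=\; -\ln P_{\text{plug-in}}(x^n) \;-\; \bigl(-\ln P_{M^*}(x^n)\bigr)
    \;=\; \frac{1}{2}\,\frac{\operatorname{Var}_P(X)}{\operatorname{Var}_{M^*}(X)}\,\ln n \;+\; O(1).
\]
% Worked instance (hypothetical): let P be geometric with success probability p
% on {1,2,...} and let the model be Poisson.  Then Var_P(X) = (1-p)/p^2, the
% KL-closest Poisson has mean 1/p, so Var_{M*}(X) = 1/p and the coefficient is
% (1-p)/p; it matches the correctly specified rate (1/2) ln n only when p = 1/2.
```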
“…It has been shown that the expression in (15) essentially reduces to SC₃ in (11) as n → ∞ under regularity conditions (Rissanen, 1986, 1987; Dawid, 1992; Grünwald & de Rooij, 2005).³ An implication of this observation is that the model that permits the greatest compression of the data is also the one that minimizes the accumulated prediction error, thereby providing justification for stochastic complexity as a predictive inference method, at least asymptotically.…”
Section: Predictive Inference and the MDL Principle

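The compression-equals-prediction reading in this snippet rests on the chain-rule decomposition of the prequential codelength into one-step-ahead log losses. A minimal statement of that identity (standard, and not tied to the cited equations (11) and (15)) is:

```latex
% Chain-rule decomposition: the prequential (plug-in) codelength equals the
% accumulated one-step-ahead prediction error under log loss.
\[
  -\ln P_{\mathrm{preq}}(x^n)
    \;=\; \sum_{t=1}^{n} -\ln P_{\hat\theta(x^{t-1})}\!\bigl(x_t \mid x^{t-1}\bigr),
\]
% where \hat\theta(x^{t-1}) is the (possibly smoothed) ML estimate computed from
% the first t-1 outcomes; minimizing total codelength over models is therefore
% the same as minimizing accumulated predictive log loss.
```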
“…The stimuli were generated…³ The primary regularity condition required for the equivalence proof is that the maximum likelihood estimate θ̂(x^t) satisfies the central limit theorem such that the tail probabilities are uniformly summable in the following sense: P(√n ‖θ̂(x^t) − θ‖ ≥ εₙ) ≤ δ(n) for all θ, with Σₙ δ(n) < ∞, where ‖·‖ denotes a norm (Rissanen, 1986, Theorem 1). Recently, Grünwald and de Rooij (2005) identified another important condition for the asymptotic approximation, i.e., that the model is correctly specified. According to their investigation, under model mis-specification, one can get quite different asymptotic results.…”
Section: Using NML in Cognitive Modeling

“…NML, however, requires knowledge of the time horizon and is impractical to calculate in many situations. A particularly simple and popular prediction strategy is the maximum likelihood (ML) strategy [1], [9], which predicts the next outcome xₙ by using the distribution P_θ̂ₙ₋₁, with θ̂ₙ₋₁ being the ML estimator based on the n − 1 past outcomes. The ML strategy, contrary to NML, belongs to the family of plug-in strategies which in each iteration predict with one of the strategies from the model.…”

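To make the plug-in idea in this last excerpt concrete, here is a tiny Bernoulli illustration (my own sketch, not code from the cited works). It also shows why start-up rules matter: the pure ML plug-in assigns probability zero to any outcome it has not yet seen.

```python
# Plug-in (maximum likelihood) prediction strategy for a Bernoulli model:
# predict outcome x_n with the parameter estimated from the n-1 past outcomes.
# Illustrative sketch; smoothing > 0 gives a Laplace-style start-up rule.
def plugin_prob_of_one(past, smoothing=0.0):
    """Probability assigned to x_n = 1 given the list of past binary outcomes."""
    total = len(past) + 2.0 * smoothing
    if total == 0:
        return 0.5                      # no data and no smoothing: fall back to uniform
    return (sum(past) + smoothing) / total

print(plugin_prob_of_one([1, 0, 1, 1]))            # pure ML estimate: 0.75
print(plugin_prob_of_one([1, 1, 1]))               # pure ML: probability 1, so a future 0 gets probability 0
print(plugin_prob_of_one([1, 1, 1], smoothing=1))  # Laplace start-up: 0.8
```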