2020
DOI: 10.1109/tit.2020.2996134
Finite-Sample Concentration of the Multinomial in Relative Entropy

Abstract: We show that the moment generating function of the Kullback-Leibler divergence (relative entropy) between the empirical distribution of n independent samples from a distribution P over a finite alphabet of size k (e.g. a multinomial distribution) and P itself is no more than that of a gamma distribution with shape k − 1 and rate n. The resulting exponential concentration inequality becomes meaningful (less than 1) when the divergence ε is larger than (k − 1)/n, whereas the standard method of types bound requires…
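The tail bound implied by this MGF comparison follows by the standard Chernoff argument. A sketch of that computation (the notation P̂_n for the empirical distribution is ours; this is a reconstruction from the abstract, not the paper's own statement): a gamma distribution with shape k − 1 and rate n has MGF (1 − λ/n)^−(k−1) for λ < n, so the abstract's claim reads

    \mathbf{E}\left[ e^{\lambda\, D(\hat{P}_n \,\|\, P)} \right] \le \left( 1 - \frac{\lambda}{n} \right)^{-(k-1)}, \qquad 0 \le \lambda < n,

and optimizing the Chernoff bound at λ = n − (k − 1)/ε gives, for ε ≥ (k − 1)/n,

    \Pr\left[ D(\hat{P}_n \,\|\, P) \ge \varepsilon \right] \le e^{-\lambda \varepsilon} \left( 1 - \frac{\lambda}{n} \right)^{-(k-1)} = e^{-n\varepsilon} \left( \frac{e\, n\, \varepsilon}{k-1} \right)^{k-1}.

The right-hand side equals 1 exactly at ε = (k − 1)/n and is strictly decreasing in ε beyond it, which is the sense in which the inequality "becomes meaningful" past that threshold.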

Cited by 22 publications (8 citation statements) · References 21 publications
“…The proof of Theorem 1.5 follows a similar outline to that of the earlier work [Agr20b] for the non-centered moment generating function, later extended to the centered version by [BP21]: namely, it reduces the multinomial case to the simpler case k = 2 of the binomial and then bounds the binomial. Our point of departure is the reduction used: the aforementioned works used a reduction that takes advantage of the dependence between the variables X_i for i ∈ {1, …, k}, and as a result has bounds in terms of k − 1, but does not adapt as easily to the centered case (though it can be done, as in [BP21]); by contrast, we use a reduction showing that we can consider independent X_i by incurring a quadratic loss, resulting in a simpler proof and a stronger bound in the centered case (though weaker in the non-centered case, both via the quadratic loss and by depending on k rather than k − 1).…”
Section: Introduction
mentioning
confidence: 89%
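The reductions described above are not reproduced in the snippet, but the tail bound they lead to is easy to sanity-check numerically. A minimal Monte Carlo sketch, assuming numpy; the uniform choice of P, the parameter values, and the helper name kl are ours, not from either paper:

    import numpy as np

    def kl(phat, p):
        # D(phat || p); terms with phat_i = 0 contribute 0 by convention.
        mask = phat > 0
        return float(np.sum(phat[mask] * np.log(phat[mask] / p[mask])))

    rng = np.random.default_rng(0)
    n, k, trials = 100, 5, 100_000
    p = np.full(k, 1.0 / k)              # illustrative: uniform P
    divs = np.array([kl(c / n, p) for c in rng.multinomial(n, p, size=trials)])

    eps = 2 * (k - 1) / n                # a point past the (k - 1)/n threshold
    empirical = float((divs >= eps).mean())
    bound = np.exp(-n * eps) * (np.e * n * eps / (k - 1)) ** (k - 1)
    print(f"empirical tail {empirical:.5f} <= gamma-MGF bound {bound:.5f}")

The slack between the two numbers is expected: the bound holds uniformly over all P, n, and k, and trades tightness for that uniformity.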
“…ε, and also posed several conjectures about improved bounds. Subsequently, the author gave an incomparable exponential bound [Agr20b] (further improved by Guo and Richardson [GR21]) which becomes non-trivial for ε > (k − 1)/n, by bounding the moment generating function of V_{n,k,P}. Most of the above bounds focused on the question of bounding the probability that V_{n,k,P} exceeds 0 by some ε, but it is also natural to ask about concentration around 𝐄…”
mentioning
confidence: 99%
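For context, the method-of-types bound alluded to above is the standard one (e.g. Cover and Thomas; this statement is ours, not the citing paper's):

    \Pr\left[ D(\hat{P}_n \,\|\, P) \ge \varepsilon \right] \le \binom{n+k-1}{k-1} e^{-n\varepsilon},

since there are \binom{n+k-1}{k-1} types and each type class with divergence at least ε has probability at most e^{−nε}. It is non-trivial only once ε > (1/n) log \binom{n+k-1}{k-1}, roughly ((k − 1)/n) log(n/(k − 1)) when n ≫ k, which is why the ε > (k − 1)/n threshold of the MGF-based bounds is an improvement.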
“…Whenever an actual distribution of relevance judgments is available, we propose to use another pointwise loss function, which takes into account the distribution of values over a number of relevance grades, interpreting them as outcomes from a multinomial distribution (Agrawal, 2020; Bishop, 2006)…”
Section: Pointwise Loss Functions
mentioning
confidence: 99%
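The citing paper's exact loss is not shown in the snippet; as a purely hypothetical illustration of the idea, the sketch below scores one query-document pair by the KL divergence between the empirical distribution of its graded relevance judgments (treated as multinomial outcomes) and the model's predicted grade distribution. All names and parameters here are invented:

    import numpy as np

    def multinomial_pointwise_loss(judgment_counts, predicted_probs, eps=1e-12):
        # KL(empirical grade distribution || predicted grade distribution).
        counts = np.asarray(judgment_counts, dtype=float)
        phat = counts / counts.sum()          # empirical distribution over grades
        q = np.clip(np.asarray(predicted_probs, dtype=float), eps, 1.0)
        mask = phat > 0                       # 0 * log(0/q) = 0 by convention
        return float(np.sum(phat[mask] * np.log(phat[mask] / q[mask])))

    # e.g. ten assessors grading one document on grades {0, 1, 2, 3}
    print(multinomial_pointwise_loss([2, 5, 3, 0], [0.25, 0.40, 0.30, 0.05]))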
“…We emphasize that our result applies to the loss L(p, p̂) = E[KL(p ‖ p̂)], and not to the (different) goal of minimizing E[KL(p̂ ‖ p)]. This latter goal is much easier, in the sense that the empirical estimator not only provides non-trivial bounds for it, but is known to achieve the optimal (up to constant factors) rate, as well as to provide (similarly optimal) high-probability bounds [2,16,3]. We note that [15] provides upper and lower bounds on the variance of KL(p̂ ‖ p) for the empirical estimator.…”
Section: Introduction
mentioning
confidence: 99%
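The asymmetry this quote turns on can be seen concretely: for the plug-in (empirical) estimator, KL(p̂ ‖ p) is always finite, while KL(p ‖ p̂) is infinite as soon as a symbol with positive true probability goes unsampled. A small sketch with invented numbers, assuming numpy:

    import numpy as np

    p    = np.array([0.70, 0.25, 0.05])  # true distribution (invented)
    phat = np.array([0.75, 0.25, 0.00])  # plug-in estimate; the rare symbol was never sampled

    def kl(a, b):
        # Conventions: 0 * log(0/b) = 0; a * log(a/0) = +inf for a > 0.
        mask = a > 0
        with np.errstate(divide="ignore"):
            return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    print("KL(phat || p) =", kl(phat, p))  # finite
    print("KL(p || phat) =", kl(p, phat))  # +inf: phat gives probability 0 to a symbol with p_i > 0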