2023
DOI: 10.1109/tit.2023.3249636

Information-Theoretic Analysis of Minimax Excess Risk

Cited by 5 publications (7 citation statements)
References 22 publications

“…This result has since been extended in various forms, mostly concentrating on providing information-theoretic bounds for the generalization capabilities of learning algorithms, instead of looking at the excess risk; see, e.g., Raginsky et al [5], Lugosi and Neu [6], Jose and Simeone [7], and the references therein, just to mention a few of these works. The most relevant recent work relating to our bounds in Section 3 seems to be Xu and Raginsky [4], where, among other things, information-theoretic bounds were developed on the excess risk in a Bayesian learning framework; see also Hafez-Kolahi et al [8]. The bounds in [4] are not on the excess risk L*(Y|T(X)) − L*(Y|X); they involve training data, but their forms are similar to ours.…”
Section: Relationship With Prior Work
confidence: 87%
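
For orientation, the quantity named in this statement can be read as a Bayes-risk gap: L*(Y|X) is the best achievable expected loss when predicting Y from X, and T(X) is a (possibly lossy) representation of X. The following is only a minimal sketch of that reading; the loss ℓ and predictor ψ are generic placeholders, not symbols from the cited papers.

\[
L^*(Y \mid X) = \inf_{\psi} \, \mathbb{E}\big[\ell\big(Y, \psi(X)\big)\big],
\qquad
L^*\big(Y \mid T(X)\big) - L^*\big(Y \mid X\big) \;\ge\; 0,
\]

The gap is nonnegative because any predictor operating on T(X) is also a predictor operating on X, so restricting to T(X) can only shrink the class of available predictors.
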
“…Notice that the bound is expressed as MI terms, each involving U_i and ∆L_{i,k}, both being discrete random variables. This has not arisen in the previous chained weight-based MI bounds, which either contain the continuous random variable S (Asadi et al, 2018; Zhou et al, 2022b; Clerico et al, 2022) or are conditioned on the continuous random variable Z (Hafez-Kolahi et al, 2020). Additionally, by the master definition of MI (Cover & Thomas, 2006, Eq.…”
Section: By the Independence of U_i and Z_i, I(∆L…
confidence: 97%
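
Since both U_i and ∆L_{i,k} in the statement above are discrete, their mutual information can be evaluated directly from the joint probability mass function. The sketch below is only an illustration of that point, not code from the cited paper; the names (u, dL, dL_k, the quantization grid) are hypothetical.

```python
import numpy as np
from collections import Counter

def plug_in_mutual_information(u, v):
    """Plug-in (empirical) estimate of I(U; V) in nats for two
    equal-length samples of discrete random variables."""
    n = len(u)
    counts_u = Counter(u)
    counts_v = Counter(v)
    counts_uv = Counter(zip(u, v))
    mi = 0.0
    for (a, b), c_ab in counts_uv.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), written with raw counts
        mi += (c_ab / n) * np.log(c_ab * n / (counts_u[a] * counts_v[b]))
    return mi

# Toy data: u is a fair sign variable, dL is a loss difference whose sign
# correlates with u, and dL_k is its quantization to a coarse grid,
# so both inputs to the estimator are discrete.
rng = np.random.default_rng(0)
u = rng.integers(0, 2, size=10_000)
dL = np.where(u == 1, 1.0, -1.0) + rng.normal(scale=0.5, size=10_000)
dL_k = np.round(dL, 1)  # quantized version, hence discrete
print(plug_in_mutual_information(u.tolist(), dL_k.tolist()))
```
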
“…Furthermore, it is possible to establish further tightened loss-difference MI bounds for more general loss functions than those required in Theorem 3.2. Specifically, the loss function can be unbounded and continuous, as presented in the next theorem, where we apply the chaining technique (Asadi et al, 2018; Hafez-Kolahi et al, 2020; Zhou et al, 2022b; Clerico et al, 2022) and the obtained bound consists of MI terms between U_i and the successively quantized versions of ∆L_i. To that end, let Err_i(∆_i) ≜ (−1)^{U_i} ∆_i and let Γ ⊆ R be the range of ∆_i.…”
Section: By the Independence of U_i and Z_i, I(∆L…
confidence: 99%
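
The chaining technique mentioned here replaces a single mutual-information term with a weighted sum over successively finer quantizations of the continuous quantity. Constants and regularity conditions differ across the works cited in the statement, so the display below is only a schematic template of that structure, not the theorem of the quoted paper:

\[
\big|\mathbb{E}[\mathrm{gen}]\big| \;\lesssim\; \sum_{k \ge 1} 2^{-k} \sqrt{\, I\big([\Delta L]_k ;\, U \big)\,},
\]

where [∆L]_k denotes a quantization of ∆L at scale 2^{-k}; the point made in the quoted statement is that each summand then involves only discrete random variables.
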
“…The line of work exploiting information measures to bound the expected generalization started in (Russo and Zou, 2016; Xu and Raginsky, 2017) and was then refined with a variety of approaches considering Conditional Mutual Information (Steinke and Zakynthinou, 2020; Haghifam et al, 2020), the Mutual Information between individual samples and the hypothesis (Bu et al, 2019) or improved versions of the original bounds (Issa et al, 2019; Hafez-Kolahi et al, 2020). Other approaches employed the Kullback-Leibler Divergence with a PAC-Bayesian approach (McAllester, 2013; Zhou et al, 2018).…”
Section: Related Work
confidence: 99%
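
For reference, the starting point of this line of work named in the statement, the bound of Xu and Raginsky (2017), is usually stated as follows for a loss that is σ-sub-Gaussian under the data distribution (a restatement of the well-known result, not a quotation of the citing paper):

\[
\Big|\mathbb{E}\big[L_\mu(W) - L_S(W)\big]\Big| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(S; W)},
\]

where S is the training sample of size n, W is the hypothesis returned by the (possibly randomized) learning algorithm, L_S is the empirical risk, and L_μ is the population risk. The refinements listed in the statement (conditional MI, per-sample MI, chained and PAC-Bayesian variants) tighten or generalize this basic form.
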