2022 IEEE International Symposium on Information Theory (ISIT)
DOI: 10.1109/isit50566.2022.9834474

Tighter Expected Generalization Error Bounds via Convexity of Information Measures

Cited by 10 publications (5 citation statements)
References 17 publications

“…Similar bounds on the EGE were obtained in (Bu, Zou, and Veeravalli 2020; Chu and Raginsky 2023; Hafez-Kolahi et al. 2020; Hellström and Durisi 2020) and references therein. Other information measures such as the Wasserstein distance (Aminian et al. 2022; Lopez and Jog 2018; Wang et al. 2019), maximal leakage (Esposito, Gastpar, and Issa 2020; Issa, Esposito, and Gastpar 2019), mutual f-information (Masiha, Gohari, and Yassaee 2023), and the Jensen-Shannon divergence were used for providing upper bounds on the EGE as well. In (Duchi, Glynn, and Namkoong 2021), the notion of closeness of probability measures with respect to a reference measure in terms of statistical distances was used.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
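For orientation, the quantity these excerpts bound can be written explicitly. The sketch below uses the standard notation of this literature rather than anything defined on this page: W is the hypothesis returned by the algorithm, S = (Z_1, ..., Z_n) is the training sample drawn i.i.d. from P_Z, and ℓ is the loss. The second display is the classical mutual-information bound of Xu and Raginsky (2017), stated under the usual subgaussianity assumption.

\[
\overline{\mathrm{gen}}(P_{W|S}) = \mathbb{E}_{P_{W,S}}\!\left[ L_{P_Z}(W) - \hat{L}(W,S) \right],
\quad
L_{P_Z}(w) = \mathbb{E}_{Z \sim P_Z}[\ell(w,Z)],
\quad
\hat{L}(w,S) = \frac{1}{n}\sum_{i=1}^{n} \ell(w,Z_i).
\]

If \(\ell(w,Z)\) is \(\sigma\)-subgaussian under \(P_Z\) for every \(w\), then

\[
\left|\overline{\mathrm{gen}}(P_{W|S})\right| \le \sqrt{\frac{2\sigma^2}{n}\, I(W;S)}.
\]

The bounds surveyed in the excerpt above replace the mutual information I(W;S) with other dependence measures (Wasserstein distance, maximal leakage, mutual f-information, Jensen-Shannon divergence).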
“…The expected generalization error (GE) is a central workhorse for the analysis of generalization capabilities of machine learning algorithms; see for instance (Aminian et al. 2022; Chu and Raginsky 2023; Xu and Raginsky 2017) and (Perlaza et al. 2023). In a nutshell, the GE characterizes the ability of the learning algorithm to correctly find patterns in datasets that are not available during the training stage.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…[27] provides tighter bounds by considering the individual sample mutual information, [28], [29] propose using chaining mutual information, and [30]-[32] advocate the conditioning and processing techniques. Information-theoretic generalization error bounds using other information quantities are also studied, such as f-divergence [33], α-Rényi divergence and maximal leakage [34], [35], Jensen-Shannon divergence [36], [37], and Wasserstein distance [38]-[41]. In [42], upper bounds in terms of mutual information are obtained by employing coupling and chaining techniques in the space of probability measures.…”
Section: F. Other Related Work
Citation type: mentioning (confidence: 99%)
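The individual-sample refinement mentioned at the start of this excerpt matches the bound of Bu, Zou, and Veeravalli (2020), cited in the first excerpt above; the bracketed reference numbers belong to the citing paper and cannot be resolved from this page. In the same notation and under the same σ-subgaussian assumption as in the earlier sketch, it reads

\[
\left|\overline{\mathrm{gen}}(P_{W|S})\right| \le \frac{1}{n}\sum_{i=1}^{n} \sqrt{2\sigma^2\, I(W;Z_i)},
\]

which, for i.i.d. training data, is never larger than the full-sample bound \(\sqrt{2\sigma^2 I(W;S)/n}\), by Jensen's inequality and the chain rule of mutual information.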
“…One of the traditional performance metrics to evaluate the generalization capabilities of the Gibbs algorithm is the generalization error. When the reference measure is a probability measure, a closed-form expression for the generalization error of the Gibbs algorithm is presented in [9], while upper bounds have been derived in [16], [21], [28]-[34], [39]-[52], and references therein. In this work, a new performance metric, coined sensitivity, which quantifies the variations of the expected empirical risk due to deviations from the solution of the ERM-RER problem, is introduced.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
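For orientation, the ERM-RER problem referred to here (empirical risk minimization with relative entropy regularization) is commonly written as below; the notation, with Q the reference measure and λ > 0 the regularization parameter, follows common usage in this line of work and is not taken from this page. Its solution is the Gibbs measure, which is why the excerpt speaks of the generalization error of the Gibbs algorithm.

\[
P^{*}_{W|S} \in \arg\min_{P} \; \mathbb{E}_{W \sim P}\!\left[\hat{L}(W,S)\right] + \lambda\, D(P \,\|\, Q),
\qquad
\frac{\mathrm{d}P^{*}_{W|S}}{\mathrm{d}Q}(w) = \frac{\exp\!\left(-\tfrac{1}{\lambda}\,\hat{L}(w,S)\right)}{\mathbb{E}_{W \sim Q}\!\left[\exp\!\left(-\tfrac{1}{\lambda}\,\hat{L}(W,S)\right)\right]}.
\]

The sensitivity metric described in the excerpt then measures how the expected empirical risk changes when the measure P deviates from this solution; its precise definition is given in the citing paper, not here.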