2020 IEEE Information Theory Workshop (ITW), 2021
DOI: 10.1109/itw46852.2021.9457642
Jensen-Shannon Information Based Characterization of the Generalization Error of Learning Algorithms

Cited by 13 publications (10 citation statements)
References 13 publications
“…Information-theoretic generalization error bounds using other information quantities are also studied, such as α-Rényi divergence and maximal leakage (Esposito et al., 2021), Jensen-Shannon divergence (Aminian et al., 2021b), power divergence (Aminian et al., 2021c), and Wasserstein distance (Lopez and Jog, 2018; Wang et al., 2019). An exact characterization of the generalization error for the Gibbs algorithm is provided in (Aminian et al., 2021a). Using rate-distortion theory, Masiha et al. (2021) and Bu et al. (2020a) provide information-theoretic generalization error upper bounds for model misspecification and model compression.…”
Section: Related Work
confidence: 99%
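The quoted passage points to the Jensen-Shannon divergence bound of the paper tracked by this report (Aminian et al., 2021b). For orientation only, the standard definitions behind that quantity are sketched below; the exact constants and assumptions of the paper's bound are not reproduced here.

```latex
% Jensen-Shannon divergence between two distributions P and Q (standard definition):
\[
D_{\mathrm{JS}}(P \,\|\, Q)
  = \tfrac{1}{2}\, D_{\mathrm{KL}}\!\left(P \,\middle\|\, \tfrac{P+Q}{2}\right)
  + \tfrac{1}{2}\, D_{\mathrm{KL}}\!\left(Q \,\middle\|\, \tfrac{P+Q}{2}\right).
\]
% Applied to the joint law of the hypothesis W and the training set S versus the
% product of its marginals, it gives the Jensen-Shannon information:
\[
I_{\mathrm{JS}}(W;S) = D_{\mathrm{JS}}\!\left(P_{W,S} \,\middle\|\, P_W \otimes P_S\right).
\]
```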
“…We can also apply the average joint distribution approach to the Jensen-Shannon divergence based upper bound in [11].…”
Section: Jensen-Shannon Divergence Based Upper Bound
confidence: 99%
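The "Jensen-Shannon divergence based upper bound in [11]" is driven by the divergence between the joint distribution of hypothesis and data and the product of its marginals. As a purely illustrative numerical sketch (not code from any of the cited works; the function names are assumptions of this illustration), the divergence itself can be computed for discrete distributions as follows, using natural logarithms:

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) for discrete distributions, in nats.

    Terms where p[i] == 0 contribute zero by the usual convention.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence to the equal-weight mixture."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Illustration: the divergence is symmetric and never exceeds log(2) nats.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
print(js_divergence(p, q), js_divergence(q, p), np.log(2))
```

Because the mixture (P+Q)/2 is strictly positive wherever P or Q is, the Jensen-Shannon divergence stays finite even when the two distributions have disjoint support, which is one reason it is attractive as an information measure in generalization bounds.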
“…Bu et al. [8] have derived tighter generalization error bounds based on individual sample mutual information. Generalization error bounds based on other information measures, such as α-Rényi divergence [9], maximal leakage [10], Jensen-Shannon divergence [11], Wasserstein distances [12, 13], and individual sample Wasserstein distance [14], have also been considered. The chaining mutual information technique is proposed in [15] and [16] to further improve the mutual information-based bound.…”
Section: Introduction
confidence: 99%
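For context, the individual-sample mutual information bound attributed to Bu et al. [8] in the quote above takes, under a sub-Gaussian loss assumption, roughly the following form; this is a sketch only, and the precise conditions are stated in [8].

```latex
% Sketch of the individual-sample mutual information bound of [8],
% assuming the loss is sigma-sub-Gaussian:
\[
\left|\overline{\mathrm{gen}}(\mu, P_{W\mid S})\right|
  \le \frac{1}{n} \sum_{i=1}^{n} \sqrt{2\sigma^{2}\, I(W; Z_i)},
\]
% where Z_1, ..., Z_n are the individual training samples and W is the learned hypothesis.
```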
“…Bounds using chaining mutual information have been proposed in [7]. Other authors have also constructed information-theoretic average generalization error bounds using quantities such as α-Rényi divergence, f-divergence, Jensen-Shannon divergences, Wasserstein distances, or maximal leakage (see [8], [9], [10], [11], or [12]).…”
Section: Introduction
confidence: 99%