“…[28] provides tighter bounds by considering the individual sample mutual information, [25,29] propose using chaining mutual information, and [30,31,32] advocate conditioning and processing techniques. Information-theoretic generalization error bounds based on other information quantities have also been studied, such as f-divergence [33], α-Rényi divergence and maximal leakage [34,35], and Jensen-Shannon divergence [36,37]. Using rate-distortion theory, [38,39,40] provide information-theoretic upper bounds on the generalization error under model misspecification and model compression.…”