2022
DOI: 10.48550/arxiv.2202.11461
Preprint

Exponential Tail Local Rademacher Complexity Risk Bounds Without the Bernstein Condition

Abstract: The local Rademacher complexity framework is one of the most successful general-purpose toolboxes for establishing sharp excess risk bounds for statistical estimators based on the framework of empirical risk minimization. Applying this toolbox typically requires using the Bernstein condition, which often restricts applicability to convex and proper settings. Recent years have witnessed several examples of problems where optimal statistical performance is only achievable via non-convex and improper estimators o…
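
The abstract builds on local Rademacher complexities. As a minimal sketch, assuming a finite class represented only by its values on the sample and containing the zero function (as an excess-loss class does), the empirical Rademacher complexity can be computed exactly for small n by enumerating all 2^n sign vectors. Localization then amounts to restricting the supremum to functions whose empirical second moment is at most r. The function names and the random class are hypothetical; this illustrates the general notion of localization, not the construction used in the paper.

```python
import itertools
import random


def empirical_rademacher(vectors):
    """Exact E_eps sup_f (1/n) sum_i eps_i f(x_i) for a finite class.

    Each entry of `vectors` is (f(x_1), ..., f(x_n)) for one f in the class.
    All 2**n sign vectors are enumerated, so keep n small.
    """
    n = len(vectors[0])
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        total += max(sum(s * v for s, v in zip(signs, vec)) for vec in vectors) / n
    return total / 2 ** n


def local_rademacher(vectors, r):
    """Complexity of the sub-class {f : (1/n) sum_i f(x_i)^2 <= r}."""
    local = [vec for vec in vectors if sum(v * v for v in vec) / len(vec) <= r]
    return empirical_rademacher(local)


rng = random.Random(0)
n = 8
# Hypothetical class: the zero function plus 30 random functions with values in [-1, 1].
vectors = [[0.0] * n] + [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(30)]

global_value = empirical_rademacher(vectors)
local_values = {r: local_rademacher(vectors, r) for r in (0.0, 0.1, 0.3, 1.0)}
for r, value in local_values.items():
    print(f"r = {r}: local {value:.4f} vs global {global_value:.4f}")
```

Shrinking r can only remove functions from the supremum, so the localized value never exceeds the global one. Exact enumeration is used instead of Monte Carlo sampling so that this comparison holds without sampling error.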

Cited by 4 publications (4 citation statements)
References 50 publications (60 reference statements)
“…The following lemma highlights the main property of the Q-aggregation estimator that distinguishes its analysis from the exponential weights. This lemma bears similarity with the analysis using offset Rademacher complexities [45,75,67,38], where the curvature of the loss function allows [one] to obtain a certain "complexity regularizing" term. However, the term obtained in the lemma stated below is different from the one present in the offset Rademacher complexity analysis, which would correspond to a term proportional to…”
Section: A1 Basic Identities and Inequalities (mentioning)
confidence: 71%
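
The offset Rademacher complexity referred to in this statement can be written, in the form introduced by Liang, Rakhlin, and Sridharan (2015), as E_ε sup_f (1/n) Σ_i [ε_i f(x_i) − c f(x_i)²] for a constant c > 0. The sketch below assumes a finite class given by its values on the sample and including the zero function; the helper name and the random class are hypothetical. It shows how the negative quadratic term, which stands in for the curvature of the loss, shrinks the complexity as c grows, with c = 0 recovering the ordinary empirical Rademacher complexity.

```python
import itertools
import random


def offset_rademacher(vectors, c):
    """Exact E_eps sup_f (1/n) sum_i [eps_i f(x_i) - c f(x_i)^2] for a finite class.

    Each entry of `vectors` is (f(x_1), ..., f(x_n)) for one f in the class;
    all 2**n sign vectors are enumerated, so keep n small.
    """
    n = len(vectors[0])
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        total += max(
            sum(s * v - c * (v * v) for s, v in zip(signs, vec)) for vec in vectors
        ) / n
    return total / 2 ** n


rng = random.Random(1)
n = 8
# Hypothetical class: the zero function plus 30 random functions with values in [-1, 1].
vectors = [[0.0] * n] + [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(30)]

# c = 0 gives the ordinary (non-offset) empirical Rademacher complexity.
values = {c: offset_rademacher(vectors, c) for c in (0.0, 0.5, 1.0, 4.0)}
for c, value in values.items():
    print(f"c = {c}: {value:.4f}")
```

The decrease in c holds sign vector by sign vector, since subtracting a larger nonnegative term can only lower each candidate value. This is the sense in which curvature produces a "complexity regularizing" term, although the quoted passage notes that the term in its own lemma has a different form.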
“…Our proof is making most of the standard arguments on the Laplace transform of shifted or offset empirical processes. A similar approach was exploited by many authors Wegkamp (2003); Lecué and Rigollet (2014); Liang, Rakhlin, and Sridharan (2015); Zhivotovskiy and Hanneke (2018); Kanade, Rebeschini, and Vaškevičius (2022) in the statistical setup, though their analysis is specific to strongly convex losses or binary losses under additional probabilistic assumptions. In (Vijaykumar, 2021), the author generalized the approach of Liang et al (2015) to study the properties of ERM under exp-concave losses, though their analysis only focuses on getting the O(1/n) rate of convergence and fails to capture the local structure of the reference set.…”
Section: Proof of Theorem 34 (mentioning)
confidence: 99%
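
The Laplace-transform argument on offset processes mentioned in this statement can be illustrated in its simplest, finite-class form. The derivation below is a standard computation under assumptions of our own choosing (a finite set V ⊂ ℝ^n of value vectors, independent Rademacher signs ε_i, and a constant c > 0); it is not the proof of Theorem 34.

```latex
% Offset process indexed by a finite set V \subset \mathbb{R}^n, with c > 0:
Z_v = \sum_{i=1}^{n} \bigl( \varepsilon_i v_i - c\, v_i^2 \bigr), \qquad v \in V.

% Laplace transform, using independence of the signs and \cosh x \le e^{x^2/2}:
\mathbb{E}\, e^{\lambda Z_v}
  = \prod_{i=1}^{n} \cosh(\lambda v_i)\, e^{-\lambda c v_i^2}
  \le \prod_{i=1}^{n} \exp\!\Bigl( \tfrac{\lambda^2}{2} v_i^2 - \lambda c\, v_i^2 \Bigr).

% Choosing \lambda = 2c makes every exponent vanish, so \mathbb{E}\, e^{2c Z_v} \le 1.
% Jensen's inequality and a union bound over V then give
\mathbb{E} \max_{v \in V} Z_v
  \le \frac{1}{2c} \log \mathbb{E}\, e^{2c \max_{v} Z_v}
  \le \frac{1}{2c} \log \sum_{v \in V} \mathbb{E}\, e^{2c Z_v}
  \le \frac{\log |V|}{2c}.
```

Dividing by n, the normalized offset complexity of a finite class is at most log|V|/(2cn), an O(1/n) rate. By comparison, Massart's finite-class lemma for the plain Rademacher average gives max_v ‖v‖_2 √(2 log|V|)/n, which is of order √(log|V|/n) when the coordinates are bounded by 1.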
“…Negative terms typically appear when proving fast rates in statistical learning with squared loss. In particular, the empirical star algorithm of Audibert (2007), as well as other aggregation algorithms (Lecué and Mendelson, 2009; Lecué and Rigollet, 2014; Wintenberger, 2017), exploit the curvature of the loss through the negative term which compensates the variance term (see (Kanade et al, 2022) for a detailed discussion in the context of statistical learning). Similarly, in the context of online learning the negative quadratic term appears in (Rakhlin and Sridharan, 2014), where the so-called sequential offset Rademacher complexity is studied.…”
Section: Related Work (mentioning)
confidence: 99%
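
The empirical star algorithm of Audibert (2007) mentioned in this statement is usually described as a two-step procedure: first compute the empirical risk minimizer ĝ over the class F, then minimize the empirical risk over the star set formed by the segments [ĝ, f], f ∈ F. A minimal sketch for squared loss and a finite class follows, assuming each f is given by its predictions on the sample; the function names and toy data are hypothetical. For squared loss the risk along each segment is a one-dimensional convex quadratic, so its minimizer has a closed form that is clipped to [0, 1].

```python
def empirical_risk(preds, ys):
    """Mean squared error of a prediction vector on the sample."""
    return sum((y - p) ** 2 for p, y in zip(preds, ys)) / len(ys)


def empirical_star(class_preds, ys):
    """Two-step empirical star procedure for squared loss over a finite class.

    class_preds: list of prediction vectors (f(x_1), ..., f(x_n)), one per f in F.
    Returns (predictions, empirical risk) of the selected point.
    """
    # Step 1: empirical risk minimizer over F.
    g = min(class_preds, key=lambda p: empirical_risk(p, ys))
    best, best_risk = g, empirical_risk(g, ys)
    # Step 2: minimize the empirical risk over each segment [g, f].
    for f in class_preds:
        d = [fi - gi for fi, gi in zip(f, g)]
        denom = sum(di * di for di in d)
        if denom == 0.0:
            continue  # f coincides with g on the sample
        # Risk along g + lam * d is a convex quadratic in lam; its unconstrained
        # minimizer is <y - g, d> / ||d||^2, which is clipped to [0, 1].
        lam = sum((y - gi) * di for y, gi, di in zip(ys, g, d)) / denom
        lam = min(1.0, max(0.0, lam))
        cand = [gi + lam * di for gi, di in zip(g, d)]
        risk = empirical_risk(cand, ys)
        if risk < best_risk:
            best, best_risk = cand, risk
    return best, best_risk


# Toy example: the class holds the constant predictors 0 and 1,
# while the labels average 0.3, so the best point lies strictly inside [0, 1].
ys = [0.2, 0.4, 0.3, 0.25, 0.35]
class_preds = [[0.0] * 5, [1.0] * 5]
erm_risk = min(empirical_risk(p, ys) for p in class_preds)
star_preds, star_risk = empirical_star(class_preds, ys)
print(erm_risk, star_risk, star_preds)
```

Because ĝ is kept as the initial candidate and is replaced only by a strictly better point, the returned empirical risk never exceeds that of ERM. In the toy example the star set contains the constant predictor 0.3, which neither element of F can match.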