1997
DOI: 10.1007/bf00114010
Rigorous learning curve bounds from statistical mechanics

Abstract: In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior of learning curves. This behavior can often exhibit dramatic properties such as phase transitions, as well as power law asymptotics not explained by the VC theory. The…

Cited by 38 publications (72 citation statements).
References 30 publications (35 reference statements).
“…Bias does not vary with training data size n, but the error due to variance should decrease as O(1/√n) if the training observations are independent (Domingos, 2000a,b). The power-law models used in this paper have been investigated many times in prior literature (Haussler et al., 1996; Mukherjee et al., 2003; Figueroa et al., 2012; Beleites et al., 2013; Hajian-Tilaki, 2014; Cho et al., 2015). Sun et al. (2017), Barone et al. (2017) and the concurrent unpublished work by Hestness et al. (2017) point out that these power-law models describe modern ML and NLP systems quite well, including complex deep-learning systems, so we expect our results to generalise to these systems.…”
Section: Related Work
confidence: 99%
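The power-law model referenced in the excerpt above, err(n) ≈ a·n^(−b) + c, can be fit directly to measured learning-curve points. The sketch below is illustrative only: the training sizes and error values are hypothetical, and the fitting routine (scipy's curve_fit) is an assumption of convenience, not code from any of the cited papers.

import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # err(n) = a * n^(-b) + c: c is the irreducible (bias) floor,
    # b is the decay exponent; b = 0.5 matches the O(1/sqrt(n)) variance term.
    return a * np.power(n, -b) + c

# Hypothetical learning-curve measurements: (training set size, test error).
train_sizes = np.array([100, 200, 400, 800, 1600, 3200], dtype=float)
test_errors = np.array([0.31, 0.24, 0.19, 0.155, 0.13, 0.115])

params, _ = curve_fit(power_law, train_sizes, test_errors, p0=[1.0, 0.5, 0.1])
a, b, c = params
print(f"fitted exponent b = {b:.2f}, asymptotic error c = {c:.3f}")

The fitted exponent b and floor c are what the cited empirical studies report when characterizing how quickly a learner improves with more data.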
“…On the other hand, knowing that we would like a particular quality guarantee, we can ask how large a sample we need to draw to ensure that guarantee. The former question has been addressed for predictive learning in work on self-bounding learning algorithms [4] and shell decomposition bounds [7,11].…”
Section: Prior Work
confidence: 99%
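For the sample-size question posed in the excerpt above, the simplest illustration is a textbook Hoeffding inversion for a single fixed hypothesis: n ≥ ln(2/δ)/(2ε²) examples suffice to estimate an error rate within ±ε with probability at least 1 − δ. This is a minimal sketch of that standard calculation, not the self-bounding [4] or shell decomposition [7,11] bounds discussed there.

import math

def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    # n >= ln(2/delta) / (2 * epsilon^2) guarantees |empirical error - true error| <= epsilon
    # with probability at least 1 - delta, for one fixed hypothesis.
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

print(hoeffding_sample_size(epsilon=0.05, delta=0.01))  # 1060 examples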
“…This has been highlighted perhaps most prominently in recent work on neural network models, in which the model complexity and data size increase together. For this reason, the double asymptotic regime where n, N → ∞ with N/n → c, a constant, is a particularly interesting (and likely more realistic) limit, despite being technically more challenging [48,51,21,15,37,32,5]. In particular, working in this regime allows for a finer quantitative assessment of machine learning systems, as a function of their relative complexity N/n, as well as for a precise description of the under- to over-parameterized "phase transition" (which does not appear when only N → ∞).…”
Section: Introduction
confidence: 99%
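The under- to over-parameterized phase transition near N/n = 1 mentioned above can be reproduced numerically with minimum-norm least squares on isotropic Gaussian data. The sketch below is an illustrative assumption (a planted linear target and a fixed noise level), not an implementation from the cited works; its test error peaks near the interpolation threshold N = n and falls again on either side.

import numpy as np

rng = np.random.default_rng(0)
n = 200  # fixed training sample size; N (number of parameters) is varied below

def test_mse(N, trials=20, noise=0.1):
    errs = []
    for _ in range(trials):
        w_star = rng.normal(size=N) / np.sqrt(N)   # planted linear target, ||w*|| ~ 1
        X = rng.normal(size=(n, N))
        y = X @ w_star + noise * rng.normal(size=n)
        w_hat = np.linalg.pinv(X) @ y              # minimum-norm least-squares fit
        X_test = rng.normal(size=(1000, N))
        errs.append(np.mean((X_test @ (w_hat - w_star)) ** 2))
    return np.mean(errs)

for N in [50, 100, 180, 200, 220, 400, 800]:
    print(f"N/n = {N / n:4.2f}   test MSE = {test_mse(N):.3f}")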