1992
DOI: 10.1162/neco.1992.4.1.1

Neural Networks and the Bias/Variance Dilemma

Abstract: Feedforward neural networks trained by error backpropagation are examples of nonparametric regression estimators. We present a tutorial on nonparametric inference and its relation to neural networks, and we use the statistical viewpoint to highlight strengths and weaknesses of neural models. We illustrate the main points with some recognition experiments involving artificial data as well as handwritten numerals. In way of conclusion, we suggest that current-generation feedforward neural networks are largely in…
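For context on this record: the "dilemma" named in the title refers to the standard decomposition of an estimator's expected squared error. A conventional statement for a regression estimate f(x; D) trained on a random sample D, evaluated at a fixed input x, is the following (our notation, not quoted from the paper):

    \mathbb{E}_{D}\!\left[ \left( f(x;D) - \mathbb{E}[y \mid x] \right)^{2} \right]
      = \underbrace{\left( \mathbb{E}_{D}[f(x;D)] - \mathbb{E}[y \mid x] \right)^{2}}_{\text{squared bias}}
      + \underbrace{\mathbb{E}_{D}\!\left[ \left( f(x;D) - \mathbb{E}_{D}[f(x;D)] \right)^{2} \right]}_{\text{variance}}

Increasing a model's flexibility typically lowers the bias term while raising the variance term; this tradeoff is what the paper examines for feedforward neural networks.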

Cited by 2,910 publications (1,602 citation statements)
References 49 publications

“…Our arguments in favor of structured models build on claims made by previous researchers (Geman et al., 1992; Chomsky, 1986), and psychologists and AI researchers have been developing structured models for many years (Collins & Quillian, 1969; Davis, 1990; Lenat, 1995).…”
Section: Modeling Framework
Citation type: mentioning
confidence: 83%

“…4.4) with high generalization capability. Most approaches address the bias/variance dilemma (Geman et al., 1992) through strong prior assumptions. For example, weight decay (Hanson and Pratt, 1989; Weigend et al., 1991; Krogh and Hertz, 1992) encourages near-zero weights, by penalizing large weights.…”
Section: Better BP Through Advanced Gradient Descent (Compare Sec. 5.24)
Citation type: mentioning
confidence: 99%

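To make the weight-decay mechanism in the quote above concrete, here is a minimal NumPy sketch (the linear model, synthetic data, and constants are our own illustrative assumptions, not taken from Hanson and Pratt, Weigend et al., or Krogh and Hertz):

    import numpy as np

    # Toy setup: noisy linear regression data (our own construction).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))                # 50 samples, 10 features
    y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=50)

    lam, lr = 1e-2, 1e-2                         # decay strength, learning rate
    w = np.zeros(10)
    for _ in range(1000):
        grad = X.T @ (X @ w - y) / len(y)        # gradient of mean squared error
        w -= lr * (grad + lam * w)               # "+ lam * w" is the weight decay

The extra lam * w term shrinks the weights toward zero at every step; this is the kind of "strong prior assumption" the quote describes, trading a little bias for reduced variance.
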
“…Because in real-life situations training data are always at a premium (Edelman, 2002), and because high-VCdim classifiers are too flexible and are therefore prone to overfitting (Baum and Haussler, 1989; Geman et al., 1992), it is necessary to break down the classification task into elements by relegating them to purposive visual subsystems. Being dedicated to a particular task (such as face recognition), such systems can afford to employ the simplest possible classifier that is up to the job.…”
Section: Representational Capacity
Citation type: mentioning
confidence: 99%
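The flexibility/overfitting point in this last quote can be reproduced in a toy experiment (entirely our own construction; the sine target and polynomial fits merely stand in for high- versus low-capacity classifiers):

    import numpy as np

    # Toy setup: few noisy samples of a smooth target (our own construction).
    rng = np.random.default_rng(1)
    x_tr = rng.uniform(-1, 1, 12)
    y_tr = np.sin(3 * x_tr) + 0.2 * rng.normal(size=12)
    x_te = rng.uniform(-1, 1, 200)
    y_te = np.sin(3 * x_te) + 0.2 * rng.normal(size=200)

    for degree in (3, 11):                       # low- vs. high-capacity fit
        coef = np.polyfit(x_tr, y_tr, degree)    # least-squares polynomial fit
        test_mse = np.mean((np.polyval(coef, x_te) - y_te) ** 2)
        print(f"degree {degree}: test MSE {test_mse:.3f}")

The degree-11 polynomial passes almost exactly through the 12 noisy training points, yet its test error is typically far larger than the degree-3 fit's, illustrating why too-flexible classifiers need more data than is usually available.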