1999
DOI: 10.1007/3-540-46769-6_5

Generalization Error of Linear Neural Networks in Unidentifiable Cases

Abstract: The statistical asymptotic theory is often used in theoretical results in computational and statistical learning theory. It describes the limiting distribution of the maximum likelihood estimator (MLE) as a normal distribution. However, in layered models such as neural networks, the regularity condition of the asymptotic theory is not necessarily satisfied. The true parameter is not identifiable if the target function can be realized by a network of smaller size than the model. There ha…

Cited by 15 publications (10 citation statements). References 5 publications.
“…However, the asymptotic form of the generalization error has not been clarified because layered neural networks are non-identifiable statistical models [11] [22] [9] [7] [26]. In fact, if a neural network is larger than necessary to attain the true distribution, then the set of true parameters is not one point but an analytic set with singularities, hence neither the distribution of the maximum likelihood estimator nor the Bayesian a posteriori probability density function converges to the normal distribution, even if the number of training samples tends to infinity [8] [24]. Therefore, we can not apply learning theory of regular statistical models to analysis and design of neural networks.…”
Section: Introduction (mentioning; confidence: 99%)
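The non-identifiability described in this excerpt is easy to see for a two-layer linear network f(x) = B A x: when the hidden width exceeds the rank of the true map, infinitely many parameter pairs (A, B) realize the same function. The following is a minimal sketch, not taken from the paper; the dimensions and the rank-1 target are illustrative choices.

# Minimal sketch (illustrative dimensions, rank-1 target): a two-layer linear network
# f(x) = B @ A @ x with more hidden units than the true map needs admits a continuum
# of parameters realizing the same function.
import numpy as np

rng = np.random.default_rng(0)
N, H, M = 3, 2, 3                      # output dim, hidden width, input dim (H larger than needed)

# True map of rank 1, i.e. realizable by a network with a single hidden unit.
u = rng.standard_normal((N, 1))
v = rng.standard_normal((1, M))
T_true = u @ v

# One parameter pair realizing T_true; the unused hidden unit's incoming weights are zero,
# so its outgoing weights can be anything.
A1 = np.vstack([v, np.zeros((H - 1, M))])
B1 = np.hstack([u, rng.standard_normal((N, H - 1))])

# Another pair, obtained by an arbitrary invertible change of basis in the hidden layer.
Q = rng.standard_normal((H, H))
A2 = np.linalg.inv(Q) @ A1
B2 = B1 @ Q

print(np.allclose(B1 @ A1, T_true), np.allclose(B2 @ A2, T_true))   # True True

Because the exact realizations of the true map form a whole set (with singularities where hidden weights vanish) rather than a single point, the usual asymptotic normality of the MLE referred to in the excerpt does not apply.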
“…Compared to [18], our analysis is considerably simpler and, most importantly, covers greedy learners. An actual case analysis for Naive Bayesian classifiers that is guided by a similar idea has been presented by Langley and Sage [10], an actual case analysis for linear neural networks is given in [6].…”
Section: Discussion and Related Work (mentioning; confidence: 99%)
“…Figures 10 to 13 show the generalization and the training coefficients on the identical conditions to Figures 3 to 6, respectively. The lines in the positive region correspond to the generalization coefficient of the VB approach, clarified in this letter, to that of the ML estimation, clarified in (Fukumizu, 1999), to that of the Bayes estimation, clarified in Aoyagi and Watanabe (2004), and to that of the regular models, respectively; the lines in the negative region correspond to the training coefficient of the VB approach, to that of the ML estimation, and to that of the regular models, respectively. Unfortunately the Bayes training error has not been clarified yet.…”
Section: Theorem 6: The Training Error of an LNN in the VB Approach (mentioning; confidence: 98%)
“…We say that U is the general diagonalized matrix of an N × M matrix T if T has the following singular value decomposition: Although the second term of equation 7.7 is not a simple function, it can relatively easily be numerically calculated by creating samples subject to the Wishart distribution. Furthermore, the simpler function approximating the term can be derived in the large-scale limit when M, N, H, and H * go to infinity in the same order, in a similar fashion to the analysis of the ML estimation (Fukumizu, 1999). We define the following scalars: …”
Section: Generalization Error (mentioning; confidence: 99%)
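The numerical recipe mentioned in the last excerpt, evaluating a term by sampling matrices from the Wishart distribution, follows the usual Monte Carlo pattern. Below is a hedged sketch of that pattern only; the dimensions, the identity scale matrix, and the eigenvalue functional are placeholders and do not reproduce equation 7.7 of the cited work.

# Hedged sketch of the Monte Carlo pattern only: dimensions, the identity scale matrix,
# and the eigenvalue functional are placeholders, not the actual term from the cited work.
import numpy as np
from scipy.stats import wishart

N, M, H = 5, 8, 3                      # assumed output/input/hidden dimensions
n_samples = 10_000

# Draw samples S ~ Wishart(df=M, scale=I_N); df and scale are illustrative choices.
samples = wishart(df=M, scale=np.eye(N)).rvs(size=n_samples, random_state=0)

def placeholder_term(S):
    """Sum of the H largest eigenvalues of one sample (a stand-in for the real functional)."""
    eigvals = np.linalg.eigvalsh(S)    # ascending order
    return eigvals[-H:].sum()

# Monte Carlo estimate of the expected value of the placeholder functional.
estimate = np.mean([placeholder_term(S) for S in samples])
print(f"estimate over {n_samples} samples: {estimate:.3f}")

In the large-scale limit mentioned in the excerpt (M, N, H, and H* growing at the same rate), such sampling can be replaced by a deterministic approximation, which is what yields the simpler closed-form expression.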