Proceedings of 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan)
DOI: 10.1109/ijcnn.1993.714176
On the problem of applying AIC to determine the structure of a layered feedforward neural network

Cited by 25 publications (14 citation statements)
References 3 publications
“…Then, the matrices F and K are of the order O(1) as N goes to infinity. We write ε = 1/√N hereafter for notational simplicity, and obtain the expansion of…”
Section: A Proof Of Theorem
Mentioning confidence: 99%
“…It has been clarified recently that the usual statistical asymptotic theory of the MLE does not necessarily hold in neural networks ([1], [2]). This is always the case when we consider the model selection problem in neural networks.…”
Section: Introduction
Mentioning confidence: 99%
“…According to this property, called asymptotic normality, the well-known criterion AIC (Akaike information criterion; Akaike, 1974) is derived. In the case of MLP, however, we cannot show the asymptotic normality of estimators because of the unidentifiability of optimal parameters, so that the effectiveness of AIC is not ensured (Hagiwara, Toda, & Usui, 1993; Anders & Korn, 1999). Although some other criteria, such as NIC (network information criterion; Murata, Yoshizawa, & Amari, 1994) and GPE (generalized prediction error; Moody, 1992), have been proposed for MLP, they are effective only when the asymptotic normality holds.…”
Section: Introduction
Mentioning confidence: 96%
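
The criterion discussed above is Akaike's AIC = -2 ln L̂ + 2k, where L̂ is the maximized likelihood and k the number of free parameters. As a minimal sketch of how such a selection step looks in a regular model family, where the 2k penalty is actually justified, the following Python fragment compares AIC across candidate models (the data and the polynomial model family are illustrative assumptions, not taken from the cited works):

import numpy as np

# Hypothetical illustration of AIC-based selection in a regular model family.
# AIC = -2 ln(L_hat) + 2 k; for Gaussian regression noise, -2 ln(L_hat) equals
# n * ln(RSS / n) up to an additive constant, so n * ln(RSS / n) + 2 k is compared.

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(n)

def aic_of_degree(d):
    """Fit a degree-d polynomial by least squares and return its AIC."""
    X = np.vander(x, d + 1)                       # design matrix with d + 1 columns
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # maximum-likelihood fit under Gaussian noise
    rss = float(np.sum((y - X @ coef) ** 2))
    k = d + 1                                     # number of free parameters
    return n * np.log(rss / n) + 2 * k

for d in range(1, 9):
    print(f"degree {d}: AIC = {aic_of_degree(d):.1f}")

The point of the cited passage is that this 2k penalty rests on the asymptotic normality of the estimator, which fails for multilayer perceptrons because the optimal parameters are unidentifiable; the same selection pattern applied to MLP structure is therefore not guaranteed to work.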
“…Hierarchical learning models, such as the layered neural network, the Boltzmann machine, the reduced rank regression and the normal mixture model, are known to be effective learning models for analyzing such data. These are, however, singular learning models, which cannot be analyzed using the classic theory of regular statistical models, because singular learning models have a singular Fisher metric that is not always approximated by any quadratic form [1-4]. Therefore, it is difficult to analyze their generalization errors, which indicate how precisely the predictive function approximates the true density function.…”
Section: Introduction
Mentioning confidence: 99%
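
To make the phrase "singular Fisher metric" concrete, a minimal numerical sketch (the toy model is an assumption chosen for illustration, not drawn from the cited papers) uses a one-hidden-unit regression model f(x; a, b) = a·tanh(bx) with unit-variance Gaussian noise. When the true output weight is a = 0, the parameter b is unidentifiable, the Fisher information matrix loses rank, and the log-likelihood has no valid quadratic approximation around the true parameter:

import numpy as np

# Hypothetical toy model illustrating a singular Fisher metric: f(x; a, b) = a * tanh(b * x)
# with unit-variance Gaussian regression noise. The empirical Fisher information is the
# average of grad_f(x) grad_f(x)^T over inputs x, where
#   df/da = tanh(b * x),   df/db = a * x / cosh(b * x)**2.

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=1000)

def empirical_fisher(a, b):
    grads = np.stack([np.tanh(b * x), a * x / np.cosh(b * x) ** 2], axis=1)
    return grads.T @ grads / len(x)               # 2 x 2 information matrix

print(np.linalg.eigvalsh(empirical_fisher(a=0.5, b=1.0)))  # both eigenvalues positive: regular point
print(np.linalg.eigvalsh(empirical_fisher(a=0.0, b=1.0)))  # one eigenvalue is 0: singular point

At such singular points the quadratic expansion of the log-likelihood underlying AIC-style penalties breaks down, which is why the generalization error of these models requires a separate analysis.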