Evolving gradient-learning artificial neural networks (ANNs) with an evolutionary algorithm (EA) is a popular approach to addressing the local-optima and design problems of ANNs. The typical approach combines the strength of backpropagation (BP) in weight learning with the EA's ability to search the architecture space. However, BP's gradient-descent procedure is computationally intensive, which restricts the EA's search coverage by compelling it to use a small population size. To address this problem, we use a mutation-based genetic neural network (MGNN) that replaces BP with the local-adaptation mutation strategy of evolutionary programming (EP) to effect weight learning. MGNN's mutation enables the network to dynamically evolve its structure and adapt its weights at the same time. Moreover, MGNN's EP-based encoding scheme allows a flexible, less restricted formulation of the fitness function and makes fitness computation fast and efficient. This makes larger population sizes feasible and gives MGNN relatively wide search coverage of the architecture space. MGNN implements a stopping criterion in which overfitness occurrences are monitored through sliding windows to avoid premature learning and overlearning. Statistical analysis of its performance on some well-known classification problems demonstrates good generalization capability. It also reveals that locally adapting or scheduling the strategy parameters embedded in each individual network may provide a proper balance between the local and global search capabilities of MGNN.
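The EP-style self-adaptive mutation described above can be sketched as follows. The single-hidden-layer network, the (μ+μ) elitist selection, and all names here are illustrative assumptions for a minimal demonstration, not the MGNN implementation itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(params, x):
    """Tiny single-hidden-layer network; params = (W1, b1, W2, b2)."""
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

def fitness(params, X, y):
    """Negative mean squared error: higher is fitter."""
    return -np.mean((forward(params, X) - y) ** 2)

def mutate(params, sigmas, tau=0.05):
    """EP-style self-adaptive mutation: each strategy parameter (sigma)
    is perturbed log-normally, then used to mutate its own weight."""
    new_p, new_s = [], []
    for w, s in zip(params, sigmas):
        s2 = s * np.exp(tau * rng.standard_normal(s.shape))
        new_s.append(s2)
        new_p.append(w + s2 * rng.standard_normal(w.shape))
    return new_p, new_s

def init_individual(n_in, n_hidden, n_out):
    shapes = [(n_in, n_hidden), (n_hidden,), (n_hidden, n_out), (n_out,)]
    params = [0.5 * rng.standard_normal(s) for s in shapes]
    sigmas = [0.1 * np.ones(s) for s in shapes]  # per-weight strategy parameters
    return params, sigmas

def evolve(X, y, mu=10, generations=50, n_hidden=5):
    """(mu + mu) evolutionary programming loop; no gradients anywhere."""
    pop = [init_individual(X.shape[1], n_hidden, y.shape[1]) for _ in range(mu)]
    history = []
    for _ in range(generations):
        offspring = [mutate(p, s) for p, s in pop]
        pop = sorted(pop + offspring,
                     key=lambda ind: fitness(ind[0], X, y),
                     reverse=True)[:mu]
        history.append(fitness(pop[0][0], X, y))
    return pop[0], history
```

Because selection here is elitist, the best fitness in `history` is non-decreasing; MGNN's sliding-window stopping criterion would instead monitor overfitness occurrences rather than run for a fixed number of generations.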
One of the most important features of three-layered neural networks is the adaptability of their basis functions. In this paper, to focus on this adaptability in the context of regression or curve fitting, we restricted our attention to function representations in which the basis functions are modified according to the associated discrete parameters. For such representations, we derived the expectations of the least-squares error and the prediction squared error with respect to the distribution of the sample set using extreme value theory, provided that the given set of samples is an independent Gaussian noise sequence and the basis functions satisfy an appropriate orthonormality condition.
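The extreme-value effect behind such expectations can be illustrated numerically: under the orthonormality and Gaussian-noise assumptions, an adaptive basis keeps the functions with the largest squared noise coefficients, so the expected training-error reduction exceeds that of a fixed basis. The Monte Carlo sketch below is our own illustration under those assumptions, not the paper's derivation:

```python
import numpy as np

rng = np.random.default_rng(2)
n, M, k, trials = 256, 64, 4, 500

# Orthonormal design: the columns of Q play the role of M orthonormal basis
# functions sampled at n points (reduced QR of a random matrix).
Q, _ = np.linalg.qr(rng.standard_normal((n, M)))

gains = []
for _ in range(trials):
    eps = rng.standard_normal(n)   # target = pure Gaussian noise
    coef = Q.T @ eps               # orthonormality => i.i.d. N(0, 1) coefficients
    gains.append(np.sort(coef ** 2)[-k:].sum())  # adaptive choice: k largest

mean_gain = np.mean(gains)
# A fixed k-function basis removes k from the squared error in expectation;
# adaptive selection removes the sum of the k largest of M chi-square(1)
# variables, which extreme value theory shows is markedly larger.
```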
In order to analyze the stochastic properties of multilayered perceptrons and other learning machines, we deal with simpler models and derive the asymptotic distribution of the least-squares estimators of their parameters. In the case where a model is unidentified, we show results that differ from those for traditional linear models: the well-known property of asymptotic normality does not hold for the estimates of the redundant parameters.
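A minimal example of an unidentified model (our own choice for illustration, not necessarily the paper's) is the product parametrization y = a·b·x + ε: the product a·b is identified and its least-squares estimate behaves classically, but the individual factors are redundant, since every factorization of the estimated product fits equally well:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.standard_normal(n)
y = rng.standard_normal(n)  # true model: y = a*b*x + eps with a*b = 0

# The product c = a*b is identified; its least-squares estimate is the OLS slope.
c_hat = (x @ y) / (x @ x)

def sse(a, b):
    """Residual sum of squares for the factorized model y = a*b*x."""
    r = y - a * b * x
    return r @ r

# Unidentifiability: every factorization of c_hat attains the same minimum
# of sse, so the separate estimates of a and b are redundant parameters
# with no well-defined limiting normal distribution.
```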