Artificial neural network models (also known as Parallel Distributed Processing or Connectionist models) have been highly influential in cognitive science since the mid-1980s. The original inspiration for these systems comes from information processing in the brain, which emerges from a large number of (nearly) identical, simple processing units (neurons) that are interconnected into a network. Each unit receives activation from other units or from stimulation by the external world, and generates an output activation that is a function of the total input activation received. The unit then feeds the output activation onward to the units to which it is connected. Information processing is thus implemented as activation flowing through this network.

Each connection between two units has a weight that determines how strongly the first unit affects the second. These weights can be adapted, which constitutes learning, or "training" as it is commonly called in the neural network literature. Algorithms for network training can be roughly divided into supervised and unsupervised methods. Supervised training is applied when a specific, known input-to-output mapping is required (e.g., learning to transform orthographic into phonological representations). To accomplish this, the network is provided with a representative set of "training examples": inputs and their corresponding target outputs. It processes each example, and the difference between the network's actual output and the target output leads to an update of the connection weights such that, the next time, the output error will be smaller. By far the best known and most widely used method for supervised training is the Backpropagation algorithm (Rumelhart, Hinton, & Williams, 1986), which makes the network's output activations for the training examples gradually converge toward the target outputs.
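The supervised training scheme just described can be illustrated with a minimal sketch: a small two-layer network trained with backpropagation to learn the XOR input-to-output mapping. The network size, learning rate, and number of training sweeps are illustrative choices, not values from the literature.

```python
import numpy as np

# Minimal backpropagation sketch: a 2-4... network with 8 hidden units
# learns XOR. All hyperparameters here are illustrative assumptions.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
T = np.array([[0], [1], [1], [0]], dtype=float)              # target outputs

W1 = rng.normal(0, 1, (2, 8))   # input -> hidden connection weights
W2 = rng.normal(0, 1, (8, 1))   # hidden -> output connection weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(10000):
    # Forward pass: activation flows through the network.
    H = sigmoid(X @ W1)          # hidden-unit activations
    Y = sigmoid(H @ W2)          # output-unit activations
    # Backward pass: the output error is propagated back through the
    # network and used to update the connection weights.
    err = Y - T
    dY = err * Y * (1 - Y)       # error signal at the output layer
    dH = (dY @ W2.T) * H * (1 - H)  # error signal at the hidden layer
    W2 -= lr * H.T @ dY
    W1 -= lr * X.T @ dH

print(np.round(Y.ravel(), 2))    # outputs gradually approach 0, 1, 1, 0
```

Each sweep through the training examples nudges the weights so that the output error shrinks, which is exactly the gradual convergence toward the target outputs described above.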
Unsupervised training, in contrast, makes the network adapt to (aspects of) the statistical structure of input examples without mapping to target outputs (e.g., discovering regularities in the phonological structure of language). Such networks are well suited to uncovering statistical structure present in the environment without requiring the modeller to be aware of what that structure is. One well-known example of an unsupervised training method is the learning rule proposed by Hebb (1949): strengthen the connection between two units that are simultaneously active, and weaken it if only one of the two is active.

In spite of the superficial similarities between artificial and biological neural networks (i.e., interconnectivity and stimulation passing between neurons to determine their activation, and learning by adaptation of connection strengths), these cognitive models are not usually claimed to simulate processing at the level of biological neurons. Rather, neural network models form a description at Marr's (1982) algorithmic level; that is, they specify cognitive representations and operations while ignoring the biological implementation.

Neural networks underwe...
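The Hebbian rule described earlier can be sketched in a few lines. This is one simple reading of the rule (binary units, a fixed learning-rate increment); the patterns and the learning rate are illustrative assumptions.

```python
import numpy as np

# Hebbian learning sketch: strengthen the connection between two units
# that are active together; weaken it when only one of them is active.
patterns = np.array([
    [1, 1, 0, 0],   # units 0 and 1 co-active
    [1, 1, 0, 0],   # units 0 and 1 co-active again
    [0, 0, 1, 1],   # units 2 and 3 co-active
], dtype=float)

n = patterns.shape[1]
W = np.zeros((n, n))   # connection weights between all unit pairs
lr = 0.1               # illustrative learning rate

for x in patterns:
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if x[i] == 1 and x[j] == 1:
                W[i, j] += lr   # both units active: strengthen
            elif x[i] != x[j]:
                W[i, j] -= lr   # only one unit active: weaken

print(np.round(W, 2))
```

After these three patterns, the connection between units 0 and 1 (co-active twice) has grown, while connections between units that were never active together have been weakened, so the weights come to reflect the co-occurrence statistics of the input, with no target outputs involved.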