PACS. … - theory and mathematical aspects.
PACS. 05.50.+q - Lattice theory and statistics (Ising, Potts, etc.).
PACS. 64.60.Cn - Order-disorder transformations; statistical mechanics of model systems.

Abstract. - An unsupervised learning procedure based on maximizing the mutual information between the outputs of two networks receiving different but statistically dependent inputs is analyzed (Becker S. and Hinton G. E., Nature, 355 (1992) 161). By exploiting a formal analogy to supervised learning in parity machines, the theory of zero-temperature Gibbs learning for the unsupervised procedure is presented for the case that the networks are perceptrons and for the case of fully connected committee machines.

It has long been realized that information theory can provide a useful conceptual framework for unsupervised learning in neural networks. The basic idea is to treat the network as a channel of limited capacity and to adapt the parameters of the network so as to optimize the information transfer. Different optimality criteria exist, and a discussion of such criteria, and of some equivalences between them, from the perspective of statistical physics is given in [1,2].

In most cases, however, the information-theoretic approach has led to useful learning algorithms only for single-layer networks. An exception is the proposal by Becker and Hinton (see [3]; a review is given in [4]), which builds on ideas from computational linguistics. They assume two different but statistically dependent modes of input, $\xi_1$ and $\xi_2$, and each of these modes is processed by a different network. Owing to the statistical dependence, some features of one input mode will be predictable given the other input mode, and the goal of training is to discover such mutually predictable features by maximizing the mutual information between the outputs of the two networks.

It is worth mentioning that in the context of sensory processing the scenario considered by Becker and Hinton is by no means artificial. For instance, simultaneous auditory ($\xi_1$) and visual ($\xi_2$) sensations are statistically dependent since they may be caused by the same object, and such dependences obviously provide useful information about the nature of the object. Statistical dependences also arise within one sensory system at different times: in speech, the current phoneme ($\xi_1$) is to a certain extent predictable from the preceding ones ($\xi_2$), and the same phenomenon recurs at the level of words.

In an application to vision, a simulation in [3] shows that two multilayer networks can learn higher-order features by maximizing the mutual information of their outputs. They learn to estimate the distance of an object from the stereo disparity which may arise when the object
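To make the training principle concrete, the following minimal sketch (ours, not from [3]; the toy data model and all names are illustrative assumptions) maximizes the empirical mutual information between the binary outputs of two perceptrons that receive noisy views of a common hidden feature. A crude zero-temperature stochastic search stands in for the Gibbs learning analyzed here; it illustrates only the objective, not the analytical theory.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, A = 20, 2000, 1.5  # input dimension, sample size, signal strength (all illustrative)

# Toy data model (our assumption): a hidden binary feature s is embedded along
# a common direction B in both input modes, plus independent Gaussian noise,
# so s is the only mutually predictable feature of xi1 and xi2.
B = rng.standard_normal(N)
B /= np.linalg.norm(B)
s = rng.choice([-1.0, 1.0], size=P)
xi1 = A * s[:, None] * B + rng.standard_normal((P, N))
xi2 = A * s[:, None] * B + rng.standard_normal((P, N))

def outputs(w, xi):
    """Binary perceptron outputs sigma = sign(w . xi)."""
    return np.sign(xi @ w)

def mutual_information(s1, s2):
    """Empirical mutual information (in bits) of two +-1 output sequences."""
    eps = 1e-12
    joint = np.array([[np.mean((s1 == a) & (s2 == b)) for b in (-1, 1)]
                      for a in (-1, 1)])
    marg = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    return float(np.sum(joint * np.log2((joint + eps) / (marg + eps))))

# Zero-temperature stochastic search: propose small weight perturbations for
# both perceptrons and accept only moves that do not lower the empirical
# mutual information of the two output sequences.
w1, w2 = rng.standard_normal(N), rng.standard_normal(N)
best = mutual_information(outputs(w1, xi1), outputs(w2, xi2))
for _ in range(5000):
    t1 = w1 + 0.1 * rng.standard_normal(N)
    t2 = w2 + 0.1 * rng.standard_normal(N)
    trial = mutual_information(outputs(t1, xi1), outputs(t2, xi2))
    if trial >= best:
        w1, w2, best = t1, t2, trial

print(f"mutual information of outputs: {best:.3f} bits")
print(f"|overlap of w1 with B|: {abs(w1 @ B) / np.linalg.norm(w1):.2f}")
```

In this toy setting the search typically drives the output mutual information well above its chance level and aligns both weight vectors (up to a common sign flip, which leaves the mutual information invariant) with the shared direction B, i.e. with the mutually predictable feature.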