We present a method based on hierarchical self-organizing maps (SOMs) for recognizing patterns in protein sequences. The method is fully automatic, does not require prealigned sequences, is insensitive to redundancy in the training set, and works surprisingly well even with small learning sets. Because it uses unsupervised neural networks, it is able to extract patterns that are not present in all of the unaligned sequences of the learning set. The identification of these patterns in sequence databases is sensitive and efficient.

The procedure comprises three main training stages. In the first stage, one SOM is trained to extract common features from the set of unaligned learning sequences. A feature is a number of ungapped sequence segments (usually 4-16 residues long) that are similar to segments in most of the sequences of the learning set according to an initial similarity matrix. In the second training stage, the recognition of each individual feature is refined by selecting an optimal weighting matrix out of a variety of existing amino acid similarity matrices. In a third stage of the SOM procedure, the position of the features in the individual sequences is learned. This allows for variants with feature repeats and feature shuffling.

The procedure has been successfully applied to a number of notoriously difficult cases with distinct recognition problems: helix-turn-helix motifs in DNA-binding proteins, the CUB domain of developmentally regulated proteins, and the superfamily of ribokinases. A comparison with the established database search procedure PROFILE (and with several others) led to the conclusion that the new automatic method performs satisfactorily.

Keywords: amino acid sequences; multiple alignment; neural network; pattern recognition; self-organizing maps

In sequencing projects, a database search for similar sequences is an inexpensive first attempt at suggesting the biological function of newly sequenced primary structures.
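The first training stage summarized in the abstract (one SOM extracting common ungapped segments from unaligned sequences) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`windows`, `encode`, `best_unit`, `train_som`, `feature_hits`), the one-dimensional map, the one-hot encoding (which amounts to an identity similarity matrix rather than the initial similarity matrix used in the paper), and all parameter values are assumptions chosen only for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def windows(seq, width):
    """All ungapped segments of a fixed width taken from one sequence."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]

def encode(segment):
    """One-hot encode a segment (assumption: identity similarity matrix)."""
    vec = []
    for residue in segment:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec

def best_unit(units, vec):
    """Index of the map unit whose weight vector is closest to vec."""
    def sq_dist(weights):
        return sum((w - x) ** 2 for w, x in zip(weights, vec))
    return min(range(len(units)), key=lambda k: sq_dist(units[k]))

def train_som(sequences, width=4, n_units=6, epochs=30, seed=0):
    """Train a 1-D SOM on every window of every unaligned sequence."""
    rng = random.Random(seed)
    data = [encode(w) for s in sequences for w in windows(s, width)]
    dim = width * len(AMINO_ACIDS)
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)                    # learning rate decays linearly
        radius = int((n_units / 2) * (1.0 - frac))   # neighbourhood shrinks over time
        rng.shuffle(data)
        for vec in data:
            winner = best_unit(units, vec)
            for k in range(n_units):
                d = abs(k - winner)
                if d <= radius:
                    step = rate / (1 + d)            # weaker update for neighbours
                    units[k] = [w + step * (x - w) for w, x in zip(units[k], vec)]
    return units

def feature_hits(units, seq, width):
    """Map unit assigned to each window position of one sequence."""
    return [best_unit(units, encode(w)) for w in windows(seq, width)]
```

As a usage example, calling `train_som(["MKTAYIAKQR", "GGTAYIAWLD", "PLTAYIAHHE"], width=5, n_units=4)` and then `feature_hits` on each sequence assigns the shared segment `TAYIA` to the same map unit in all three sequences, without any prior alignment. A map unit that attracts a segment from most of the learning sequences corresponds loosely to what the abstract calls a feature.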
More and more sequences are assignable to families, and there are a number of published procedures for the heuristic recognition of local sequence patterns using the information inherent in a set of related specimens or in a consensus model. All such strategies have a circularity problem, however, in that pattern recognition presupposes a valid alignment of the sequences, whereas the construction of an alignment requires previous knowledge of the pattern. Although in the case of a very clear-cut and distinct pattern this difficulty may be alleviated by a skillful iteration procedure, serious problems may arise when one or several of the following situations apply: the presence of a fuzzy pattern (difficult to distinguish from noise), very liberal alignment (too many possible insertions/deletions), or undersampling (prohib