Topological maps of protein sequences

Ferràn, Edgardo A.; Ferrara, Pascual

doi:10.1007/bf00204658

Cited by 44 publications (30 citation statements); references 15 publications.


“…In this section we summarize the standard formalism of the method that we have previously proposed (see Ferran & Ferrara [1991] for a detailed description).…”

Section: Methods

“…Although network training is time consuming, once the topological map is obtained, the classification of a new protein is very fast. We have tested the method by considering both small (~10 sequences) and large (~450 sequences) learning sets of well-defined protein families (Ferran & Ferrara, 1991, 1992a). For small learning sets, we have also shown that the trained network is able to classify correctly mutated or incomplete sequences of the learned proteins (Ferran & Ferrara, 1991).…”


“…This approach should not be limited by database size, because the number of macromolecular families is expected to grow more slowly than the number of sequences. Recently, 2 different neural-network-based methods following this approach have been proposed (Ferran & Ferrara, 1991; Wu et al., 1992).…”


“…This approach has been applied to detect signal peptide coding regions (Arrigo et al., 1991) and to cluster small organic molecules of analogue structure into families of similar activity (Rose et al., 1991). We have proposed a method based on Kohonen's algorithm to cluster protein sequences into families according to their degree of sequence similarity (Ferran & Ferrara, 1991, 1992a). The network was trained using, as inputs, matrix patterns of 20 × 20 components derived from the dipeptide composition of the protein sequences.…”
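The quoted passage states that each protein was presented to the network as a 20 × 20 pattern derived from its dipeptide composition. The sketch below shows one way to build such a pattern. Its choices are assumptions, and the original normalization may differ: it counts overlapping adjacent residue pairs, divides by the total number of pairs, and skips any pair that contains a non-standard residue. The names `dipeptide_matrix`, `AMINO_ACIDS` and `INDEX` are hypothetical.

```python
import numpy as np

# The 20 standard amino acids; the row/column order is an arbitrary choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def dipeptide_matrix(sequence: str) -> np.ndarray:
    """Return a 20 x 20 matrix of overlapping dipeptide frequencies.

    Entry [i, j] is the fraction of adjacent residue pairs (a, b) with
    a == AMINO_ACIDS[i] and b == AMINO_ACIDS[j]. Pairs that involve a
    non-standard residue (e.g. 'X') are skipped (an assumption).
    """
    counts = np.zeros((20, 20))
    seq = sequence.upper()
    # Overlapping pairs: positions (0,1), (1,2), ..., (n-2, n-1).
    for a, b in zip(seq, seq[1:]):
        if a in INDEX and b in INDEX:
            counts[INDEX[a], INDEX[b]] += 1
    total = counts.sum()
    # Normalize to frequencies so that proteins of different lengths are comparable.
    return counts / total if total > 0 else counts

m = dipeptide_matrix("MKVLAAGKV")
```

Flattening the result with `m.ravel()` gives a 400-component input vector for the network.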


“…We have tested the method by considering both small (~10 sequences) and large (~450 sequences) learning sets of well-defined protein families (Ferran & Ferrara, 1991, 1992a). For small learning sets, we have also shown that the trained network is able to classify correctly mutated or incomplete sequences of the learned proteins (Ferran & Ferrara, 1991). We have also found, using a learning set of 76 cytochrome c sequences belonging to different species, that the time evolution of the map during learning roughly resembles the phylogenetic classification of the involved species (Ferran & Ferrara, 1992b).…”


Self‐organized neural maps of human protein sequences

Ferràn, Pflugfelder, Ferrara (1994), Protein Science


We have recently described a method based on artificial neural networks to cluster protein sequences into families. The network was trained with Kohonen's unsupervised learning algorithm using, as inputs, the matrix patterns derived from the dipeptide composition of the proteins. We present here a large-scale application of that method to classify the 1,758 human protein sequences stored in the SwissProt database (release 19.0) whose lengths are greater than 50 amino acids. In the final 2-dimensional topologically ordered map of 15 × 15 neurons, proteins belonging to known families were associated with the same neuron or with neighboring ones. Also, as an attempt to reduce the time-consuming learning procedure, we compared 2 learning protocols: one of 500 epochs (100 SUN CPU-hours [CPU-h]), and another one of 30 epochs (6.7 CPU-h). A further reduction of the learning computing time, by a factor of about 3.3, with similar protein clustering results, was achieved using a matrix of 11 × 11 components to represent the sequences. Although network training is time consuming, the classification of a new protein in the final ordered map is very fast (14.6 CPU-seconds). We also show a comparison between the artificial neural network approach and conventional methods of biosequence analysis.
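The abstract describes Kohonen learning on a 15 × 15 map. The sketch below is one minimal way to implement it. The following choices are illustrative assumptions, not the protocol of Ferràn et al.:

- the Gaussian neighborhood;
- linear decay of the learning rate and radius;
- uniform random initialization;
- the default parameters `lr0` and `sigma0`.

The names `train_som` and `best_matching_unit` are hypothetical. Classifying a new protein then only requires finding the neuron whose weight vector lies closest to the protein's input vector. That is consistent with classification being far cheaper than training.

Each update costs time roughly in proportion to the input dimension. Going from 20 × 20 = 400 to 11 × 11 = 121 components gives a ratio of 400/121 ≈ 3.3, in line with the reported speed-up.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return (row, col) of the neuron whose weight vector is closest to x."""
    dists = ((weights - x) ** 2).sum(axis=-1)
    r, c = np.unravel_index(np.argmin(dists), dists.shape)
    return int(r), int(c)

def train_som(data, rows=15, cols=15, epochs=30, lr0=0.5, sigma0=None, seed=0):
    """Train a Kohonen self-organizing map on data of shape (n_samples, n_features).

    Uses a Gaussian neighborhood on a rectangular grid; the learning rate and
    neighborhood radius decay linearly over all update steps (assumed schedule).
    Returns weights of shape (rows, cols, n_features).
    """
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialize weights uniformly at random within the range of the data.
    weights = rng.uniform(data.min(), data.max(), size=(rows, cols, dim))
    # grid[r, c] == (r, c): the map coordinates of every neuron.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        # Present the samples in a fresh random order each epoch.
        for idx in rng.permutation(n):
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            # Squared map distance from every neuron to the winner.
            d2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            # Pull the winner and its map neighbors toward the input.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights
```

In this setting, `data` would be the stacked, flattened dipeptide matrices, one row per protein. After training, proteins from the same family would be expected to share a winner neuron or to land on neighboring ones.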


Advances in the prediction of protein targeting signals

Schneider, Fechner (2004), Proteomics


Enlarged sets of reference data and specialized machine learning approaches have improved the accuracy of protein subcellular localization prediction. Recent approaches report over 95% correct predictions, with low false-positive rates, for secretory proteins. A clear trend is toward specifically tailored, organism- and organelle-specific prediction tools rather than a single general method. The review focuses on machine learning systems, highlighting four concepts: the feed-forward artificial neural network, the self-organizing map (SOM), the hidden Markov model (HMM), and the support vector machine (SVM).


Self‐organizing hierarchic networks for pattern recognition in protein sequence

et al. 1996


We present a method based on hierarchical self-organizing maps (SOMs) for recognizing patterns in protein sequences. The method is fully automatic, does not require prealigned sequences, is insensitive to redundancy in the training set, and works surprisingly well even with small learning sets. Because it uses unsupervised neural networks, it can extract patterns that are not present in all of the unaligned sequences of the learning set. The identification of these patterns in sequence databases is sensitive and efficient.

The procedure comprises three main training stages. In the first stage, one SOM is trained to extract common features from the set of unaligned learning sequences. A feature is a number of ungapped sequence segments (usually 4-16 residues long) that are similar to segments in most of the sequences of the learning set according to an initial similarity matrix. In the second training stage, the recognition of each individual feature is refined by selecting an optimal weighting matrix out of a variety of existing amino acid similarity matrices. In the third stage, the position of the features in the individual sequences is learned. This allows for variants with feature repeats and feature shuffling.

The procedure has been successfully applied to a number of notoriously difficult cases with distinct recognition problems: helix-turn-helix motifs in DNA-binding proteins, the CUB domain of developmentally regulated proteins, and the superfamily of ribokinases. A comparison with the established database search procedure PROFILE (and with several others) led to the conclusion that the new automatic method performs satisfactorily.

Keywords: amino acid sequences; multiple alignment; neural network; pattern recognition; self-organizing maps

In sequencing projects, a database search for similar sequences is an inexpensive first attempt at suggesting the biological function of newly sequenced primary structures. More and more sequences are assignable to families, and a number of procedures have been published for the heuristic recognition of local sequence patterns using the information inherent in a set of related specimens or in a consensus model. All such strategies have a circularity problem, however: pattern recognition presupposes a valid alignment of the sequences, whereas constructing an alignment requires prior knowledge of the pattern. When the pattern is very clear-cut and distinct, this difficulty may be alleviated by a skillful iteration procedure. Serious problems may arise, however, when one or more of the following situations apply: a fuzzy pattern (difficult to distinguish from noise), a very liberal alignment (too many possible insertions/deletions), or undersampling (prohib…
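The first training stage in the abstract relies on comparing ungapped segments under a similarity matrix. The sketch below covers only that comparison primitive:

- It slides a reference segment along a sequence without gaps and keeps the best-scoring window.
- It reports the fraction of sequences whose best window reaches a threshold. This is a rough proxy for a segment being "similar to segments in most of the sequences."

This is not the hierarchical SOM procedure itself. `identity_score` is a hypothetical +1/-1 stand-in for a real amino acid similarity matrix. The function names and the threshold are also assumptions.

```python
def identity_score(a, b):
    """Hypothetical stand-in for an amino acid similarity matrix: +1 match, -1 mismatch."""
    return 1 if a == b else -1

def best_ungapped_hit(segment, sequence, score):
    """Return (start, total) of the best-scoring ungapped window, or None.

    Slides `segment` along `sequence` without insertions or deletions and
    sums score(a, b) over aligned positions. Returns None when the sequence
    is shorter than the segment.
    """
    k = len(segment)
    best = None
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        total = sum(score(a, b) for a, b in zip(segment, window))
        if best is None or total > best[1]:
            best = (start, total)
    return best

def support(segment, sequences, score, threshold):
    """Fraction of sequences whose best ungapped window scores >= threshold."""
    hits = 0
    for seq in sequences:
        hit = best_ungapped_hit(segment, seq, score)
        if hit is not None and hit[1] >= threshold:
            hits += 1
    return hits / len(sequences)
```

Swapping `identity_score` for a lookup into a real substitution matrix would loosely mirror the second stage, in which the abstract describes choosing among existing amino acid similarity matrices.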
