Protein Motifs and Data-Base Searching

Thornton, Janet M.; Gardner, Stephen P.

doi:10.1016/b978-1-85166-512-9.50020-0

Cited by 12 publications

(10 citation statements)

References 26 publications


“…We used the Iditis program from Oxford Molecular (Thornton & Gardner, 1989), a program for querying the PDB in relational form, to extract sequences of specific secondary structure assigned by the extended DSSP method (Kabsch & Sander, 1983) implemented in Iditis.…”

Section: Constructing Sequence Data Sets (mentioning)

confidence: 99%

Discovering structural correlations in α‐helices

Klingler

Brutlag

1994

Protein Science


We have developed a new representation for structural and functional motifs in protein sequences based on correlations between pairs of amino acids and applied it to α-helical and β-sheet sequences. Existing probabilistic methods for representing and analyzing protein sequences have traditionally assumed conditional independence of evidence. In other words, amino acids are assumed to have no effect on each other. However, analyses of protein structures have repeatedly demonstrated the importance of interactions between amino acids in conferring both structure and function. Using Bayesian networks, we are able to model the relationships between amino acids at distinct positions in a protein sequence in addition to the amino acid distributions at each position. We have also developed an automated program for discovering sequence correlations using standard statistical tests and validation techniques. In this paper, we test this program on sequences from secondary structure motifs, namely α-helices and β-sheets. In each case, the correlations our program discovers correspond well with known physical and chemical interactions between amino acids in structures. Furthermore, we show that, using different chemical alphabets for the amino acids, we discover structural relationships based on the same chemical principle used in constructing the alphabet. This new representation of 3-dimensional features in protein motifs, such as those arising from structural or functional constraints on the sequence, can be used to improve sequence analysis tools including pattern analysis and database search.

Keywords: α-helix structure; amino acid correlations; motif modeling; sequence analysis; side-chain interactions; structure analysis

Understanding the 3-dimensional structure of a protein is a necessary and critical step toward understanding the protein's function. For example, only after the structure of hemoglobin was solved was it possible to dissect the mechanisms responsible for the cooperative binding of oxygen, for the effects of pH and 2,3-diphosphoglycerate (DPG) on affinity, and for the defects causing various anemias (Stryer, 1988). Despite the increasing wealth of sequence data, the laborious and time-consuming process of empirical structure determination hampers the availability of detailed structural information. Instead, sequence analysis tools offer the best hope for quickly eliciting structural and functional information from new sequences.

Traditional methods for analyzing sequences rely on the prior analyses of known sequences and on procedures for matching sequences. These techniques encompass database search (Wilbur & Lipman, 1983), sequence classification (Klein et al., 1984; Klein & DeLisi, 1986), and analysis for motifs (Bairoch & Boeckmann, 1991; Henikoff & Henikoff, 1991). Techniques for both analysis and matching emphasize the conservation of amino acids during evolution. Specifically, one usually assumes that if 2 sequences are homologous, then the amino acids that one observes at corresponding loca...


“…Figure 2 illustrates these four types of protein motifs, along with some further subclassifications which are elaborated upon later in the paper. The first three motif types are discussed by Thornton and Gardner (1989); the structure-sequence motif will be presented in this paper. Machine discovery of protein motifs of various types is currently an area of intense interest in molecular biology.…”

Section: Figure (mentioning)

confidence: 99%

Machine discovery of protein motifs

Conklin

1995

Mach Learn


Abstract. The investigation of relations between protein tertiary structure and amino acid sequence is a topic of tremendous importance in molecular biology. The automated discovery of recurrent patterns of structure and sequence is an essential part of this investigation. These patterns, known as protein motifs, are abstractions of fragments drawn from proteins of known sequence and tertiary structure. This paper has two objectives. The first is to introduce and define protein motifs, and provide a survey of previous research on protein motif discovery. The second is to present and apply a novel approach to protein motif representation and discovery, which is based on a spatial description logic and the symbolic machine learning paradigm of structured concept formation. A large database of protein fragments is processed using this approach, and several interesting and significant protein motifs are discovered.


“…The chapter by Searls in the present volume discusses complex grammars for biosequences. A few related examples include Abarbanel et al [1984], Barton [1990], Barton and Sternberg [1990], Blundell et al [1987], Bork and Grunwald [1990], Boswell [1988], Cockwell and Giles [1989], Gribskov et al [1987, 1988], Hertz et al [1990], Lawrence and Reilly [1990], Myers and Miller [1989], Owens et al [1988], Patthy [1987, 1988], Sibbald and Argos [1990a], Smith and Smith [1989], Staden [1989], Stormo [1990], Stormo and Hartzell [1989], Taylor [1986, 1988b], Thornton and Gardner [1989], Waterman and Jones [1990], and Webster et al [1989].…”

Section: Comparing Primary Sequences To Patterns (mentioning)

confidence: 99%

Artificial intelligence and molecular biology

1994

Choice Reviews Online

