Protein motifs and data-base searching

Thornton, Janet M.; Gardner, Stephen P.

doi:10.1016/0968-0004(89)90069-8

Cited by 48 publications

(16 citation statements)

References 24 publications

“…Several such data bases are being developed. Our algorithm can use protein or DNA/RNA structures, atomic labels, conformation coordinates, secondary structures, and tertiary interactions (29) in its structural comparisons (30). The more information included in the data base, the faster is the comparison.…”

mentioning

confidence: 99%

“…It is general and can be used on both molecular model and crystal structure data. In addition to atomic coordinates, such a data base should preferentially also contain consistently defined sets of properties, such as secondary structures and hydrogen bonding (29). Several such data bases are being developed.…”

mentioning

confidence: 99%

“…Furthermore, since the algorithm is sequence-independent, it is insensitive to gaps, insertions, or deletions, which constitute a major difficulty in structural comparisons based on sequence alignments. In principle, it can be implemented for both structure-related sequence motifs [sequence patterns that are associated with a specific structure (29)] and structural motifs (whose actual sequences may vary). It is general and can be used on both molecular model and crystal structure data.…”

mentioning

confidence: 99%

Efficient detection of three-dimensional structural motifs in biological macromolecules by computer vision techniques.

Nussinov

Wolfson

1991

Proc. Natl. Acad. Sci. U.S.A.

261

193

Macromolecules carrying biological information often consist of independent modules containing recurring structural motifs. Detection of a specific structural motif within a protein (or DNA) aids in elucidating the role played by the protein (DNA element) and the mechanism of its operation. The number of crystallographically known structures at high resolution is increasing very rapidly. Yet, comparison of three-dimensional structures is a laborious, time-consuming procedure that typically requires a manual phase. To date, there is no fast automated procedure for structural comparisons. We present an efficient O(n³) worst-case time complexity algorithm for achieving such a goal (where n is the number of atoms in the examined structure). The method is truly three-dimensional, sequence-order-independent, and thus insensitive to gaps, insertions, or deletions. This algorithm is based on the geometric hashing paradigm, which was originally developed for object recognition problems in computer vision. It introduces an indexing approach based on transformation invariant representations and is especially geared toward efficient recognition of partial structures in rigid objects belonging to large data bases. This algorithm is suitable for quick scanning of structural data bases and will detect a recurring structural motif that is a priori unknown. The algorithm uses protein (or DNA) structures, atomic labels, and their three-dimensional coordinates. Additional information pertaining to the structure speeds the comparisons. The algorithm is straightforwardly parallelizable, and several versions of it for computer vision applications have been implemented on the massively parallel connection machine. A prototype version of the algorithm has been implemented and applied to the detection of substructures in proteins.

One of the basic emerging principles in molecular biology is the modular nature of DNA sequence elements and of the corresponding sequence-specific protein factors recognizing them. The domains appear to be independent units (1). Structural and functional studies of these domains have demonstrated the existence of several structural motifs. The motifs include the helix-turn-helix (HTH) (2), zinc fingers (3), homeodomain (4), leucine zipper (5), helix-loop-helix (6), Ser-Pro-Lys-Lys histone (7), proline-rich (8) and glutamine-rich (9) motifs, the antiparallel β-sheet (10) apparently inserted in the minor groove, and more recently a pair of β-strands in the major groove of the DNA (11). All of these motifs typically include less than 100 amino acid residues. Finding a given structural motif in a protein may clearly aid in understanding its role (12). The latter is inferred by analogy with other proteins containing the motif. Structural comparisons are thus central to molecular biology. The problem we are faced with is to devise efficient techniques for routine scanning of structural data bases and searching for recurrences of inexact structural motifs. The degree of allowed errors is to be determined by th...
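The indexing idea in the abstract above (store transformation-invariant coordinates in a hash table, then vote) can be illustrated with a small sketch. This is not the authors' implementation: the function names (`frame`, `invariant_keys`, `build_table`, `vote`), the cell size of 0.5, and the toy coordinates are all assumptions made for illustration. The sketch naively enumerates every ordered triplet of points as a reference frame and makes no attempt to reach the O(n³) bound quoted in the abstract.

```python
import itertools
import math
from collections import Counter, defaultdict

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def frame(p0, p1, p2, eps=1e-9):
    """Orthonormal axes from an ordered triplet; None if (near) collinear."""
    u = sub(p1, p0)
    w = cross(u, sub(p2, p0))
    if dot(w, w) < eps:
        return None
    ex, ez = unit(u), unit(w)
    return ex, cross(ez, ex), ez

def invariant_keys(points, basis, cell=0.5):
    """Quantized coordinates of all non-basis points in the basis frame.

    These keys do not change under rotation or translation of `points`.
    """
    i, j, k = basis
    axes = frame(points[i], points[j], points[k])
    if axes is None:
        return []
    keys = []
    for idx, p in enumerate(points):
        if idx in basis:
            continue
        d = sub(p, points[i])
        keys.append(tuple(round(dot(d, e) / cell) for e in axes))
    return keys

def build_table(models):
    """Preprocessing: index every model under every ordered triplet basis."""
    table = defaultdict(list)
    for name, pts in models.items():
        for basis in itertools.permutations(range(len(pts)), 3):
            for key in invariant_keys(pts, basis):
                table[key].append((name, basis))
    return table

def vote(table, scene, scene_basis):
    """Recognition: each scene point votes for the (model, basis) pairs in its bin."""
    votes = Counter()
    for key in invariant_keys(scene, scene_basis):
        for entry in table.get(key, []):
            votes[entry] += 1
    return votes

# Hypothetical toy motif (coordinates are illustrative, not real atoms).
motif = [(0, 0, 0), (2, 0, 0), (0, 3, 0), (1, 1, 2), (3, -1, 1), (1, 2, -1)]
table = build_table({"motif": motif})

# Scene: the motif rotated 90 degrees about z and shifted, plus two clutter points.
scene = [(-y + 10, x - 4, z + 7) for x, y, z in motif]
scene += [(20, 5, 1), (-8, 12, 30)]

votes = vote(table, scene, (0, 1, 2))
best, count = votes.most_common(1)[0]
print(best, count)
```

Because the keys depend only on positions relative to the chosen triplet, the moved copy of the motif lands in the same bins as the stored model, and the clutter points fall in bins the table never filled. A practical scan would try several scene bases and verify high-scoring candidates by superposition; both steps are left out of this sketch.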

“…Often these systematic studies have involved a transformation of the PDB into other database or knowledge base structures (see e.g. Islam & Sternberg, 1989;Thornton & Gardner, 1989).…”

Section: Post-processing Of Search Results

mentioning

confidence: 99%

“…Structural knowledge is a component part of the NDB (Berman et al, 1992), specific knowledge-based systems have already been derived from the PDB (e.g. Islam & Sternberg, 1989;Thornton & Gardner, 1989) and the value of knowledge-based approaches to protein structure determination and molecular modelling have been discussed by Allen et al (1990). In 1995, the CCDC began a programme to derive libraries of structural knowledge from the raw data content of the CSD.…”

Section: The Future: Structural Knowledge Bases

mentioning

confidence: 99%

The Development, Status and Scientific Impact of Crystallographic Databases

Allen

1998

Acta Cryst Sect A

Nearly 300 000 crystal structures have been reported in the scientific literature and all of them are accessible through the crystallographic structural databases. The historical development, information content and current status of these databases are described, with special reference to methods for data acquisition and structure validation. The relationships that exist between authors, journals and databases are discussed in the light of statistics that predict more than 800 000 structural database entries by the year 2010, more than doubling the output of the last 30 years in less than half the time. Use of the structural databases for data mining and knowledge acquisition is summarized. So far, the vast majority of this research activity has centred around the databases that record small-molecule and macromolecular structures. The creation of knowledge-based libraries of structural information from the existing molecular databases suggests a new era of two-level information provision: the raw-data level and the derived-knowledge level. The crystallographic knowledge bases are encouraging the development of software systems that access the stored knowledge to solve complex problems in structural science.
