On the origin and highly likely completeness of single-domain protein structures

Zhang, Yang; Hubner, Isaac A.; Arakaki, Adrián K.; Shakhnovich, Eugene I.; Skolnick, Jeffrey

doi:10.1073/pnas.0509379103

Cited by 182 publications

(202 citation statements)

References 37 publications

Supporting

Mentioning

178

Contrasting

“…A more recent study using computational sampling of homopolypeptide conformations suggested further that the current repertoire of globular protein folds is nearly complete in its coverage of all physically possible compact folds [286]. However, studies by three other groups have found instead that the current fold repertoire represents only a small fraction of all possible folds [287-289].…”

Section: Applications Of Biophysics-based Models To Understand Protei

mentioning

confidence: 99%

Biophysics of protein evolution and evolutionary protein biophysics

Sikosek

Chan

2014

J. R. Soc. Interface.

220

207

The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence-structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by 'hidden' conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution.

“…Consistent with this notion, it was estimated that there are about 1,000 structural folds in protein domains (10). Recent studies suggest that the library of single-domain protein structures is likely complete, continuous, and above the percolation threshold, largely due to the packing of compact, hydrogen-bonded secondary structural elements (11)(12)(13). Because many structural properties of real proteins are reproduced by a library of compact, hydrogen-bonded homopolypeptide structures, evolution is not necessary to explain these features.…”

mentioning

confidence: 54%

“…Second, it is known that physical constraints limit the total number of distinct structural folds for protein domains (9,12). Obviously, the same constraints may restrict the valid ways of packing pairs of proteins; this is the second reason for detecting structural similarity between unrelated interfaces.…”

Section: Results

mentioning

confidence: 99%

Structural space of protein–protein interfaces is degenerate, close to complete, and highly connected

Gao

Skolnick

2010

Proc. Natl. Acad. Sci. U.S.A.

Self Cite

140

133

At the heart of protein-protein interactions are protein-protein interfaces where the direct physical interactions occur. By developing and applying an efficient structural alignment method, we study the structural similarity of representative protein-protein interfaces involving interactions between dimers. Even without structural similarity between individual monomers that form dimeric complexes, ∼90% of native interfaces have a close structural neighbor with similar backbone Cα geometry and interfacial contact pattern. About 80% of the interfaces form a dense network, where any two interfaces are structurally related using a transitive set of at most seven intermediate interfaces. The degeneracy of interface space is largely due to the packing of compact, hydrogen-bonded secondary structure elements. This packing generates relatively flat interacting surfaces whose geometries are highly degenerate. Comparative study of artificial and native interfaces argues that the library of protein interfaces is close to complete and comprised of roughly 1,000 distinct interface types. In contrast, the number of possible quaternary structures of dimers is estimated to be about 10^4 times larger; thus, an experimentally determined database of all representative quaternary structures is not likely in the near future. Nevertheless, one could in principle exploit the completeness of protein interfaces to predict most dimeric quaternary structures. Finally, our results provide a structural explanation for the prevalence of promiscuous protein interactions. By side-chain packing adjustments, we illustrate how multiprotein specificity can be attained at a promiscuous interface.

“…Arguments were made that the PDB is indeed complete (22,23) with the current thousands of distinct folds. This argument further supports the creation of a comprehensive model of protein structure space and their sequence capacities and the progressive refinement of this model.…”

mentioning

confidence: 99%

The network of sequence flow between protein structures

Meyerguz

Kleinberg

Elber

2007

Proc. Natl. Acad. Sci. U.S.A.

Sequence-structure relationships in proteins are highly asymmetric because many sequences fold into relatively few structures. What is the number of sequences that fold into a particular protein structure? Is it possible to switch between stable protein folds by point mutations? To address these questions, we compute a directed graph of sequences and structures of proteins, which is based on 2,060 experimentally determined protein shapes from the Protein Data Bank. The directed graph is highly connected at native energies with “sinks” that attract many sequences from other folds. The sinks are rich in β-sheets. The number of sequences that transition between folds is significantly smaller than the number of sequences retained by their fold. The sequence flow into a particular protein shape from other proteins correlates with the number of sequences that matches this shape in empirically determined genomes. Properties of strongly connected components of the graph are correlated with protein length and secondary structure.

protein designability | sequence capacity | structure stability | transitional sequences

As data on protein sequences and their variations become more accessible (following the abundance of large-scale sequencing and gene expression projects), it is clear that protein structures serve as evolutionary templates. Similar protein backbones are used again and again to create proteins with adjusted functions in response to environmental variations or at random. This asymmetric relationship is of considerable interest in the study of protein evolution and design and has received considerable attention. How many sequences fold to a common structure, or equivalently, what is the sequence capacity (or designability) of a known fold? Past theoretical and computational studies primarily are focused on the thermal stability of the proteins. The stability is estimated by an energy calculation of threaded sequences in a known structure.

The theory and calculations can be divided (roughly) into two categories: (i) general theories (1-6) and exhaustive simulations of simple model systems (7-11) and (ii) accurate and detailed modeling of a few proteins (12-16). The studies of class i provide a universal view of sequence-structure matches and their variations. Investigations of class ii made specific predictions on protein folds that are straightforward to test experimentally. The function of interest, protein designability or sequence capacity, was estimated theoretically and by computations. However, neither of these calculations consider explicitly all structures of the Protein Data Bank (PDB) (17). Quantitative extrapolations from approximate theories, lattice models, or detailed simulations of a few proteins to other folds may not be obvious. Furthermore, collective behavior of the evolutionary process, not restricted to a single or a few proteins, may go unnoticed.

Explicit calculation of sequence capacity of all protein folds is of particular interest because genomic-scale experiments are emerging, making...
