Significance. We question a central paradigm, namely, that the protein domain is the “atomic unit” of evolution. In conflict with the current textbook view, our results unequivocally show that duplication of protein segments happens both above and below the domain level among amino acid segments of diverse lengths. Indeed, we show that significant evolutionary information is lost when the protein is approached as a string of domains. Our finer-grained approach reveals a far more complicated picture, where reused segments often intertwine and overlap with each other. Our results are consistent with a recursive model of evolution, in which segments of various lengths, typically smaller than domains, “hop” between environments. The fit segments remain, leaving traces that can still be detected.
The vast majority of theoretically possible polypeptide chains do not fold, let alone confer function. Hence, protein evolution from preexisting building blocks has clear potential advantages over ab initio emergence from random sequences. In support of this view, sequence similarity between different proteins is generally indicative of common ancestry, and we collectively refer to such homologous sequences as ‘themes’. At the domain level, sequence homology is routinely detected. However, short themes, which are segments or fragments of intact domains, are particularly interesting because they may provide hints about the emergence of domains, as opposed to the divergence of preexisting domains or their mixing-and-matching to form multi-domain proteins. Here we identified 525 representative short themes, comprising 20 to 80 residues, that are unexpectedly shared between domains considered to have emerged independently. Among these ‘bridging themes’ are ones shared between the most ancient domains, e.g., Rossmann, P-loop NTPase, TIM-barrel, Flavodoxin, and Ferredoxin-like. We elaborate on several particularly interesting cases, where the bridging themes mediate ligand binding. Ligand binding may have contributed to the stability and the plasticity of these building blocks, and to their ability to invade preexisting domains or serve as starting points for completely new domains.
To explore protein space from a global perspective, we consider 9,710 SCOP (Structural Classification of Proteins) domains with up to 70% sequence identity and present all similarities among them as networks: In the "domain network," nodes represent domains, and edges connect domains that share "motifs," i.e., significantly sized segments of similar sequence and structure. We explore the dependence of the network on the thresholds that define the evolutionary relatedness of the domains. At excessively strict thresholds the network falls apart completely; for very lax thresholds, there are network paths between virtually all domains. Interestingly, at intermediate thresholds the network constitutes two regions that can be described as "continuous" versus "discrete." The continuous region comprises a large connected component, dominated by domains with alternating alpha and beta elements, and the discrete region includes the rest of the domains in isolated islands, each generally corresponding to a fold. We also construct the "motif network," in which nodes represent recurring motifs, and edges connect motifs that appear in the same domain. This network also features a large and highly connected component of motifs that originate from domains with alternating alpha/beta elements (and some all-alpha domains), and smaller isolated islands. Indeed, the motif network suggests that nature reuses such motifs extensively. The networks suggest evolutionary paths between domains and give hints about protein evolution and the underlying biophysics. They provide natural means of organizing protein space, and could be useful for the development of strategies for protein search and design.
protein cooccurrence networks | protein similarity networks
How are proteins related to each other? Which physicochemical considerations affect protein evolution and how? A global view of the protein universe may shed light on these fundamental questions. It could also suggest new strategies for protein search and design (1-3). However, forming a global picture of the protein universe is difficult because we have to piece it together from the many local glimpses that our empirical data and computational tools provide. In other words, a global picture needs to portray the relationships among all proteins, yet we only have evidence of such relationships among several proteins, based on the similarity between their sequences, structures, and functions. The considerable size of the Protein Data Bank (4) also complicates this task.
In particular, an intensely debated question is whether protein space is "discrete" or "continuous" (2, 3, 5-10). These terms are loosely defined. Discrete implies that the global picture consists of separate, island-like, structural entities. In the hierarchical protein domains Structural Classification of Proteins (SCOP) (11), these entities are termed "folds," and in the CATH database (12) they are called "topologies." Alternatively, "continuous" implies that the space between these entities is generally populated by...
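The threshold dependence of the domain network described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the domain names, the similarity scores, and the rule that an edge exists when a score meets a single cutoff are hypothetical and are not the authors' data or procedure. The sketch only shows how the number of connected components changes as the cutoff is relaxed.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Return connected components as a list of sorted node lists."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from `start`.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


def domain_network(nodes, shared_motif_scores, threshold):
    """Keep edges whose (hypothetical) score meets the threshold, then
    return the connected components of the resulting network."""
    edges = [pair for pair, score in shared_motif_scores.items()
             if score >= threshold]
    return connected_components(nodes, edges)


# Toy, made-up data: five domains and four pairwise similarity scores.
domains = ["d1", "d2", "d3", "d4", "d5"]
scores = {
    ("d1", "d2"): 0.9,
    ("d2", "d3"): 0.6,
    ("d3", "d4"): 0.3,
    ("d4", "d5"): 0.8,
}

for threshold in (0.95, 0.5, 0.2):
    components = domain_network(domains, scores, threshold)
    print(threshold, len(components), components)
```

With these toy scores, the strictest cutoff leaves every domain isolated, the intermediate cutoff yields one larger component plus a separate island, and the most lenient cutoff connects all domains, which mirrors the qualitative behavior the abstract reports.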
There are around 100 varieties of outer membrane proteins in each Gram-negative bacterium. All of these proteins have the same fold, an up-down β-barrel. It has been suggested that all membrane β-barrels excluding lysins are homologous. Here we suggest that β-barrels of efflux pumps have converged on this fold as well. By grouping structurally solved outer membrane β-barrels (OMBBs) by sequence we find that the membrane environment may have led to convergent evolution of the barrel fold. Specifically, the lack of sequence linkage to other barrels, coupled with distinctive structural differences such as differences in strand tilt and barrel radius, suggests that the outer membrane factor of efflux pumps evolutionarily converged on the barrel. Rather than being related to other OMBBs, sequence and structural similarity in the periplasmic region of the outer membrane factor of efflux pumps suggests an evolutionary link to the periplasmic subunit of the same pump complex.
There are around 100 types of integral outer membrane proteins in each Gram-negative bacterium. All of these proteins have the same fold, an up-down β-barrel. It has been suggested that all membrane β-barrels other than lysins are homologous. Here we suggest that β-barrels of efflux pumps have converged on this fold as well. By grouping structurally solved outer membrane β-barrels (OMBBs) by sequence we find evidence that the membrane environment may have led to convergent evolution of the barrel fold. Specifically, the lack of sequence linkage to other barrels, coupled with distinctive structural differences such as differences in strand tilt and barrel radius, suggests that efflux pumps have evolutionarily converged on the barrel. Finally, we find a possible ancestor for the OMBB efflux pumps, as they are related to periplasmic components of the same pumps.
It can be informative to view biological data, e.g., protein-protein interactions within a large complex, in a network representation coupled with three-dimensional structural visualizations of individual molecular entities. CyToStruct, introduced here, provides a transparent interface between the Cytoscape platform for network analysis and molecular viewers, including PyMOL, UCSF Chimera, VMD, and Jmol. CyToStruct launches and passes scripts to molecular viewers from the network's edges and nodes. We provide demonstrations to analyze interactions among subunits in large protein/RNA/DNA complexes, and similarities among proteins. CyToStruct enriches the network tools of Cytoscape by adding a layer of structural analysis, offering all capabilities implemented in molecular viewers. CyToStruct is available at https://bitbucket.org/sergeyn/cytostruct/wiki/Home and in the Cytoscape App Store. Given the coordinates of a molecular complex, our web server (http://trachel-srv.cs.haifa.ac.il/rachel/ppi/) automatically generates all files needed to visualize the complex as a Cytoscape network with CyToStruct bridging to PyMOL, UCSF Chimera, VMD, and Jmol.
Protein function involves conformational changes, but often, for a given protein, only some of these conformations are known. The missing conformations could be predicted using the wealth of data in the PDB. Most PDB proteins have multiple structures, and proteins sharing one similar conformation often share others as well. The ConTemplate web server (http://bental.tau.ac.il/contemplate) exploits these observations to suggest conformations for a query protein with at least one known conformation (or model thereof). We demonstrate ConTemplate on a ribose-binding protein that undergoes significant conformational changes upon substrate binding. Querying ConTemplate with the ligand-free (or bound) structure of the protein produces the ligand-bound (or free) conformation with a root-mean-square deviation of 1.7 Å (or 2.2 Å); the models are derived from conformations of other sugar-binding proteins, sharing approximately 30% sequence identity with the query. The calculation also suggests intermediate conformations and a pathway between the bound and free conformations.
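The root-mean-square deviation (RMSD) values quoted above measure how far a predicted conformation lies from a reference structure. A minimal sketch of the calculation follows, assuming the two coordinate lists are already superimposed and contain matched atoms in the same order. The function name and the toy coordinates are hypothetical, and this is not a description of how ConTemplate itself aligns or scores structures.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy, made-up coordinates in angstroms: every atom is displaced by 2 along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(model, reference))
```

In practice an optimal superposition (for example, with the Kabsch algorithm) is usually performed first; it is omitted here to keep the sketch short.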