The burial of hydrophobic amino acids in the protein core is a driving force in protein folding. The extent to which an amino acid interacts with the solvent and the protein core is naturally proportional to the surface area exposed to these environments. However, an accurate calculation of the solvent-accessible surface area (SASA), a geometric measure of this exposure, is numerically demanding as it is not pair-wise decomposable. Furthermore, it depends on a full-atom representation of the molecule. This manuscript introduces a series of four SASA approximations of increasing computational complexity and accuracy as well as knowledge-based environment free energy potentials based on these SASA approximations. Their ability to distinguish correctly folded from incorrectly folded protein models is assessed to balance speed and accuracy for protein structure prediction. We find that the newly developed “Neighbor Vector” algorithm provides the best balance between accuracy and speed among the exposure measures tested.
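As an illustration of what a pair-wise decomposable exposure measure can look like, the following is a minimal sketch of a plain neighbor count: the number of other residues whose Cβ atom lies within a cutoff distance. It is not the paper's “Neighbor Vector” algorithm, whose details are not given in the abstract. The function name `neighbor_count`, the hard cutoff, and the 10 Å default are assumptions chosen for illustration only.

```python
import math

def neighbor_count(coords, cutoff=10.0):
    """Count, for each residue, the other residues within `cutoff` angstroms.

    coords: list of (x, y, z) tuples, one per residue (e.g. C-beta positions).
    The hard cutoff and its default value are illustrative assumptions.
    A high count suggests a buried residue, a low count an exposed one.
    """
    counts = []
    for i, a in enumerate(coords):
        n = 0
        for j, b in enumerate(coords):
            # Each pair contributes independently: the measure is pair-wise decomposable.
            if i != j and math.dist(a, b) <= cutoff:
                n += 1
        counts.append(n)
    return counts

if __name__ == "__main__":
    # Four hypothetical residues on a line; the last one is far from the rest.
    toy = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
    print(neighbor_count(toy, cutoff=5.0))
```

Because each pair is evaluated independently, the count can be updated incrementally when a single residue moves, which is the property that an exact SASA calculation lacks.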
Over the last ten years the number of cryoelectron microscopy (cryoEM) experiments yielding medium-resolution (7–10 Å) density maps of proteins has greatly increased. At this resolution α-helices can be identified as density rods while β-strand or loop regions are not as easily discerned. Thus, for mostly α-helical proteins the general arrangement of secondary structure elements in space is revealed while their connectivity remains unknown. We propose a novel computational protein structure prediction algorithm, “EM-Fold”, that resolves the connectivity ambiguity by placing predicted α-helices into the density rods, adds missing backbone coordinates in loop regions, and finally builds all-atom models by constructing side chain coordinates. In a benchmark of ten mainly α-helical proteins of known structure a native-like model is identified in seven cases (RMSD 3.9 to 7.1 Å). The three failures can be attributed to inaccuracies in the secondary structure prediction step that precedes EM-Fold. EM-Fold has been applied to the ~6 Å resolution cryoEM density map of protein IIIa from human adenovirus. This predominantly α-helical capsid protein is involved in viral assembly, maturation, and cell entry. We report the first topological model for the α-helical 400-residue N-terminal region of protein IIIa showing interactions with neighboring capsid proteins. Beyond its importance in cryoEM, EM-Fold has the potential to interpret medium-resolution density maps in X-ray crystallography.
X-ray crystal structures have revealed that numerous secondary transporter proteins originally categorized into different sequence families share similar structures, namely, the LeuT fold. The core of this fold consists of two units of five transmembrane helices, whose conformations have been proposed to exchange to form the two alternate states required for transport. That these two units are related implies that LeuT-like transporters evolved from gene-duplication and fusion events. Thus, the origins of this structural repeat may be relevant to the evolution of transport function. However, the lack of significant sequence similarity requires sensitive sequence search methods for analyzing their evolution. To this end, we developed a software application called AlignMe, which can use various types of input information, such as residue hydrophobicity, to perform pairwise alignments of sequences and/or of hydropathy profiles of (membrane) proteins. We used AlignMe to analyze the evolutionary relationships between repeats of the LeuT fold. In addition, we identified proteins from the so-called DedA family that potentially share a common ancestor with these repeats. DedA domains have been implicated in, e.g., selenite uptake; they are found widely distributed across all kingdoms of life; two or more DedA domains are typically found per genome, and some may adopt dual topologies. These results suggest that DedA proteins existed in ancient organisms and may function as dimers, as required for a would-be ancestor of the LeuT fold. In conclusion, we provide novel insights into the evolution of this important structural motif and thus potentially into the alternating-access mechanism of transport itself.
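To make the idea of aligning hydropathy profiles concrete, here is a minimal sketch, not AlignMe's actual implementation. It assumes the Kyte-Doolittle hydropathy scale, a sliding-window average, and a basic global (Needleman-Wunsch style) alignment score in which matched positions are penalized by the absolute difference of their profile values. The function names, the window size, and the linear gap penalty are all hypothetical choices.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=5):
    """Sliding-window average hydropathy; windows are truncated at the ends."""
    values = [KD[aa] for aa in seq]
    half = window // 2
    profile = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile

def align_profiles(p, q, gap=-2.0):
    """Global alignment score of two profiles (higher is better, 0.0 at best).

    Matching positions i and j costs |p[i] - q[j]|; each gap costs `gap`.
    The linear gap penalty is an illustrative assumption.
    """
    n, m = len(p), len(q)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] - abs(p[i - 1] - q[j - 1])
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]

if __name__ == "__main__":
    # Two hypothetical sequences with different residues but similar hydrophobicity.
    a = hydropathy_profile("LLIVAAGG")
    b = hydropathy_profile("IIVLAAGS")
    print(align_profiles(a, b))
```

Scoring on hydropathy rather than residue identity is what lets such a comparison remain informative when sequence similarity is low, as is the case for the repeats discussed above.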
The topology of most experimentally determined protein domains is defined by the relative arrangement of secondary structure elements, i.e. α-helices and β-strands, which make up 50–70% of the sequence. Pairing of β-strands defines the topology of β-sheets. The packing of side chains between α-helices and β-sheets defines the majority of the protein core. Often, limited experimental datasets restrain the position of secondary structure elements while lacking detail with respect to loop or side chain conformation. At the same time the regular structure and reduced flexibility of secondary structure elements make these interactions more predictable when compared to flexible loops and side chains. To determine the topology of the protein in such settings, we introduce a tailored knowledge-based energy function that evaluates arrangement of secondary structure elements only. Based on the amino acid Cβ atom coordinates within secondary structure elements, potentials for amino acid pair distance, amino acid environment, secondary structure element packing, β-strand pairing, loop length, radius of gyration, contact order and secondary structure prediction agreement are defined. Separate penalty functions exclude conformations with clashes between amino acids or secondary structure elements and loops that cannot be closed. Each individual term discriminates for native-like protein structures. The composite potential significantly enriches for native-like models in three different databases of 10,000–12,000 protein models in 80–94% of the cases. The corresponding application, “BCL::ScoreProtein,” is available at www.meilerlab.org.
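Two of the quantities named above, radius of gyration and contact order, are simple geometric descriptors that can be computed directly from Cβ coordinates. The sketch below computes only the raw values. It does not reproduce the knowledge-based potentials of BCL::ScoreProtein, which are derived from statistics over known structures. The function names, the 8 Å contact cutoff, and the particular definition of relative contact order (mean sequence separation of contacting pairs divided by chain length) are assumptions for illustration.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, centroid) ** 2 for c in coords) / n)

def relative_contact_order(coords, cutoff=8.0):
    """Mean sequence separation of contacting residue pairs, divided by chain length.

    Two residues are in contact if their coordinates lie within `cutoff`
    angstroms; the cutoff value is an illustrative assumption.
    Returns 0.0 when there are no contacts.
    """
    n = len(coords)
    total_separation = 0
    contacts = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_separation += j - i
                contacts += 1
    if contacts == 0:
        return 0.0
    return total_separation / (contacts * n)

if __name__ == "__main__":
    # Three hypothetical C-beta positions spaced 4 angstroms apart on a line.
    toy = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
    print(radius_of_gyration(toy))
    print(relative_contact_order(toy, cutoff=5.0))
```

In a knowledge-based scoring scheme, raw values such as these would typically be converted into energies by comparing how often they occur in native structures against a reference distribution.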