The burial of hydrophobic amino acids in the protein core is a driving force in protein folding. The extent to which an amino acid interacts with the solvent and the protein core is naturally proportional to the surface area exposed to these environments. However, an accurate calculation of the solvent-accessible surface area (SASA), a geometric measure of this exposure, is numerically demanding as it is not pair-wise decomposable. Furthermore, it depends on a full-atom representation of the molecule. This manuscript introduces a series of four SASA approximations of increasing computational complexity and accuracy as well as knowledge-based environment free energy potentials based on these SASA approximations. Their ability to distinguish correctly folded from incorrectly folded protein models is assessed to balance speed and accuracy for protein structure prediction. We find that the newly developed “Neighbor Vector” algorithm provides the best balance between accuracy and speed among the exposure measures tested.
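To illustrate what a pair-wise decomposable exposure measure can look like, here is a minimal sketch of a plain neighbor count: the number of other residues whose representative atom lies within a cutoff distance. The abstract does not describe its four approximations, so this is not the “Neighbor Vector” algorithm itself; the function name, the use of one coordinate per residue, and the 10 Å default cutoff are all assumptions made for illustration.

```python
import math


def neighbor_counts(coords, cutoff=10.0):
    """Return, for each point, the number of other points within `cutoff`.

    `coords` is a list of (x, y, z) tuples, one per residue (for example
    C-beta positions). The cutoff value is an illustrative assumption.
    """
    counts = [0] * len(coords)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            # Each pair contributes independently: the measure is a sum
            # over pairs, which is what makes it pair-wise decomposable.
            if math.dist(coords[i], coords[j]) <= cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts
```

A high count suggests a buried residue and a low count an exposed one. Unlike an exact SASA calculation, each pair is evaluated on its own, so the value can be updated incrementally when only part of a model moves.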
Over the last ten years the number of cryoelectron microscopy (cryoEM) experiments yielding medium resolution (7–10 Å) density maps of proteins has greatly increased. At this resolution α-helices can be identified as density rods while β-strand or loop regions are not as easily discerned. Thus, for mostly α-helical proteins the general arrangement of secondary structure elements in space is revealed while their connectivity remains unknown. We propose a novel computational protein structure prediction algorithm, “EM-Fold”, that resolves the connectivity ambiguity by placing predicted α-helices into the density rods, adds missing backbone coordinates in loop regions, and finally builds all-atom models by constructing side chain coordinates. In a benchmark of ten mainly α-helical proteins of known structure a native-like model is identified in seven cases (RMSD 3.9 to 7.1 Å). The three failures can be attributed to inaccuracies in the secondary structure prediction step that precedes EM-Fold. EM-Fold has been applied to the ~6 Å resolution cryoEM density map of protein IIIa from human adenovirus. This predominantly α-helical capsid protein is involved in viral assembly, maturation, and cell entry. We report the first topological model for the α-helical 400 residue N-terminal region of protein IIIa showing interactions with neighboring capsid proteins. Beyond its importance in cryoEM, EM-Fold has the potential to interpret medium resolution density maps in X-ray crystallography.
X-ray crystal structures have revealed that numerous secondary transporter proteins originally categorized into different sequence families share similar structures, namely, the LeuT fold. The core of this fold consists of two units of five transmembrane helices, whose conformations have been proposed to exchange to form the two alternate states required for transport. That these two units are related implies that LeuT-like transporters evolved from gene-duplication and fusion events. Thus, the origins of this structural repeat may be relevant to the evolution of transport function. However, the lack of significant sequence similarity requires sensitive sequence search methods for analyzing their evolution. To this end, we developed a software application called AlignMe, which can use various types of input information, such as residue hydrophobicity, to perform pairwise alignments of sequences and/or of hydropathy profiles of (membrane) proteins. We used AlignMe to analyze the evolutionary relationships between repeats of the LeuT fold. In addition, we identified proteins from the so-called DedA family that potentially share a common ancestor with these repeats. DedA domains have been implicated in, e.g., selenite uptake; they are found widely distributed across all kingdoms of life; two or more DedA domains are typically found per genome, and some may adopt dual topologies. These results suggest that DedA proteins existed in ancient organisms and may function as dimers, as required for a would-be ancestor of the LeuT fold. In conclusion, we provide novel insights into the evolution of this important structural motif and thus potentially into the alternating-access mechanism of transport itself.
The topology of most experimentally determined protein domains is defined by the relative arrangement of secondary structure elements, i.e. α-helices and β-strands, which make up 50–70% of the sequence. Pairing of β-strands defines the topology of β-sheets. The packing of side chains between α-helices and β-sheets defines the majority of the protein core. Often, limited experimental datasets restrain the position of secondary structure elements while lacking detail with respect to loop or side chain conformation. At the same time the regular structure and reduced flexibility of secondary structure elements make these interactions more predictable when compared to flexible loops and side chains. To determine the topology of the protein in such settings, we introduce a tailored knowledge-based energy function that evaluates arrangement of secondary structure elements only. Based on the amino acid Cβ atom coordinates within secondary structure elements, potentials for amino acid pair distance, amino acid environment, secondary structure element packing, β-strand pairing, loop length, radius of gyration, contact order and secondary structure prediction agreement are defined. Separate penalty functions exclude conformations with clashes between amino acids or secondary structure elements and loops that cannot be closed. Each individual term discriminates for native-like protein structures. The composite potential significantly enriches for native-like models in three different databases of 10,000–12,000 protein models in 80–94% of the cases. The corresponding application, “BCL::ScoreProtein,” is available at www.meilerlab.org.
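One of the quantities named above, the radius of gyration, is simple enough to show directly. The sketch below computes an unweighted radius of gyration from a list of Cβ coordinates. It only illustrates the geometric quantity; how BCL::ScoreProtein converts it into an energy term is not stated in the abstract, and the function name and the equal weighting of all atoms are assumptions.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a set of (x, y, z) points.

    Defined here as the root-mean-square distance of the points from
    their centroid; every point carries the same weight (an assumption).
    """
    if not coords:
        raise ValueError("at least one coordinate is required")
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    mean_sq = sum(math.dist(c, centroid) ** 2 for c in coords) / n
    return math.sqrt(mean_sq)
```

Compact, native-like arrangements of secondary structure elements give smaller values than extended ones for the same number of residues, which is why the quantity is useful as a scoring term.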
Computational de novo protein structure prediction is limited to small proteins of simple topology. The present work explores an approach to extend beyond the current limitations through assembling protein topologies from idealized α-helices and β-strands. The algorithm performs a Monte Carlo Metropolis simulated annealing folding simulation. It optimizes a knowledge-based potential that analyzes radius of gyration, β-strand pairing, secondary structure element (SSE) packing, amino acid pair distance, amino acid environment, contact order, secondary structure prediction agreement and loop closure. Discontinuation of the protein chain favors sampling of non-local contacts and thereby creation of complex protein topologies. The folding simulation is accelerated through exclusion of flexible loop regions further reducing the size of the conformational search space. The algorithm is benchmarked on 66 proteins with lengths between 83 and 293 amino acids. For 61 out of these proteins, the best SSE-only models obtained have an RMSD100 below 8.0 Å and recover more than 20% of the native contacts. The algorithm assembles protein topologies with up to 215 residues and a relative contact order of 0.46. The method is tailored to be used in conjunction with low-resolution or sparse experimental data sets which often provide restraints for regions of defined secondary structure.
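The search strategy named above, Monte Carlo sampling with the Metropolis criterion under simulated annealing, can be sketched generically. The code below is not the authors' folding algorithm: the energy function and move set are supplied by the caller, and the geometric cooling schedule, the temperature range, and the step count are illustrative assumptions.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Accept downhill moves always, uphill moves with Boltzmann probability."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def anneal(energy, perturb, start, t_start=1.0, t_end=0.01, steps=2000, seed=0):
    """Minimize `energy` from `start`; returns (best_state, best_energy).

    `perturb(state, rng)` must return a new candidate state. `steps`
    must be at least 2. All default values are hypothetical.
    """
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = perturb(current, rng)
        e_candidate = energy(candidate)
        if metropolis_accept(e_candidate - e_current, temperature, rng):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best
```

In a folding context the state would be an arrangement of secondary structure elements, `perturb` would move or rotate one element, and `energy` would be the knowledge-based potential; here they are left abstract so the acceptance rule itself stays visible.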
The concept of hydrophobicity is critical to our understanding of the principles of membrane protein folding, structure, and function. In the last decades several groups have derived hydrophobicity scales using both experimental and statistical methods that are optimized to mimic certain natural phenomena as closely as possible. The present work adds to this toolset the first knowledge-based scale that unifies the characteristics of both α-helical and β-barrel multi-span membrane proteins. This Unified Hydrophobicity Scale (UHS) distinguishes between amino acid preference for solution, transition, and trans-membrane states. The scale represents average hydrophobicity values of amino acids in folded proteins, irrespective of their secondary structure type. We furthermore present the first knowledge-based hydrophobicity scale for mammalian α-helical membrane proteins (MPs), the Mammalian Hydrophobicity Scale (MHS). Both scales are particularly useful for computational protein structure elucidation, for example as input for machine learning techniques, such as secondary structure or trans-membrane span prediction, or as reference energies for protein structure prediction or protein design. The knowledge-based UHS shows a striking similarity to a recent experimental hydrophobicity scale introduced by Hessa and co-workers. Convergence of two very different approaches onto similar hydrophobicity values consolidates the major differences between experimental and knowledge-based scales observed in earlier studies. Moreover, the UHS represents an accurate absolute free energy measure for folded, multispan membrane proteins, a feature that is absent from many existing scales. The utility of the UHS was demonstrated by analyzing a series of diverse MPs. It is further shown that the UHS outperforms nine established hydrophobicity scales in predicting trans-membrane spans along the protein sequence.
The accuracy of the present hydrophobicity scale profits from the doubling of the number of integral membrane proteins in the PDB over the past years. The UHS paves the way for an increased accuracy in the prediction of trans-membrane spans.
Few sequence alignment methods have been designed specifically for integral membrane proteins, even though these important proteins have distinct evolutionary and structural properties that might affect their alignments. Existing approaches typically consider membrane-related information either by using membrane-specific substitution matrices or by assigning distinct penalties for gap creation in transmembrane and non-transmembrane regions. Here, we ask whether favoring matching of predicted transmembrane segments within a standard dynamic programming algorithm can improve the accuracy of pairwise membrane protein sequence alignments. We tested various strategies using a specifically designed program called AlignMe. An updated set of homologous membrane protein structures, called HOMEP2, was used as a reference for optimizing the gap penalties. The best of the membrane-protein optimized approaches were then tested on an independent reference set of membrane protein sequence alignments from the BAliBASE collection. When secondary structure (S) matching was combined with evolutionary information (using a position-specific substitution matrix (P)), in an approach we called AlignMePS, the resultant pairwise alignments were typically among the most accurate over a broad range of sequence similarities when compared to available methods. Matching transmembrane predictions (T), in addition to evolutionary information, and secondary-structure predictions, in an approach called AlignMePST, generally reduces the accuracy of the alignments of closely-related proteins in the BAliBASE set relative to AlignMePS, but may be useful in cases of extremely distantly related proteins for which sequence information is less informative. The open source AlignMe code is available at https://sourceforge.net/projects/alignme/, and at http://www.forrestlab.org, along with an online server and the HOMEP2 data set.
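The idea of favoring matches between predicted transmembrane segments inside a standard dynamic programming alignment can be sketched as follows. This is a deliberately simplified Needleman-Wunsch variant, not AlignMe's scoring scheme: identity scoring stands in for a substitution matrix, the gap penalty is linear, and the function name and all parameter values (`match`, `mismatch`, `gap`, `tm_bonus`) are hypothetical.

```python
def align_with_tm_bonus(seq_a, seq_b, tm_a, tm_b,
                        match=1.0, mismatch=-1.0, gap=-2.0, tm_bonus=0.5):
    """Global alignment; returns (score, aligned_a, aligned_b).

    `tm_a` and `tm_b` are per-residue booleans marking predicted
    transmembrane positions. A pair of residues that are both flagged
    receives `tm_bonus` on top of the match/mismatch score.
    """
    def pair_score(i, j):
        s = match if seq_a[i] == seq_b[j] else mismatch
        if tm_a[i] and tm_b[j]:
            s += tm_bonus
        return s

    n, m = len(seq_a), len(seq_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i - 1, j - 1),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + pair_score(i - 1, j - 1)):
            out_a.append(seq_a[i - 1]); out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            out_a.append(seq_a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Setting `tm_bonus` to zero recovers a plain identity-scored alignment, which makes it easy to compare the two strategies on the same pair of sequences.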
We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in FASTA format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe.
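The conversion of a multiple sequence alignment into a family-averaged hydropathy profile, as described for the HP mode, can be sketched in a few lines. This is an assumption-laden illustration rather than the server's implementation: the Kyte-Doolittle scale is used as a stand-in because the abstract does not say which scale the server applies, gaps and unknown characters are simply skipped, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def family_averaged_profile(msa, scale=KYTE_DOOLITTLE):
    """Average hydropathy per alignment column over all aligned sequences.

    `msa` is a non-empty list of equal-length aligned sequences
    (gaps as '-'). Columns with no scorable residue yield None.
    """
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        values = [scale[row[col]] for row in msa if row[col] in scale]
        profile.append(sum(values) / len(values) if values else None)
    return profile
```

Two such profiles, one per protein family, could then be aligned against each other with a dynamic programming routine that scores columns by the similarity of their averaged values.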