Summary
Over the last ten years, the number of cryo-electron microscopy (cryoEM) experiments yielding medium-resolution (7–10 Å) density maps of proteins has greatly increased. At this resolution, α-helices can be identified as density rods, while β-strands and loop regions are not as easily discerned. Thus, for mostly α-helical proteins, the general arrangement of secondary structure elements in space is revealed while their connectivity remains unknown. We propose a novel computational protein structure prediction algorithm, “EM-Fold”, that resolves the connectivity ambiguity by placing predicted α-helices into the density rods, adds missing backbone coordinates in loop regions, and finally builds all-atom models by constructing side chain coordinates. In a benchmark of ten mainly α-helical proteins of known structure, a native-like model is identified in seven cases (RMSD 3.9 to 7.1 Å). The three failures can be attributed to inaccuracies in the secondary structure prediction step that precedes EM-Fold. EM-Fold has been applied to the ~6 Å resolution cryoEM density map of protein IIIa from human adenovirus. This predominantly α-helical capsid protein is involved in viral assembly, maturation, and cell entry. We report the first topological model for the α-helical 400-residue N-terminal region of protein IIIa, showing interactions with neighboring capsid proteins. Beyond its importance in cryoEM, EM-Fold has the potential to interpret medium-resolution density maps in X-ray crystallography.
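The connectivity ambiguity described above can be made concrete with a small count. The abstract does not give this number; the sketch below assumes, purely for illustration, that every predicted helix occupies exactly one density rod, that each rod holds at most one helix, and that a helix can run in either direction along its rod. The function name `count_placements` is hypothetical.

```python
import math


def count_placements(n_helices: int, n_rods: int) -> int:
    """Count helix-to-rod placements under the simplifying assumptions above."""
    if n_helices > n_rods:
        return 0  # not enough rods to give every helix its own rod
    # Ordered choice of distinct rods for the helices: n_rods! / (n_rods - n_helices)!
    rod_assignments = math.perm(n_rods, n_helices)
    # Each placed helix can point either way along its rod.
    orientations = 2 ** n_helices
    return rod_assignments * orientations


print(count_placements(3, 3))    # 48
print(count_placements(10, 10))  # 3715891200
```

Even under these simplified assumptions the number of candidate topologies grows factorially with the number of helices, which is why a scored search is used rather than manual inspection.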
Electron density maps of membrane proteins or large macromolecular complexes are frequently determined only at medium resolution, between 4 Å and 10 Å, either by cryo-electron microscopy (cryoEM) or X-ray crystallography. In these density maps, the general arrangement of secondary structure elements is revealed while their directionality and connectivity remain elusive. We demonstrate that the topology of proteins with up to 250 amino acids can be determined from such density maps when combined with a computational protein folding protocol. Furthermore, we accurately reconstruct atomic detail in loop regions and amino acid side chains not visible in the experimental data. The EM-Fold algorithm assembles the secondary structure elements de novo before atomic detail is added using Rosetta. In a benchmark of 27 proteins, the protocol consistently and reproducibly achieves models with RMSD values smaller than 3 Å.
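The RMSD values quoted in these abstracts measure the root-mean-square distance between equivalent atoms of a model and the native structure. As a minimal sketch, the function below computes that distance for paired coordinates; it assumes the two structures have already been superimposed (reported RMSDs are normally computed after an optimal superposition, which is omitted here), and the coordinates are made-up values, not data from the benchmark.

```python
import math


def rmsd(model, native):
    """RMSD between paired 3D coordinates, in the units of the input (Å here).

    Assumes model[i] and native[i] are the same atom and that the two
    structures are already superimposed.
    """
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (mx - nx) ** 2 + (my - ny) ** 2 + (mz - nz) ** 2
        for (mx, my, mz), (nx, ny, nz) in zip(model, native)
    )
    return math.sqrt(squared / len(model))


# Made-up coordinates: the "model" is the "native" shifted rigidly by 3 Å along x.
native = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
model = [(x + 3.0, y, z) for x, y, z in native]
print(rmsd(model, native))  # 3.0
```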
EM-Fold was used to build models for nine proteins in the maps of GroEL (7.7 Å resolution) and ribosome (6.4 Å resolution) in the ab initio modeling category of the 2010 cryoEM modeling challenge. EM-Fold assembles predicted secondary structure elements (SSEs) into regions of the density map that were identified to correspond to either α-helices or β-strands. The assembly uses a Monte Carlo algorithm in which loop closure, density-SSE length agreement, and strength of connecting density between SSEs are evaluated. Top-scoring models are refined by translating, rotating, and bending SSEs to yield better agreement with the density map. EM-Fold produces models that contain backbone atoms within secondary structure elements only. The RMSD values of the models with respect to the native structures range from 2.4 Å to 3.5 Å for six of the nine proteins. EM-Fold failed to predict the correct topology in three cases. Subsequently, Rosetta was used to build loops and side chains for the best-scoring models after EM-Fold refinement. The refinement within Rosetta’s force field is driven by a density agreement score that calculates a cross correlation between a density map simulated from the model and the experimental density map. All-atom RMSDs as low as 3.4 Å are achieved in favorable cases. Values above 10.0 Å are observed for two proteins with low overall content of secondary structure and hence particularly complex loop modeling problems. RMSDs over residues in secondary structure elements range from 2.5 Å to 4.8 Å.
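The Monte Carlo assembly step can be illustrated with a toy Metropolis search. This is a sketch under stated assumptions, not the EM-Fold implementation: it scores only density-SSE length agreement (loop closure and connecting density are left out), treats every SSE as an ideal α-helix with a rise of 1.5 Å per residue, ignores helix direction, and uses simple swap moves. The names `length_score` and `assemble`, the temperature, and the input lengths are all hypothetical.

```python
import math
import random

RISE_PER_RESIDUE = 1.5  # Å per residue along an ideal α-helix axis


def length_score(assignment, helix_lengths, rod_lengths):
    """Sum over helices of |predicted residues - residues implied by rod length|."""
    return sum(
        abs(helix_lengths[helix] - rod_lengths[rod] / RISE_PER_RESIDUE)
        for helix, rod in enumerate(assignment)
    )


def assemble(helix_lengths, rod_lengths, steps=5000, temperature=1.0, seed=0):
    """Metropolis Monte Carlo over helix-to-rod assignments using swap moves."""
    if len(helix_lengths) != len(rod_lengths) or len(helix_lengths) < 2:
        raise ValueError("toy model needs equal numbers (>= 2) of helices and rods")
    rng = random.Random(seed)
    current = list(range(len(helix_lengths)))  # helix i starts in rod i
    current_score = length_score(current, helix_lengths, rod_lengths)
    best, best_score = current[:], current_score
    for _ in range(steps):
        # Propose swapping the rods of two randomly chosen helices.
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        new_score = length_score(current, helix_lengths, rod_lengths)
        delta = new_score - current_score
        # Metropolis criterion: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_score = new_score
            if current_score < best_score:
                best, best_score = current[:], current_score
        else:
            current[i], current[j] = current[j], current[i]  # rejected: undo the swap
    return best, best_score


# Made-up input: predicted helix lengths in residues, rod lengths in Å.
helix_lengths = [12, 25, 18, 30]
rod_lengths = [45.0, 18.0, 37.5, 27.0]
best, score = assemble(helix_lengths, rod_lengths)
print(best, score)
```

In this toy input each rod length corresponds exactly to one helix length, so a perfect assignment with score 0 exists. With real maps several rods often have similar lengths, which is one reason the additional score terms named in the abstract are needed to separate candidate topologies.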