Cryo-EM is a powerful method for determining biomolecular structures. But, unlike X-ray crystallography or solution-state NMR, which are data-rich, cryo-EM can be data-poor. Cryo-EM routinely gives electron density information to about 3-5 Å, and the resolution often varies across the structure. It has therefore been challenging to develop an automated computer algorithm that converts experimental density maps into complete molecular structures. We address this challenge with CryoFold, a computational method that finds the chain trace from the density map using MAINMAST, then performs molecular dynamics simulations using ReMDFF, a resolution-exchange flexible fitting protocol, accelerated by MELD, which uses low-information data to broaden the conformational search of secondary and tertiary structures. We describe four successful structure determinations, including membrane proteins and large molecules. CryoFold handles input data that are heterogeneous and even sparse. The software is automated and is available to the public via a Python-based graphical user interface.
Introduction

Cryo-electron microscopy (cryo-EM) has emerged as one of the most successful methods for determining the structures of proteins and other biomolecules. It has produced more than 8000 structures in less than two decades. Cryo-EM serves a niche (large complexes, membrane proteins, or molecules that are not easily crystallizable) that traditional methods, such as X-ray diffraction, electron or neutron scattering, or NMR, often cannot handle.

Similar to X-ray crystallography, cryo-EM produces electron density maps. Determining molecular structures from these maps requires automated computer algorithms. Development of such methods, however, has been a major bottleneck. Unlike X-ray crystallography, which is normally high-resolution (data-rich), cryo-EM is often mid-resolution (data-poor). As a consequence, applying the popular X-ray refinement protocol PHENIX to cryo-EM data yields models that are only 47-71% complete [1]. Another popular de novo refinement method, Rosetta, builds an initial model by assembling fragment structures and then optimizes this model in all-atom detail by fitting it to an EM map. It is generally more challenging for the EM variants of Rosetta to fold β-sheets into electron density maps automatically [2], and the method also requires that at least 70% of the Cα atoms be placed correctly [3-5]. Molecular dynamics (MD) simulations using an atomistic force field [6, 7] ensure that structures are consistent with physical forces, but MD is computationally expensive and can converge to incorrect, non-native structures [8-10]. MD is therefore sometimes augmented with external information, such as evolutionary covariance [11, 12] or homology-based starting models [13, 14]. Nonetheless, these additions can introduce new discrepancies that are refractory to automated fixes [5, 15].

Here, we describe CryoFold, an atomistic-physical algorithm that derives protein structures from cryo-EM data. ...