Significant efforts in wet and dry laboratories are devoted to resolving molecular structures. In particular, computational methods can now compute thousands of tertiary structures that populate the structure space of a protein molecule of interest. These advances are now allowing us to turn our attention to analysis methodologies that are able to organize the computed structures in order to highlight functionally relevant structural states. In this paper, we propose a methodology that leverages community detection methods, designed originally to detect communities in social networks, to organize computationally probed protein structure spaces. We report a principled comparison of such methods along several metrics on proteins of diverse folds and lengths. We present a rigorous evaluation in the context of decoy selection in template-free protein structure prediction. The results make the case that network-based community detection methods warrant further investigation to advance analysis of protein structure spaces for automated selection of functionally relevant structures.
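One way to picture the comparison of community detection methods is to score the partition each method returns on a graph of computed structures. The sketch below is illustrative only: it assumes an undirected, unweighted graph whose nodes are structures and whose edges join structurally similar pairs, and it uses Newman modularity as the score. The abstract does not name its metrics, and `modularity`, `edges`, and `community_of` are hypothetical names.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges        -- list of (u, v) pairs, each undirected edge listed once
    community_of -- dict mapping every node to its community label
    (Both inputs are assumptions made for this illustration.)
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = defaultdict(int)
    intra = defaultdict(int)  # edges with both endpoints in one community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            intra[community_of[u]] += 1
    degree_sum = defaultdict(int)  # total degree per community
    for node, d in degree.items():
        degree_sum[community_of[node]] += d
    # Q = sum over communities of (intra-edge fraction - expected fraction)
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles of similar structures joined by one bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
all_in_one = {n: "a" for n in range(6)}
q_split = modularity(edges, by_triangle)
q_single = modularity(edges, all_in_one)
```

On this toy graph the two-triangle partition scores higher than lumping all structures into one community, which is the kind of signal a comparison across methods could rely on.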
Molecular dynamics (MD) simulation software allows probing the equilibrium structural dynamics of a molecule of interest, revealing how a molecule navigates its structure space one structure at a time. To obtain a broader view of dynamics, typically one needs to launch many such simulations, obtaining many trajectories. A summarization of the equilibrium dynamics requires integrating the information in the various trajectories, and Markov State Models (MSM) are increasingly being used for this task. At its core, the task involves organizing the structures accessed in simulation into structural states, and then constructing a transition probability matrix revealing the transitions between states. While now considered a mature technology and widely used to summarize equilibrium dynamics, the underlying computational process in the construction of an MSM ignores energetics even though the transition of a molecule between two nearby structures in an MD trajectory is governed by the corresponding energies. In this paper, we connect theory with simulation and analysis of equilibrium dynamics. A molecule navigates the energy landscape underlying the structure space. The structural states that are identified via off-the-shelf clustering algorithms need to be connected to thermodynamically-stable and semi-stable (macro)states among which transitions can then be quantified. Leveraging recent developments in the analysis of energy landscapes that identify basins in the landscape, we evaluate the hypothesis that basins, directly tied to stable and semi-stable states, lead to better models of dynamics. Our analysis indicates that basins lead to MSMs of better quality and thus can be useful to further advance this widely-used technology for summarization of molecular equilibrium dynamics.
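The transition probability matrix at the core of an MSM can be sketched as follows, assuming each trajectory has already been discretized into a sequence of integer state labels. This is a plain count-and-normalize estimate at a fixed lag time; practical MSM tools often add constraints such as reversibility that are omitted here. The names `transition_matrix`, `trajectories`, `n_states`, and `lag` are hypothetical.

```python
def transition_matrix(trajectories, n_states, lag=1):
    """Row-normalized transition counts between discrete states.

    trajectories -- list of state-label sequences, one per MD trajectory
    n_states     -- number of states (labels are 0 .. n_states - 1)
    lag          -- lag time, in trajectory frames
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        # Count only transitions within a trajectory, never across two.
        for i in range(len(traj) - lag):
            counts[traj[i]][traj[i + lag]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left gets an all-zero row in this sketch.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Two short toy trajectories over two states.
T = transition_matrix([[0, 0, 1, 1, 0], [1, 1, 1, 0]], n_states=2)
```

In this sketch the state labels could come from a clustering algorithm or from basin assignments; only the labeling step changes, not the counting.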
The energy landscape that organizes microstates of a molecular system and governs the underlying molecular dynamics exposes the relationship between molecular form/structure, changes to form, and biological activity or function in the cell. However, several challenges stand in the way of leveraging energy landscapes for relating structure and structural dynamics to function. Energy landscapes are high-dimensional, multi-modal, and often overly-rugged. Deep wells or basins in them do not always correspond to stable structural states but are instead the result of inherent inaccuracies in semi-empirical molecular energy functions. Due to these challenges, energetics is typically ignored in computational approaches addressing long-standing central questions in computational biology, such as protein decoy selection. In the latter, the goal is to identify, among a possibly large number of computationally-generated three-dimensional structures of a protein, those structures that are biologically-active/native. In recent work, we have turned our attention back to the protein energy landscape and its role in helping us to advance decoy selection. Here, we summarize some of our successes so far in this direction via unsupervised learning. More importantly, we further advance the argument that the energy landscape holds valuable information to aid and advance the state of protein decoy selection via novel machine learning methodologies that leverage supervised learning. Our focus in this article is on decoy selection for the purpose of a rigorous, quantitative evaluation of how leveraging protein energy landscapes advances an important problem in protein modeling. However, the ideas and concepts presented here are generally useful to make discoveries in studies aiming to relate molecular structure and structural dynamics to function.
Molecular dynamics simulation software now provides us with a view of the structure space accessed by a molecule. Increasingly, Markov state models are proposed to integrate various simulations of a molecule and extract its equilibrium structural dynamics. The approach relies on organizing the structures accessed in simulation into states as an attempt to identify thermodynamically-stable and semi-stable (macro)states among which transitions can then be quantified. Typically, off-the-shelf clustering algorithms are used for this purpose. In this paper, we investigate two additional complementary approaches to state identification that rely on graph embeddings of the structures. In particular, we show that doing so reveals basins in the energy landscape associated with the accessed structure space. Moreover, we demonstrate on a proof-of-concept application that basins, directly tied to stable and semi-stable states, yield a better model of dynamics.
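As a rough illustration of how basins might be read off a graph embedding, the sketch below assigns every structure to the local energy minimum reached by repeatedly stepping to its lowest-energy neighbor. It assumes a precomputed neighbor list (for example, from a nearest-neighbor graph over the structures) and one energy value per structure. The abstract does not spell out the authors' procedure, so this is a generic steepest-descent construction with hypothetical names (`find_basins`, `energies`, `neighbors`).

```python
def find_basins(energies, neighbors):
    """Map each structure index to the local minimum it descends to.

    energies  -- list of energies, one per structure
    neighbors -- neighbors[i] is the list of structures adjacent to i
    Structures that share a minimum form one basin.
    """
    def downhill(i):
        # Lowest-energy neighbor, or i itself if no neighbor is strictly lower.
        best = min(neighbors[i], key=lambda j: energies[j], default=i)
        return best if energies[best] < energies[i] else i

    basin_of = {}
    for start in range(len(energies)):
        path, node = [], start
        while node not in basin_of:
            nxt = downhill(node)
            if nxt == node:  # reached a local minimum
                basin_of[node] = node
                break
            path.append(node)
            node = nxt
        for p in path:  # every structure on the path joins that basin
            basin_of[p] = basin_of[node]
    return basin_of


# Toy landscape: six structures on a chain, with minima at indices 1 and 4.
energies = [3.0, 1.0, 2.0, 5.0, 0.0, 4.0]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
basins = find_basins(energies, neighbors)
```

The resulting basin labels could then stand in for cluster labels when states are defined for a Markov state model.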