Markov (state) models (MSMs) and related models of molecular kinetics have recently received a surge of interest as they can systematically reconcile simulation data from either a few long or many short simulations and allow us to analyze the essential metastable structures, thermodynamics, and kinetics of the molecular system under investigation. However, the estimation, validation, and analysis of such models are far from trivial and involve sophisticated and often numerically sensitive methods. In this work we present the open-source Python package PyEMMA (http://pyemma.org) that provides accurate and efficient algorithms for kinetic model construction. PyEMMA can read all common molecular dynamics data formats, helps in the selection of input features, provides easy access to dimension reduction algorithms such as principal component analysis (PCA) and time-lagged independent component analysis (TICA) and clustering algorithms such as k-means, and contains estimators for MSMs, hidden Markov models, and several other models. Systematic model validation and error calculation methods are provided. PyEMMA offers a wealth of analysis functions such that the user can conveniently compute molecular observables of interest. We have derived a systematic and accurate way to coarse-grain MSMs to few states and to illustrate the structures of the metastable states of the system. Plotting functions to produce a manuscript-ready presentation of the results are available. In this work, we demonstrate the features of the software and show new methodological concepts and results produced by PyEMMA.
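As a rough illustration of the core MSM estimation step mentioned above, the following minimal sketch counts transitions in a discretized trajectory at a chosen lag time, row-normalizes the counts into a transition matrix, and converts its eigenvalues into implied timescales. It uses plain NumPy and does not call PyEMMA; the function names `estimate_msm` and `implied_timescales`, the two-state toy system, and the lag time are assumptions made for illustration only.

```python
import numpy as np


def estimate_msm(dtraj, n_states, lag):
    """Row-normalized (non-reversible) transition matrix from one discrete trajectory."""
    dtraj = np.asarray(dtraj)
    counts = np.zeros((n_states, n_states))
    # Sliding-window count of transitions i -> j separated by `lag` steps.
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("some state has no outgoing transitions at this lag")
    return counts / row_sums


def implied_timescales(T, lag):
    """Implied timescales t_i = -lag / ln|lambda_i|, skipping the stationary eigenvalue."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(eigvals[1:])


# Hypothetical two-state toy system used only to generate example data.
T_true = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
rng = np.random.default_rng(0)
state = 0
dtraj = np.empty(20000, dtype=int)
for t in range(dtraj.size):
    dtraj[t] = state
    state = rng.choice(2, p=T_true[state])

T_est = estimate_msm(dtraj, n_states=2, lag=1)
its = implied_timescales(T_est, lag=1)
```

In this toy case the estimated matrix should be close to `T_true`, and the single implied timescale should be close to -1 / ln(0.7), since the second eigenvalue of `T_true` is 0.7. A production workflow would add the validation and error estimation steps that the abstract describes.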
On average, an approved drug today costs $2–3 billion and takes over ten years to develop [1]. In part, this is due to expensive and time-consuming wet-lab experiments, poor initial hit compounds, and the high attrition rates in the (pre-)clinical phases. Structure-based virtual screening (SBVS) has the potential to mitigate these problems. With SBVS, the quality of the hits improves with the number of compounds screened [2]. However, despite the fact that large compound databases exist, the ability to carry out large-scale SBVSs on computer clusters in an accessible, efficient, and flexible manner has remained elusive. Here we designed VirtualFlow, a highly automated and versatile open-source platform with perfect scaling behaviour that is able to prepare and efficiently screen ultra-large ligand libraries of compounds. VirtualFlow is able to use a variety of the most powerful docking programs. Using VirtualFlow, we have prepared the largest freely available ready-to-dock ligand library, with over 1.4 billion commercially available molecules. To demonstrate the power of VirtualFlow, we screened over 1 billion compounds and discovered a small molecule inhibitor (iKeap1) that engages KEAP1 with nanomolar affinity (K_d = 114 nM) and disrupts the interaction between KEAP1 and the transcription factor NRF2. We also identified a set of structurally diverse molecules that bind to KEAP1 with submicromolar affinity. This illustrates the potential of VirtualFlow to access vast regions of the chemical space and identify binders with high affinity for target proteins.
The modeling of atomistic biomolecular simulations using kinetic models such as Markov state models (MSMs) has had many notable algorithmic advances in recent years. The variational principle has opened the door for a nearly fully automated toolkit for selecting models that predict the long-timescale kinetics from molecular dynamics simulations. However, one yet-unoptimized step of the pipeline involves choosing the features, or collective variables, from which the model should be constructed. In order to build intuitive models, these collective variables are often sought to be interpretable and familiar features, such as torsional angles or contact distances in a protein structure. However, previous approaches for evaluating the chosen features rely on constructing a full MSM, which in turn requires additional hyperparameters to be chosen, and hence leads to a computationally expensive framework. Here, we present a method to optimize the feature choice directly, without requiring the construction of the final kinetic model. We demonstrate our rigorous preprocessing algorithm on a canonical set of twelve fast-folding protein simulations, and show that our procedure leads to more efficient model selection.
Generation and analysis of time-series data is relevant to many quantitative fields ranging from economics to fluid mechanics. In the physical sciences, structures such as metastable and coherent sets, slow relaxation processes, collective variables, dominant transition pathways or manifolds and channels of probability flow can be of great importance for understanding and characterizing the kinetic, thermodynamic and mechanistic properties of the system. Deeptime is a general purpose Python library offering various tools to estimate dynamical models based on time-series data including conventional linear learning methods, such as Markov state models (MSMs), hidden Markov models and Koopman models, as well as kernel and deep learning approaches such as VAMPnets and deep MSMs. The library is largely compatible with scikit-learn, having a range of Estimator classes for these different models, but in contrast to scikit-learn also provides deep Model classes, e.g. in the case of an MSM, which provide a multitude of analysis methods to compute interesting thermodynamic, kinetic and dynamical quantities, such as free energies, relaxation times and transition paths. The library is designed for ease of use as well as for easily maintainable and extensible code. In this paper we introduce the main features and structure of the deeptime software. Deeptime can be found at https://deeptime-ml.github.io/.
The inner workings of a biological cell or a chemical reaction can be rationalized by the network of reactions, whose structure reveals the most important functional mechanisms. For complex systems, these reaction networks are not known a priori and cannot be efficiently computed with ab initio methods. An important goal is therefore to estimate effective reaction networks from observations, such as time series of the main species. Reaction networks estimated with standard machine learning techniques such as least-squares regression may fit the observations, but will typically contain spurious reactions. Here we extend the sparse identification of nonlinear dynamics (SINDy) method to vector-valued ansatz functions, each describing a particular reaction process. The resulting sparse tensor regression method "reactive SINDy" is able to estimate a parsimonious reaction network. We illustrate that a gene regulation network can be correctly estimated from observed time series.
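To make the contrast between plain least squares and sparse regression concrete, here is a minimal sketch of ordinary SINDy with sequentially thresholded least squares on a toy conversion reaction A -> B. It is not the reactive SINDy method itself, which uses vector-valued ansatz functions per reaction; the candidate library, the rate constant 0.5, the noise level, the threshold, and the function name `stlsq` are all assumptions chosen for illustration.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: solve theta @ xi ~ dxdt with sparse xi."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0  # prune candidate terms with small coefficients
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                # Refit the surviving terms for species k only.
                xi[keep, k] = np.linalg.lstsq(theta[:, keep], dxdt[:, k], rcond=None)[0]
    return xi


# Hypothetical toy data: A -> B with rate 0.5, so dA/dt = -0.5 A and dB/dt = 0.5 A.
rng = np.random.default_rng(1)
A = rng.uniform(0.0, 1.0, size=200)
B = rng.uniform(0.0, 1.0, size=200)
dxdt = np.column_stack([-0.5 * A, 0.5 * A])
dxdt += 0.01 * rng.standard_normal(dxdt.shape)  # small observation noise

# Candidate library of terms: A, B, and the bimolecular term A*B.
theta = np.column_stack([A, B, A * B])

xi = stlsq(theta, dxdt, threshold=0.1)
```

Rows of `xi` correspond to the library terms (A, B, A*B) and columns to the species (A, B). In this toy setting only the coefficients of the A term should survive thresholding, while an unthresholded least-squares fit would generally assign small nonzero weights to the spurious B and A*B terms.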
Interacting-particle reaction dynamics (iPRD) combines the simulation of dynamical trajectories of interacting particles as in molecular dynamics (MD) simulations with reaction kinetics, in which particles appear, disappear, or change their type and interactions based on a set of reaction rules. This combination facilitates the simulation of reaction kinetics in crowded environments, involving complex molecular geometries such as polymers, and employing complex reaction mechanisms such as breaking and fusion of polymers. iPRD simulations are ideal to simulate the detailed spatiotemporal reaction mechanism in complex and dense environments, such as in signalling processes at cellular membranes, or in nano- to microscale chemical reactors. Here we introduce the iPRD software ReaDDy 2, which provides a Python interface in which the simulation environment, particle interactions and reaction rules can be conveniently defined and the simulation can be run, stored and analyzed. A C++ interface is available to enable deeper and more flexible interactions with the framework. The main computational work of ReaDDy 2 is done in hardware-specific simulation kernels. While the version introduced here provides single- and multi-threading CPU kernels, the architecture is ready to implement GPU and multi-node kernels. We demonstrate the efficiency and validity of ReaDDy 2 using several benchmark examples. ReaDDy 2 is available at the https://readdy.github.io/ website.