Machine learning methods provide a general framework for automatically
finding and representing the essential characteristics of simulation
data. This task is particularly crucial in enhanced sampling simulations.
There we seek a few generalized degrees of freedom, referred to as
collective variables (CVs), to represent and drive the sampling of
the free energy landscape.
In theory, these CVs should separate different metastable states and
correspond to the slow degrees of freedom of the studied physical
process. To this end, we propose a new method that we call multiscale
reweighted stochastic embedding (MRSE). Our work builds upon a parametric
version of stochastic neighbor embedding. The technique automatically
learns CVs that map a high-dimensional feature space to a low-dimensional
latent space via a deep neural network. We introduce several advances
to stochastic neighbor embedding methods that make MRSE especially
suitable for enhanced sampling simulations: (1) weight-tempered random
sampling as a landmark selection scheme to obtain training data sets
that strike a balance between equilibrium representation and capturing
important metastable states lying higher in free energy; (2) a multiscale
representation of the high-dimensional feature space via a Gaussian
mixture probability model; and (3) a reweighting procedure to account
for training data from a biased probability distribution. We show
that MRSE constructs low-dimensional CVs that can correctly characterize
the different metastable states in three model systems: the Müller-Brown
potential, alanine dipeptide, and alanine tetrapeptide.
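
As a rough illustration of landmark selection scheme (1), the sketch below draws landmarks with probability proportional to a tempered statistical weight, w**(1/alpha). The abstract does not give the formula, so the exponent form, the parameter name `alpha`, and both function names are assumptions made for illustration only. Under this assumption, alpha = 1 samples according to the reweighted (equilibrium) distribution, while large alpha approaches uniform sampling of the biased data, which keeps more high-free-energy configurations.

```python
import numpy as np

def tempered_probabilities(weights, alpha=2.0):
    """Return selection probabilities proportional to weights**(1/alpha).

    `alpha` is a hypothetical tempering parameter, assumed to be >= 1.
    """
    w = np.asarray(weights, dtype=float)
    if alpha < 1.0:
        raise ValueError("alpha is assumed to be >= 1")
    # Work in log space for stability; zero weights map to zero probability.
    with np.errstate(divide="ignore"):
        log_w = np.log(w) / alpha
    log_w -= log_w.max()
    p = np.exp(log_w)
    return p / p.sum()

def select_landmarks(weights, n_landmarks, alpha=2.0, seed=0):
    """Draw distinct landmark indices according to the tempered weights."""
    p = tempered_probabilities(weights, alpha)
    rng = np.random.default_rng(seed)
    return rng.choice(p.size, size=n_landmarks, replace=False, p=p)

if __name__ == "__main__":
    # Toy weights: two low-free-energy samples and two rare ones.
    w = np.array([1.0, 1.0, 1e-3, 1e-3])
    print(tempered_probabilities(w, alpha=1.0))
    print(tempered_probabilities(w, alpha=4.0))
    print(select_landmarks(w, n_landmarks=2, alpha=4.0))
```

In practice the weights would come from the bias potential of the enhanced sampling run; here they are toy values.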
In this work, we propose applying a nonlinear dimensionality reduction method to represent the high-dimensional configuration space of the ligand-protein dissociation process in a manner that facilitates interpretation. Rugged ligand expulsion paths are mapped into two-dimensional space. The mapping retains the main structural changes occurring during the dissociation. The topological similarity of the reduced paths can be readily studied using Fréchet distances, and we show that this measure facilitates machine learning classification of the diffusion pathways. Further, the low-dimensional configuration space allows for identification of residues active in transport during ligand diffusion from a protein. The utility of this approach is illustrated by examining the configuration space of cytochrome P450cam as it expels camphor, using enhanced all-atom molecular dynamics simulations. The expulsion trajectories are sampled and constructed on-the-fly during molecular dynamics simulations using the recently developed memetic algorithms [Rydzewski, J.; Nowak, W. J. Chem. Phys. 2015, 143 (12), 124101]. We show that the memetic algorithms are effective for enforcing ligand diffusion and cavity exploration in the P450cam-camphor complex. Furthermore, we demonstrate that machine learning techniques are helpful in inspecting ligand diffusion landscapes and provide useful tools to examine structural changes accompanying rare events.
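
To make the path-similarity measure concrete, the sketch below computes the discrete Fréchet distance between two paths in a reduced two-dimensional space, using the standard dynamic-programming recurrence. The abstract does not say whether the discrete or continuous variant was used, so the discrete form, the function name `discrete_frechet`, and the toy paths are assumptions for illustration.

```python
import numpy as np

def discrete_frechet(path_a, path_b):
    """Discrete Frechet distance between two polygonal paths.

    Each path is a sequence of points with shape (n_points, n_dims).
    """
    a = np.asarray(path_a, dtype=float)
    b = np.asarray(path_b, dtype=float)
    # Pairwise Euclidean distances between every point of a and every point of b.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    n, m = d.shape
    # ca[i, j] is the coupling distance for the prefixes a[:i+1] and b[:j+1].
    ca = np.empty((n, m))
    ca[0, 0] = d[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            best_prev = min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1])
            ca[i, j] = max(best_prev, d[i, j])
    return float(ca[-1, -1])

if __name__ == "__main__":
    # Two toy paths in a 2D reduced space, offset by one unit along y.
    path_1 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    path_2 = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
    print(discrete_frechet(path_1, path_2))
```

A matrix of such pairwise distances between reduced paths could then serve as input to a clustering or classification step.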