Conventional molecular dynamics simulations are incapable of sampling many important interactions in biomolecular systems because of their high dimensionality and rough energy landscapes. To observe rare events and calculate transition rates in these systems, enhanced sampling is a necessity. In particular, the study of ligand-protein interactions requires a diverse ensemble of protein conformations and transition states, and for many systems these are only sampled on prohibitively long timescales. Previous strategies that can be used to determine these types of ensembles, such as WExplore, are hindered by problems related to the regioning of conformational space. Here, we propose a novel, regionless, enhanced sampling method that is based on the weighted ensemble framework. In this method, a value referred to as “trajectory variation” is optimized after each cycle through cloning and merging operations. This method allows for a more consistent measurement of observables and broader sampling, resulting in the efficient exploration of previously unexplored conformations. We demonstrate the performance of this algorithm with the N-dimensional random walk and the unbinding of the trypsin-benzamidine system. The system is analyzed using conformation space networks, the residence time of benzamidine is confirmed, and a new unbinding pathway for the trypsin-benzamidine system is found. We expect that resampling of ensembles by variation optimization (REVO) will be a useful general tool to broadly explore free energy landscapes.
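The cloning and merging operations described above can be illustrated with a minimal sketch. The abstract does not define trajectory variation, so the sum of pairwise Euclidean distances used here is only a stand-in, and the function names (`variation`, `resample_once`) and the greedy accept-if-larger rule are assumptions for illustration, not the authors' implementation. The sketch does show the two properties the text relies on: no predefined regions are needed, and the total statistical weight is conserved.

```python
import random


def distance(a, b):
    """Euclidean distance between two walker positions (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def variation(walkers):
    """Stand-in for 'trajectory variation': sum of pairwise distances.

    ASSUMPTION: the abstract does not give the formula; this is illustrative.
    """
    pos = [p for p, _ in walkers]
    return sum(distance(pos[i], pos[j])
               for i in range(len(pos)) for j in range(i + 1, len(pos)))


def resample_once(walkers, rng):
    """Propose one clone plus one merge; keep it only if variation increases.

    walkers: list of (position, weight) pairs, at least three of them.
    The walker count and the total weight are unchanged by the proposal.
    """
    n = len(walkers)
    pos = [p for p, _ in walkers]
    # Contribution of each walker: summed distance to all other walkers.
    contrib = [sum(distance(pos[i], pos[j]) for j in range(n) if j != i)
               for i in range(n)]
    i_clone = max(range(n), key=contrib.__getitem__)
    others = sorted((k for k in range(n) if k != i_clone),
                    key=contrib.__getitem__)
    i_m, j_m = others[0], others[1]

    # Clone: split the most outlying walker into two half-weight copies.
    c_pos, c_w = walkers[i_clone]
    # Merge: sum the weights; the survivor is chosen in proportion to weight.
    (p1, w1), (p2, w2) = walkers[i_m], walkers[j_m]
    survivor = p1 if rng.random() < w1 / (w1 + w2) else p2

    proposal = [walkers[k] for k in range(n) if k not in (i_clone, i_m, j_m)]
    proposal += [(c_pos, c_w / 2), (c_pos, c_w / 2), (survivor, w1 + w2)]
    return proposal if variation(proposal) > variation(walkers) else walkers


# Hypothetical 1D example: three clustered walkers and one outlier.
walkers = [((0.0,), 0.25), ((0.1,), 0.25), ((0.2,), 0.25), ((5.0,), 0.25)]
resampled = resample_once(walkers, random.Random(0))
```

In this toy example the outlying walker is cloned and two of the clustered walkers are merged, so more of the fixed walker budget is spent on the less-explored position.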
This work examines methods for predicting the partition coefficient (log P) for a dataset of small molecules. Here, we use atomic attributes such as radius and partial charge, which are typically used as force field parameters in classical molecular dynamics simulations. These atomic features are transformed into index-invariant molecular features using a recently developed method called Geometric Scattering for Graphs (GSG). We call this approach "ClassicalGSG" and examine its performance under a broad range of conditions and hyperparameters. We train a ClassicalGSG log P predictor with neural networks using 10,722 molecules from the ChEMBL21 dataset and apply it to predict the log P values from four independent test sets. The ClassicalGSG method's performance is compared to a baseline model that employs graph convolutional neural networks (GCNNs). Our results show that the best prediction accuracies are obtained using atomic attributes generated with the CHARMM General Force Field (CGenFF) and 2D molecular structures.
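To make "index-invariant molecular features" concrete, the following is a simplified, scattering-style sketch. It assumes a lazy random walk on the bond graph, band-pass differences between dyadic powers of that walk, and summed absolute moments over atoms. These choices, the function names, and the toy molecule are assumptions for illustration and are not taken from the ClassicalGSG code. The point it demonstrates is that relabeling the atoms leaves the molecular feature vector unchanged.

```python
def matvec(m, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def lazy_walk(adj):
    """Lazy random walk matrix P = (I + A D^-1) / 2.

    ASSUMPTION: every atom has at least one bond, so no degree is zero.
    """
    n = len(adj)
    deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[0.5 * ((1.0 if i == j else 0.0) + adj[i][j] / deg[j])
             for j in range(n)] for i in range(n)]


def scattering_features(adj, x, max_scale=2, moments=(1, 2)):
    """Zero- and first-order scattering-style moments of an atomic signal x.

    adj: symmetric 0/1 adjacency matrix of the molecular graph.
    x:   one atomic attribute per atom (e.g. a partial charge).
    """
    p = lazy_walk(adj)
    # powers[t] holds P^t applied to x, for t = 0 .. 2**max_scale.
    powers = [list(x)]
    for _ in range(2 ** max_scale):
        powers.append(matvec(p, powers[-1]))
    # Band-pass signals: x - Px, then P^(2^(j-1)) x - P^(2^j) x for j >= 1.
    bands = [[a - b for a, b in zip(powers[0], powers[1])]]
    for j in range(1, max_scale + 1):
        lo, hi = powers[2 ** (j - 1)], powers[2 ** j]
        bands.append([a - b for a, b in zip(lo, hi)])
    # Summing over atoms removes any dependence on atom ordering.
    feats = [sum(abs(v) ** q for v in x) for q in moments]
    for band in bands:
        feats.extend(sum(abs(v) ** q for v in band) for q in moments)
    return feats


# Hypothetical three-atom chain with made-up partial charges.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
charges = [0.3, -0.5, 0.2]
features = scattering_features(adj, charges)

# The same molecule with its atoms relabeled.
perm = [2, 0, 1]
adj_perm = [[adj[perm[i]][perm[j]] for j in range(3)] for i in range(3)]
charges_perm = [charges[perm[i]] for i in range(3)]
features_perm = scattering_features(adj_perm, charges_perm)
```

Because every feature is a sum over atoms, `features` and `features_perm` agree up to floating-point rounding, which is what allows a fixed-size neural network input regardless of how the atoms are numbered.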
The prediction of log P values is one part of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) blind challenges. Here, we use a molecular graph representation method called Geometric Scattering for Graphs (GSG) to transform atomic attributes to molecular features. The atomic attributes used here are parameters from classical molecular force fields, including partial charges and Lennard-Jones interaction parameters. The molecular features from GSG are used as inputs to neural networks that are trained using a “master” dataset comprising over 41,000 unique log P values. The specific molecular targets in the SAMPL7 log P prediction challenge were unique in that they all contained a sulfonyl moiety. This motivated a set of ClassicalGSG submissions where predictors were trained on different subsets of the master dataset that were filtered according to chemical types and/or the presence of the sulfonyl moiety. We find that our ranked prediction obtained 5th place with an RMSE of 0.77 log P units and an MAE of 0.62, while one of our non-ranked predictions achieved first place among all submissions with an RMSE of 0.55 and an MAE of 0.44. After the conclusion of the challenge, we also examined the performance of open-source force field parameters that allow for an end-to-end log P predictor model: General AMBER Force Field (GAFF), Universal Force Field (UFF), Merck Molecular Force Field 94 (MMFF94) and Ghemical. We find that ClassicalGSG models trained with atomic attributes from MMFF94 can yield more accurate predictions compared to those trained with CGenFF atomic attributes.
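For readers unfamiliar with the two error metrics quoted above, a minimal sketch of how RMSE and MAE are computed from predicted and reference log P values follows. The numbers are hypothetical placeholders, not SAMPL7 data.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error, in the same units as the inputs (log P units)."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)


def mae(predicted, reference):
    """Mean absolute error, in the same units as the inputs (log P units)."""
    n = len(reference)
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / n


# Hypothetical predicted and experimental log P values (illustration only).
predicted = [1.0, 2.0, 3.0]
reference = [1.0, 2.0, 5.0]

error_rmse = rmse(predicted, reference)
error_mae = mae(predicted, reference)
```

RMSE is never smaller than MAE and penalizes large individual errors more heavily, which is why both are typically reported together.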