Computational methods, and more recently modern machine learning methods, have played a key role in structure-based drug design. Although several benchmarking datasets are available for machine learning applications in virtual screening, accurate prediction of the binding affinity of a protein-ligand complex remains a major challenge. New datasets that enable models to predict binding affinities better than state-of-the-art scoring functions are therefore important. For the first time, we have developed a dataset, PLAS-5k, comprising 5000 protein-ligand complexes chosen from the PDB database. The dataset consists of binding affinities along with energy components (electrostatic, van der Waals, polar solvation, and non-polar solvation energies) calculated from molecular dynamics simulations using the MMPBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) method. The calculated binding affinities outperformed docking scores and showed a good correlation with the available experimental values. The availability of energy components may enable optimization of desired components during machine learning-based drug design. Further, the OnionNet model has been retrained on the PLAS-5k dataset and is provided as a baseline for the prediction of binding affinities.
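As a rough illustration of how the four energy components named above relate to a binding energy, the sketch below sums them in the usual MMPBSA fashion. The class name, field names, and numbers are assumptions made for this example; they are not the PLAS-5k file format or values, and the entropy contribution is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyComponents:
    """Per-complex MMPBSA terms in kcal/mol (field names are illustrative)."""

    electrostatic: float
    van_der_waals: float
    polar_solvation: float
    nonpolar_solvation: float

    def binding_energy(self) -> float:
        # Sum of the gas-phase and solvation terms; entropy is not included.
        return (
            self.electrostatic
            + self.van_der_waals
            + self.polar_solvation
            + self.nonpolar_solvation
        )


# Made-up numbers for illustration only, not taken from PLAS-5k.
example = EnergyComponents(
    electrostatic=-30.0,
    van_der_waals=-45.0,
    polar_solvation=50.0,
    nonpolar_solvation=-5.0,
)
total = example.binding_energy()
print(f"Binding energy: {total:.1f} kcal/mol")
```

Keeping the components separate, rather than storing only the total, is what would let a model target one term (for example, van der Waals) during optimization.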
The generation of low-energy 3D structures of metal clusters depends on the efficiency of the search algorithm and the accuracy with which inter-atomic interactions are described. In this work, we formulate the search algorithm as a Reinforcement Learning (RL) problem. Concisely, we propose a novel actor-critic architecture that generates low-lying isomers of metal clusters at a fraction of the computational cost of conventional methods. Our RL-based search algorithm uses a previously developed DART model as a reward function to describe the inter-atomic interactions and to validate predicted structures. Using the DART model as a reward function incentivizes the RL model to generate low-energy structures and helps it generate valid structures. We demonstrate the advantages of our approach over conventional methods for scanning local minima on the potential energy surface (PES). Our approach not only generates isomers of gallium clusters at a minimal computational cost but also predicts isomer families that were not discovered through previous DFT-based approaches.
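The idea of using an energy model as an RL reward can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: `toy_energy` is a hypothetical Lennard-Jones-like stand-in for a learned model such as DART, and the distance-based validity check, the `min_distance` threshold, and the `invalid_penalty` value are all invented for illustration.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float, float]


def min_pair_distance(coords: Sequence[Point]) -> float:
    """Smallest distance between any two atoms."""
    return min(
        math.dist(a, b)
        for i, a in enumerate(coords)
        for b in coords[i + 1:]
    )


def reward(
    coords: Sequence[Point],
    energy_model: Callable[[Sequence[Point]], float],
    min_distance: float = 2.0,      # assumed validity threshold
    invalid_penalty: float = -10.0,  # assumed penalty for overlapping atoms
) -> float:
    """Higher reward for lower predicted energy per atom."""
    if len(coords) < 2:
        raise ValueError("need at least two atoms")
    if min_pair_distance(coords) < min_distance:
        return invalid_penalty
    return -energy_model(coords) / len(coords)


def toy_energy(coords: Sequence[Point]) -> float:
    """Hypothetical stand-in energy: pair sum with its minimum at r = 2.7."""
    total = 0.0
    for i, a in enumerate(coords):
        for b in coords[i + 1:]:
            x = 2.7 / math.dist(a, b)
            total += x ** 12 - 2 * x ** 6
    return total


relaxed = [(0.0, 0.0, 0.0), (2.7, 0.0, 0.0)]
overlapping = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(reward(relaxed, toy_energy), reward(overlapping, toy_energy))
```

An agent trained against such a reward is pushed toward low-energy geometries while being discouraged from placing atoms unphysically close together.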
There has been tremendous advancement in machine learning (ML) applications in computational chemistry, particularly in neural network potentials (NNPs). NNPs can approximate the potential energy surface (PES) as a high-dimensional function by learning from existing reference data, thereby circumventing the need to solve the electronic Schrödinger equation explicitly. As a result, ML accelerates chemical space exploration and property prediction compared to quantum mechanical methods. Novel ML methods have the potential to provide efficient means for predicting the properties of molecules. However, this potential has been limited by the lack of standard comparative evaluations. In this work, we compare four selected models, namely ANI, PhysNet, SchNet, and BAND‐NN, developed to represent the PES of small organic molecules. We evaluate these models for their accuracy and transferability on two different test sets: (i) small organic molecules of up to eight heavy atoms, on which ANI and SchNet achieve root mean square errors (RMSE) of 0.55 and 0.60 kcal/mol, respectively; and (ii) a random selection of molecules with 10 heavy atoms from the GDB‐11 database, on which ANI achieves an RMSE of 1.17 kcal/mol and SchNet achieves an RMSE of 1.89 kcal/mol. We examine their ability to produce a smooth, meaningful surface by performing PES scans for bond stretches, angle bends, and dihedral rotations on relatively large molecules, to assess their possible application in molecular dynamics simulations. We also evaluate their performance in yielding minimum-energy structures via geometry optimization using various minimization algorithms. All of these models were also able to accurately differentiate isomers sharing the same empirical formula, C10H20. ANI and PhysNet achieve RMSEs of 0.29 and 0.52 kcal/mol, respectively, on the C10H20 isomers.
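For reference, the RMSE metric quoted above can be computed as in the following sketch. The function name and the sample energies are assumptions for illustration; the numbers are not taken from any of the benchmarked models.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean square error between predicted and reference energies."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up energies in kcal/mol, for illustration only.
reference_energies = [0.0, 1.0, 2.0, 3.0]
predicted_energies = [0.5, 1.0, 1.5, 3.0]
error = rmse(predicted_energies, reference_energies)
print(f"RMSE: {error:.3f} kcal/mol")
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than they would raise a mean absolute error.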
Recently, machine learning (ML) has been shown to yield fast and accurate predictions of chemical properties, accelerating the discovery of novel molecules and materials. Most of this work has focused on organic molecules, and much more remains to be done for inorganic molecules, especially clusters. In the present work, we introduce a simple Topological Atomic Descriptor (TAD), which encodes the chemical environment of each atom in the cluster. TAD is a simple and interpretable descriptor whose values represent atom counts in three shells. We also introduce DART, the Deep Learning Enabled Topological Interaction model, which uses TAD as a feature vector to predict the energies of metal clusters, in our case gallium clusters ranging in size from 31 to 70 atoms. The DART model is designed on the principle that energy is a function of atomic interactions, and it allows us to model these complex interactions to predict the energy. We further introduce a new dataset, GNC_31-70, which comprises structures and DFT-optimized energies of gallium clusters with sizes ranging from 31 to 70 atoms. We show how DART can be used to accelerate the identification of ground-state structures without geometry optimization. Despite using only a topological descriptor, DART achieves an MAE of 3.59 kcal/mol (0.15 eV) on the test set. We also show that our model can distinguish core and surface atoms in the Ga-70 cluster, which the model has never encountered earlier. Finally, we demonstrate the transferability of the DART model by predicting energies for about 6k unseen configurations taken from molecular dynamics (MD) data for three cluster sizes (46, 57, and 60) within seconds. The DART model reduced the load on DFT optimizations while identifying unique low-energy structures from MD data.
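A shell-count descriptor of the kind described above might look like the following sketch. The abstract does not give the shell radii or say whether counts are cumulative, so the `edges` values and the choice of non-overlapping shells are assumptions, and `shell_counts` is a hypothetical name rather than the authors' TAD code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def shell_counts(
    coords: Sequence[Point],
    edges: Tuple[float, ...] = (3.0, 4.5, 6.0),  # assumed outer radii
) -> List[List[int]]:
    """For each atom, count neighbours falling in each concentric shell."""
    descriptor = []
    for i, a in enumerate(coords):
        counts = [0] * len(edges)
        for j, b in enumerate(coords):
            if i == j:
                continue
            d = math.dist(a, b)
            # Assign the neighbour to the innermost shell that contains it;
            # neighbours beyond the last edge are ignored.
            for k, outer in enumerate(edges):
                if d <= outer:
                    counts[k] += 1
                    break
        descriptor.append(counts)
    return descriptor


# Four atoms on a line, purely illustrative coordinates.
atoms = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 0.0, 0.0), (5.5, 0.0, 0.0)]
features = shell_counts(atoms)
print(features)
```

Each atom ends up with a three-number vector, so atoms in the interior of a cluster (many near neighbours) and atoms on the surface (fewer) would produce visibly different features.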