Conspectus: Machine learning interatomic potentials (MLIPs) are widely used for describing molecular energy and continue to bridge the speed and accuracy gap between quantum mechanical (QM) and classical approaches like force fields. In this Account, we focus on the out-of-the-box approaches to developing transferable MLIPs for diverse chemical tasks. First, we introduce the “Accurate Neural Network engine for Molecular Energies” (ANAKIN-ME) method, or ANI for short. The ANI model utilizes Justin Smith Symmetry Functions (JSSFs) and enables training on vast data sets. Training data sets several orders of magnitude larger than before have become the key factor in the knowledge transferability and flexibility of MLIPs. Because the quantity, quality, and types of interactions included in the training data set dictate the accuracy of MLIPs, proper data selection and model training can be assisted by advanced methods like active learning (AL), transfer learning (TL), and multitask learning (MTL). Next, we describe AIMNet, the “Atoms-in-Molecules Network,” which was inspired by the quantum theory of atoms in molecules. The AIMNet architecture lifts multiple limitations of MLIPs: it encodes long-range interactions and learnable representations of chemical elements. We also discuss the AIMNet-ME model, which expands the applicability domain of AIMNet from neutral molecules toward open-shell systems. AIMNet-ME incorporates a dependence of the potential on molecular charge and spin. It brings ML and physical models one step closer, ensuring the correct behavior of the molecular energy as a function of the total molecular charge. Finally, we describe perhaps the simplest possible physics-aware model, which combines ML and the extended Hückel method (ML-EHM). In ML-EHM, the “Hierarchically Interacting Particle Neural Network” (HIP-NN) generates a set of molecule- and environment-dependent Hamiltonian elements α_μμ and K‡. As a test example, we show how, in contrast to traditional Hückel theory, ML-EHM correctly describes orbital crossing with bond rotations. Hence it learns the underlying physics, highlighting that the inclusion of proper physical constraints and symmetries could significantly improve ML model generalization.
We report the results of the first comprehensive DFT study of the d(A)3·d(T)3 and d(G)3·d(C)3 nucleic acid duplexes. The ability of these mini-helices to preserve the B-DNA conformation in the gas phase and under the influence of solvent, uncompensated charge, and counterions was evaluated using the M06-2X functional with the 6-31G(d,p) basis set. The accuracy of the models was assessed by their ability to reproduce key structural features of natural B-DNA. Analysis of the helicity suggests that the helical conformations adopt geometrical parameters close to those of the B-DNA form. The torsion angles fall between the values observed for the BI and BII conformational classes. Comparison of isolated Watson-Crick base pairs with the B-DNA-like conformations indicates the same tendency of base-pair polarization and hydration. Specifically, the polarization of nucleobases in a continuum-type dielectric medium mimicking water is stronger than that caused by the presence of the backbone. A polar environment and the presence of counterions stabilize the duplexes, facilitating helix formation. Substantial conformational changes of the nucleotides upon duplex formation decrease the binding energy. In spite of these structural and energetic changes, placing a mini-helix in the gas phase does not significantly disrupt its structure. On the contrary, the duplex preserves its helicity and the strands remain bound.
The Hückel Hamiltonian is an incredibly simple tight-binding model famed for its ability to capture qualitative physics phenomena arising from electron interactions in molecules and materials. Part of its simplicity arises from using only two types of empirically fit physics-motivated parameters: the first describes the orbital energies on each atom and the second describes electronic interactions and bonding between atoms. By replacing these traditionally static parameters with dynamically predicted values, we vastly increase the accuracy of the extended Hückel model. The dynamic values are generated with a deep neural network, which is trained to reproduce orbital energies and densities derived from density functional theory. The resulting model retains interpretability while the deep neural network parameterization is smooth, accurate, and reproduces insightful features of the original static parameterization. Finally, we demonstrate that the Hückel model, and not the deep neural network, is responsible for capturing intricate orbital interactions in two molecular case studies. Overall, this work shows the promise of utilizing machine learning to formulate simple, accurate, and dynamically parameterized physics models.
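The two parameter types described above can be illustrated with a minimal sketch of a simple (not extended) Hückel calculation. It is not the authors' model: the helper name `huckel_hamiltonian`, the choice of butadiene, and the reduced units (alpha = 0, beta = -1) are assumptions made for illustration, and the "dynamic" values are hypothetical hand-picked numbers standing in for the output of a neural network.

```python
import numpy as np


def huckel_hamiltonian(alphas, betas):
    """Assemble a Hückel matrix (hypothetical helper, for illustration only).

    alphas: on-site (orbital energy) parameter for each atom.
    betas:  dict mapping a bonded pair (i, j) to its interaction parameter.
    """
    h = np.diag(np.asarray(alphas, dtype=float))
    for (i, j), beta in betas.items():
        h[i, j] = h[j, i] = beta  # keep the matrix symmetric
    return h


# Static parameterization: one alpha and one beta shared by all atoms and bonds.
# Butadiene pi system: four carbons in a chain, bonds 0-1, 1-2, 2-3.
alpha, beta = 0.0, -1.0
static_h = huckel_hamiltonian(
    [alpha] * 4, {(0, 1): beta, (1, 2): beta, (2, 3): beta}
)
energies, orbitals = np.linalg.eigh(static_h)  # eigenvalues in ascending order

# Four pi electrons doubly occupy the two lowest orbitals.
pi_energy = 2.0 * energies[:2].sum()

# "Dynamic" parameterization: each bond gets its own beta. These numbers are
# made up; in an ML-parameterized model they would be predicted per geometry.
dynamic_h = huckel_hamiltonian(
    [0.0] * 4, {(0, 1): -1.1, (1, 2): -0.8, (2, 3): -1.1}
)
dynamic_energies = np.linalg.eigvalsh(dynamic_h)
```

In both cases the orbital energies come from diagonalizing the small matrix, so the model stays interpretable; only the source of the parameters changes.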
Computational programs accelerate chemical discovery but often need proper three-dimensional molecular information as part of the input. Obtaining optimal molecular structures is challenging because it requires enumerating and optimizing a huge space of stereoisomers and conformers. We developed the Python-based Auto3D package for generating low-energy 3D structures using SMILES as the input. Auto3D is based on state-of-the-art algorithms and can automate isomer enumeration and duplicate filtering, 3D building, geometry optimization, and ranking. Tested on 50 molecules with multiple unspecified stereocenters, Auto3D is guaranteed to find the stereoconfiguration that yields the lowest-energy conformer. With Auto3D, we provide an extension of the ANI model. The new model, dubbed ANI-2xt, is trained on a tautomer-rich data set. ANI-2xt is benchmarked against DFT methods on geometry optimization and on electronic and Gibbs free energy calculations. Compared with ANI-2x, ANI-2xt provides a 42% error reduction for tautomeric reaction energy calculations when using gold-standard coupled-cluster calculations as the reference. ANI-2xt predicts energies accurately and is several orders of magnitude faster than DFT methods.
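The enumerate, deduplicate, and rank steps named above can be sketched in a few lines. This is a toy illustration and not the Auto3D API: the function names, the R/S string encoding of stereocenters, and `toy_energy` are all hypothetical. A real workflow would build and optimize 3D conformers and score them with a model such as ANI-2xt, which is omitted here.

```python
from itertools import product


def enumerate_assignments(centers):
    """Expand unspecified stereocenters (None) into every R/S assignment.

    centers: list with 'R', 'S', or None for each stereocenter.
    """
    options = [("R", "S") if c is None else (c,) for c in centers]
    return ["".join(combo) for combo in product(*options)]


def rank_by_energy(candidates, energy_fn):
    """Drop duplicate candidates, then sort from lowest to highest energy."""
    unique = sorted(set(candidates))
    return sorted(unique, key=energy_fn)


def toy_energy(labels):
    """Hypothetical stand-in for an energy model: penalize equal neighbors."""
    return sum(1.0 for a, b in zip(labels, labels[1:]) if a == b)


# One specified center and two unspecified ones give 2**2 = 4 candidates.
candidates = enumerate_assignments(["R", None, None])
ranked = rank_by_energy(candidates, toy_energy)
best = ranked[0]  # lowest-energy stereoconfiguration under the toy model
```

The number of candidates doubles with each unspecified stereocenter, which is why automated enumeration and filtering matter for molecules with several of them.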
A-DNA is thought to play a significant biological role in gene expression due to its specific conformation and binding features. In this study, the double-stranded mini-helices (dA:dT)3 and (dG:dC)3 in an A-like DNA conformation were investigated. The M06-2X/6-31G(d,p) method was used to identify the optimal geometries and predict physicochemical parameters of these systems. The results show that the mini-helices preserve their A-like conformation under the influence of solvent, charge, and Na+ counterions. The structural and energetic data presented offer evidence that two GG/CC or AA/TT steps are already enough for the DNA helix to adopt different forms by favoring specific values of roll and slide at a local level. Our calculations support the experimentally known fact that AA/TT steps prefer the B-form over the A-form, whereas GG/CC steps may be found in either the B- or A-form. The stability of the mini-helices is discussed in terms of the total energy difference, ΔEtotal(A–B).