CONSPECTUS: Designing new materials is vital for addressing pressing societal challenges in health, energy, and sustainability. The combination of physicochemical laws and empirical trial and error has long guided material design, but this approach is limited by the cost of experiments and the difficulty of deriving complex guiding principles. The space of hypothetical materials to be considered is incredibly large, and only a small fraction of possible compounds can ever be tested experimentally. The computational techniques of atomistic simulation and machine learning (ML) offer an avenue to rapidly invent new materials and navigate this enormous space. Together, they can be used to infer complex design principles and identify high-quality candidates more rapidly than trial-and-error experimentation. In this Account, we review our group's recent contributions to simulation and ML for materials design. We begin by discussing the numerical representation of materials for use in ML. Representations can be produced through deterministic algorithms, learnable encodings, or physics-based methods and lead to vector, graph, and matrix outputs. We describe how these different approaches offer distinct material- and application-specific advantages. We provide demonstrations from our own work on small-molecule drugs, macromolecules, dyes, electrolytes, and zeolites. In several cases, we show how the appropriate representation led to guiding principles that facilitated experimental materials design. Next, we highlight the development of ML methods for enhancing atomistic simulation. These advances help to improve simulation accuracy and expand the time and length scales that can be explored. They include differentiable atomistic simulations in which ensemble-averaged quantities are differentiated with respect to system parameters, and novel autoregressive methods for enhanced sampling of challenging physical distributions.
Other developments include learnable coarse-grained models, which can accelerate molecular dynamics while minimizing the loss of all-atom information, and ML interatomic potentials, which can be trained on maximally informative quantum chemistry data through active learning and adversarial uncertainty attacks. Next, we show how these combined computational advances have enabled high-throughput virtual screening. This has led to the discovery of low-cost organic structure-directing agents for zeolite synthesis, polymer electrolytes, and efficient photoswitches for targeted medicine. We conclude by discussing the limitations of ML and simulation. These include the large data requirements and limited chemical transferability of the former and the speed−accuracy trade-offs of the latter. We predict that advancements in quantum chemistry will further accelerate simulations, while the incorporation of physical principles will improve the reliability of ML.
A closed-loop, autonomous molecular discovery platform driven by integrated machine learning tools was developed to accelerate the design of molecules with desired properties. Two case studies are demonstrated on dye-like molecules, targeting absorption wavelength, lipophilicity, and photo-oxidative stability. In the first, the platform experimentally realized 312 unreported molecules across three automatic iterations of molecular design-make-test-analyze cycles while exploring the structure–function space of four rarely reported scaffolds. In each iteration, the property-prediction models that guided the exploration learned the structure–property space of diverse, inexpensive scaffold derivatives realized through multi-step syntheses. Conversely, the second study exploited property models trained on a chemical space with pre-existing examples to discover 6 top-performing molecules within the structure–property space. By closing the molecular discovery cycle of prediction, synthesis, measurement, and model retraining, the platform demonstrates the potential for integrated platforms to automatically understand a local chemical space and discover functional molecules.
Deep learning has become a powerful and frequently employed tool for the prediction of molecular properties, thus creating a need for open-source and versatile software solutions that can be operated by non-experts. Among current approaches, directed message-passing neural networks (D-MPNNs) have proven to perform well on a variety of property prediction tasks. The software package Chemprop implements the D-MPNN architecture and offers simple, fast access to machine-learned molecular properties. Compared to its initial version, we present a multitude of new Chemprop functionalities such as the support of multi-molecule properties, reactions, atom/bond-level properties, and spectra. Further, we incorporate various uncertainty quantification and calibration methods along with related metrics, as well as pretraining and transfer learning workflows, improved hyperparameter optimization, and other customization options concerning loss functions or atom/bond features. We benchmark D-MPNN models trained using Chemprop with the new reaction, atom-level, and spectra functionality on a variety of property prediction datasets, including MoleculeNet and SAMPL, and observe state-of-the-art performance on the prediction of water-octanol partition coefficients, reaction barrier heights, atomic partial charges, and absorption spectra. Chemprop enables out-of-the-box training of D-MPNN models for a variety of problem settings in a fast, user-friendly, open-source software package.
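The defining idea of a D-MPNN, hidden states that live on directed bonds and are updated without passing a message straight back along the reverse bond, can be illustrated with a toy sketch. This is not Chemprop's API or implementation: the function name `directed_message_passing`, the scalar hidden states, and the fixed weight 0.5 (standing in for learned weight matrices) are all assumptions made for illustration.

```python
def directed_message_passing(bonds, atom_feats, steps=2):
    """Toy directed message passing on a molecular graph.

    bonds      -- list of undirected (u, v) atom-index pairs
    atom_feats -- dict mapping atom index -> scalar feature (hypothetical)
    Returns a scalar molecule-level readout.
    """
    # Every undirected bond becomes two directed edges.
    edges = [(u, v) for u, v in bonds] + [(v, u) for u, v in bonds]

    # Initial hidden state of edge v->w comes from the source atom v.
    h0 = {(v, w): atom_feats[v] for (v, w) in edges}
    h = dict(h0)

    for _ in range(steps):
        new_h = {}
        for (v, w) in edges:
            # Aggregate edges k->v arriving at v, excluding the reverse edge w->v.
            msg = sum(h[(k, t)] for (k, t) in edges if t == v and k != w)
            # ReLU update; 0.5 is a fixed stand-in for a learned weight.
            new_h[(v, w)] = max(0.0, h0[(v, w)] + 0.5 * msg)
        h = new_h

    # Atom readout: own feature plus all incoming edge states, then sum over atoms.
    atom_states = {
        a: atom_feats[a] + sum(h[(k, t)] for (k, t) in edges if t == a)
        for a in atom_feats
    }
    return sum(atom_states.values())


# Three-atom chain 0-1-2 with made-up atom features.
chain_readout = directed_message_passing([(0, 1), (1, 2)], {0: 1.0, 1: 2.0, 2: 3.0})
```

Excluding the reverse edge is what distinguishes this scheme from atom-centred message passing, where information can bounce back and forth across the same bond at every step.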
Optical properties are central to molecular design for many applications, including solar cells and biomedical imaging. A variety of ab initio and statistical methods have been developed for their prediction, each with a trade-off between accuracy, generality, and cost. Existing theoretical methods such as time-dependent density functional theory (TD-DFT) are generalizable across chemical space because of their robust physics-based foundations but still exhibit random and systematic errors with respect to experiment despite their high computational cost. Statistical methods can achieve high accuracy at a lower cost, but data sparsity and unoptimized molecule and solvent representations often limit their ability to generalize. Here, we utilize directed message passing neural networks (D-MPNNs) to represent both dye molecules and solvents for predictions of molecular absorption peaks in solution. Additionally, we demonstrate a multi-fidelity approach based on an auxiliary model trained on over 28,000 TD-DFT calculations that further improves accuracy and generalizability, as shown through rigorous splitting strategies. Combining several openly-available experimental datasets, we benchmark these methods against a state-of-the-art regression tree algorithm and compare the D-MPNN solvent representation to several alternatives. Finally, we explore the interpretability of the learned representations using dimensionality reduction and evaluate the use of ensemble variance as an estimator of the epistemic uncertainty in our predictions of molecular peak absorption in solution. The prediction methods proposed herein can be integrated with active learning, generative modeling, and experimental workflows to enable the more rapid design of molecules with targeted optical properties.
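As a rough illustration of ensemble variance as an epistemic uncertainty estimate, the sketch below averages the predictions of several models and reports their spread. The three linear "models" are hypothetical stand-ins for independently trained networks, and `ensemble_predict` is an illustrative name, not a function from the paper or from any library.

```python
from statistics import mean, pvariance


def ensemble_predict(models, x):
    """Return (mean prediction, variance across ensemble members) for input x."""
    preds = [model(x) for model in models]
    return mean(preds), pvariance(preds)


# Hypothetical ensemble: three models that agree near x = 0 and
# increasingly disagree as x moves away from it.
models = [lambda x, s=s: s * x for s in (1.9, 2.0, 2.1)]

mu_near, var_near = ensemble_predict(models, 1.0)
mu_far, var_far = ensemble_predict(models, 10.0)
```

Here the variance is larger at `x = 10.0` than at `x = 1.0`, mirroring the intuition that ensemble members disagree more where the training data constrain them less. Whether that disagreement tracks the true prediction error is an empirical question that has to be checked on held-out data.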