Quantitative predictions of reaction properties, such as activation
energy, have been limited due to a lack of available training data.
Such predictions would be useful for computer-assisted reaction mechanism
generation and organic synthesis planning. We develop a template-free
deep learning model to predict the activation energy given reactant
and product graphs and train the model on a new, diverse data set
of gas-phase quantum chemistry reactions. We demonstrate that our
model achieves accurate predictions and agrees with an intuitive understanding
of chemical reactivity. With the continued generation of quantitative
chemical reaction data and the development of methods that leverage
such data, we expect many more methods for reactivity prediction to
become available in the near future.
Reaction times, activation energies, branching ratios, yields, and many other quantitative attributes are important for precise organic syntheses and generating detailed reaction mechanisms. It is often useful to classify proposed reactions as fast or slow. However, quantitative chemical reaction data, especially for atom-mapped reactions, are difficult to find in existing databases. Therefore, we used automated potential energy surface exploration to generate 12,000 organic reactions involving H, C, N, and O atoms calculated at the ωB97X-D3/def2-TZVP quantum chemistry level. We report the results of geometry optimizations and frequency calculations for reactants, products, and transition states of all reactions. Additionally, we extracted atom-mapped reaction SMILES, activation energies, and enthalpies of reaction. We believe that these data will accelerate progress in automated methods for organic synthesis and reaction mechanism generation, for example by enabling the development of novel machine learning models for quantitative reaction prediction.
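To make the fast/slow classification concrete, here is a minimal Python sketch that converts a barrier height into a unimolecular rate constant with the Eyring equation from transition state theory and compares it with a cutoff. It is an illustration only, not a method from the abstract above. It assumes the activation energy can stand in for the free energy of activation, and the function names and the cutoff of 1 s⁻¹ are hypothetical choices.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J*s
R = 1.987204259e-3     # gas constant, kcal/(mol*K)


def eyring_rate_constant(barrier_kcal_mol, temperature_k=298.15):
    """Unimolecular rate constant in 1/s from the Eyring equation.

    Assumption: the barrier is used in place of the free energy of
    activation, and the transmission coefficient is taken as 1.
    """
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-barrier_kcal_mol / (R * temperature_k))


def classify_reaction(barrier_kcal_mol, temperature_k=298.15, min_rate_per_s=1.0):
    """Label a reaction 'fast' or 'slow' against a hypothetical rate cutoff."""
    rate = eyring_rate_constant(barrier_kcal_mol, temperature_k)
    return "fast" if rate >= min_rate_per_s else "slow"


if __name__ == "__main__":
    for barrier in (10.0, 20.0, 30.0):
        print(barrier, eyring_rate_constant(barrier), classify_reaction(barrier))
```

Because the rate depends exponentially on the barrier, the label is sensitive to both the temperature and the chosen cutoff, so both are exposed as parameters rather than fixed.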
Quantitative estimates of reaction barriers are essential for developing
kinetic mechanisms and predicting reaction outcomes. However, the
lack of experimental data and the steep scaling of accurate quantum
calculations often hinder the ability to obtain reliable kinetic values.
Here, we train a directed message passing neural network on nearly
24,000 diverse gas-phase reactions calculated at CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP.
Our model uses 75% fewer parameters than previous studies, an improved
reaction representation, and proper data splits to accurately estimate
performance on unseen reactions. Using information from only the reactant
and product, our model quickly predicts barrier heights with a testing
MAE of 2.6 kcal mol⁻¹ relative to the coupled-cluster
data, making it more accurate than a good density functional theory
calculation. Furthermore, our results show that future modeling efforts
to estimate reaction properties would significantly benefit from fine-tuning
calibration using a transfer learning technique. We anticipate this
model will accelerate and improve kinetic predictions for small molecule
chemistry.
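To put a barrier error of this size in kinetic terms, the following Python sketch estimates the multiplicative uncertainty it implies for a rate constant. It assumes a simple Arrhenius-type dependence, k ∝ exp(−Ea/RT), with the pre-exponential factor treated as exact. The function name is hypothetical and the calculation is not taken from the abstract above.

```python
import math

R = 1.987204259e-3  # gas constant, kcal/(mol*K)


def rate_error_factor(barrier_error_kcal_mol, temperature_k):
    """Factor by which a rate constant changes for a given barrier error.

    Assumption: k is proportional to exp(-Ea / (R * T)), so an error dE
    in the barrier multiplies or divides k by exp(|dE| / (R * T)).
    """
    return math.exp(abs(barrier_error_kcal_mol) / (R * temperature_k))


if __name__ == "__main__":
    for temperature in (298.15, 1000.0):
        print(temperature, rate_error_factor(2.6, temperature))
```

Under these assumptions, a 2.6 kcal mol⁻¹ error corresponds to roughly a factor of 80 in the rate constant at 298 K and a factor of about 4 at 1000 K, which is why the same barrier accuracy is more useful at higher temperatures.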
Predicting how a drug-like molecule binds to a specific protein target is a core problem in drug discovery. An extremely fast computational binding method would enable key applications such as fast virtual screening or drug engineering. Existing methods are computationally expensive as they rely on heavy candidate sampling coupled with scoring, ranking, and fine-tuning steps. We challenge this paradigm with EquiBind, an SE(3)-equivariant geometric deep learning model performing direct-shot prediction of both (i) the receptor binding location (blind docking) and (ii) the ligand's bound pose and orientation. EquiBind achieves significant speed-ups and better quality compared to traditional and recent baselines. Further, we show extra improvements when coupling it with existing fine-tuning techniques at the cost of increased running time. Finally, we propose a novel and fast fine-tuning model that adjusts torsion angles of a ligand's rotatable bonds based on closed-form global minima of the von Mises angular distance to a given input atomic point cloud, avoiding previous expensive differential evolution strategies for energy minimization.
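The idea of a closed-form minimum of an angular distance can be illustrated with a simplified one-bond case. The Python sketch below is not EquiBind's implementation. It assumes a single rotatable bond whose rotation shifts a set of dihedral angles by a common offset, and it uses the loss Σ(1 − cos(current + offset − target)), which is the von Mises-style distance between angles. The function names are hypothetical. Under those assumptions, the minimizing offset is the circular mean of the angle differences and can be computed with one `atan2` call instead of an iterative search.

```python
import math


def angular_loss(current_angles, target_angles, offset):
    """Sum of 1 - cos(difference) after shifting every current angle by offset."""
    return sum(
        1.0 - math.cos(cur + offset - tgt)
        for cur, tgt in zip(current_angles, target_angles)
    )


def best_torsion_offset(current_angles, target_angles):
    """Closed-form offset (radians) minimizing angular_loss.

    Maximizing sum(cos(offset - d_i)) with d_i = target_i - current_i gives
    offset = atan2(sum(sin d_i), sum(cos d_i)), the circular mean of d_i.
    """
    sin_sum = sum(math.sin(tgt - cur) for cur, tgt in zip(current_angles, target_angles))
    cos_sum = sum(math.cos(tgt - cur) for cur, tgt in zip(current_angles, target_angles))
    return math.atan2(sin_sum, cos_sum)


if __name__ == "__main__":
    current = [0.1, 2.0, -1.5]
    target = [0.9, 2.6, -0.7]  # made-up target dihedrals, in radians
    offset = best_torsion_offset(current, target)
    print(offset, angular_loss(current, target, offset))
```

The closed form follows from rewriting the sum of cosines as a single cosine with a phase, so no gradient steps or population-based search are needed for this one-dimensional subproblem.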