Computer-aided synthesis has received much attention in recent years. It is a challenging topic in itself, due to the high dimensionality of chemical and reaction space. It becomes even more challenging when the aim is to suggest syntheses that can be performed in continuous flow. Though continuous flow offers many potential benefits, not all reactions are suited to continuous operation. In this work, three machine learning models have been developed to assess whether a given reaction may benefit from continuous operation, what the likelihood of success in continuous flow is for a given set of reaction components (i.e., reactants, reagents, solvents, catalysts, and products), and, if that likelihood is low, which alternative reaction components can be considered. The first model uses an abstract version of a reaction template, obtained via Gaussian mixture modeling, to quantify the relative increase of its publication frequency in continuous flow, without relying on potentially ambiguously defined reaction templates. The second model is an artificial neural network that categorizes feasible and infeasible reaction components with a 75% success rate. A set of reaction components is considered feasible if there is an explicit reference in the database to its use in continuous synthesis; all other reaction components are considered infeasible. While several cases that are "infeasible" by this definition are classified as feasible by the neural network, further analysis shows that for many of these cases it is at least plausible that they are in fact feasible; they simply have not been tested to (dis)prove this. The final model suggests alternative continuous flow components with a top-1 accuracy of 95%. Combined, the three models offer a black-box evaluation of whether a reaction and a set of reaction components can be considered promising for continuous synthesis.
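A minimal sketch of how the second model's task could be framed, assuming each set of reaction components has already been encoded as a fixed-length fingerprint vector and labeled feasible (1) or infeasible (0) according to whether it appears in a continuous-flow reference. This is not the authors' implementation; the fingerprints and labels below are placeholders.

```python
# Sketch of a binary feasibility classifier for reaction-component sets.
# Placeholder data stands in for fingerprint vectors and database-derived labels.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.random((1000, 512))            # placeholder fingerprints of component sets
y = rng.integers(0, 2, size=1000)      # placeholder feasible / infeasible labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = MLPClassifier(hidden_layer_sizes=(256, 64), max_iter=300, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```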
Machine learning has proven effective for predicting properties of pure compounds from molecular structures, but properties of mixtures, in particular oil fractions, are rarely addressed. At best, bulk properties are estimated from pure compound properties, linear mixing rules, and a reconstructed composition of the feedstock. As the detailed composition of such mixtures is rarely well determined and is often approximated by lumps, there is room to improve the accuracy of the estimated bulk properties. In this work, we demonstrate our bulk property estimation method on a naphtha case study. First, a detailed PIONA composition is delumped into a molecule-level composition, and a machine learning-based approach is used to predict properties of those molecules, which are then combined in another deep neural network to predict bulk properties. The latter machine learning models are trained on mixture properties using vectors that represent the mixture. The first vector is a linear combination of the molecular representation vectors and the representation of the molecular geometries that make up the mixture. The second vector applies linear mixing rules to boiling temperatures, critical temperatures, liquid densities, and vapor pressures that are predicted with machine learning. The last vector consists of a learned distillation curve. We show that an integrated machine learning approach that starts from the molecular structures in the mixture offers significant improvements in predicting mixture properties over existing approaches applied in industry and academia.
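A minimal sketch of how two of the described mixture-level inputs could be assembled, assuming per-molecule representation vectors, ML-predicted pure-compound properties, and mole fractions are already available as arrays. The function names and toy data are illustrative placeholders, not the paper's implementation.

```python
# Sketch of assembling mixture-level input vectors from molecule-level data.
import numpy as np

def mixture_vector(mol_reprs, mole_fractions):
    """Mole-fraction-weighted linear combination of molecular representation vectors."""
    x = np.asarray(mole_fractions)
    return (x[:, None] * np.asarray(mol_reprs)).sum(axis=0)

def mixed_properties(pure_props, mole_fractions):
    """Linear mixing rule applied to predicted pure-compound properties
    (e.g., boiling point, critical temperature, liquid density, vapor pressure)."""
    return np.asarray(mole_fractions) @ np.asarray(pure_props)

# Toy example: 3 molecules, 8-dimensional representations, 4 predicted properties each.
reprs = np.random.rand(3, 8)
props = np.random.rand(3, 4)
x = np.array([0.5, 0.3, 0.2])
print(mixture_vector(reprs, x).shape, mixed_properties(props, x).shape)
```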
Accurate thermochemistry estimation of polycyclic molecules is crucial for kinetic modeling of chemical processes that use renewable and alternative feedstocks. In kinetic model generators, molecular properties are estimated rapidly with group additivity, but this method is known to have limitations for polycyclic structures. In this work, this issue is resolved by combining a geometry-based molecular representation with a deep neural network trained on ab initio data. Each molecule is transformed into a probabilistic vector derived from its interatomic distances, bond angles, and dihedral angles. The model is tested on a small experimental dataset (200 molecules) from the literature, a new medium-sized set (4000 molecules) with both open-shell and closed-shell species, calculated at the CBS-QB3 level with empirical corrections, and a large G4MP2-level QM9-based dataset (40 000 molecules). Heat capacities between 298.15 and 2500 K are predicted for the medium-sized set with an average deviation of about 1.5 J mol⁻¹ K⁻¹, and the standard entropy at 298.15 K is predicted with an average error below 4 J mol⁻¹ K⁻¹. The standard enthalpy of formation at 298.15 K has an average out-of-sample error below 4 kJ mol⁻¹ for a QM9 training set of around 15 000 molecules. By fitting NASA polynomials, the enthalpy of formation at higher temperatures can be calculated with the same accuracy as the standard enthalpy of formation. Uncertainty quantification by means of the ensemble standard deviation is included to flag molecules that lie at the edge of, or outside, the model's application range.
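A minimal sketch of the last step, extending thermochemistry to higher temperatures with standard 7-coefficient NASA polynomials. The coefficients below are placeholders, not fitted values from the paper; only the polynomial form itself is standard.

```python
# Sketch of evaluating Cp and H from 7-coefficient NASA polynomials.
import numpy as np

R = 8.314462618  # gas constant, J mol^-1 K^-1

def nasa7_cp(a, T):
    """Heat capacity Cp (J mol^-1 K^-1) from NASA-7 coefficients a[0..6]."""
    return R * (a[0] + a[1]*T + a[2]*T**2 + a[3]*T**3 + a[4]*T**4)

def nasa7_enthalpy(a, T):
    """Enthalpy H (J mol^-1) from NASA-7 coefficients a[0..6]."""
    return R * T * (a[0] + a[1]*T/2 + a[2]*T**2/3 + a[3]*T**3/4
                    + a[4]*T**4/5 + a[5]/T)

a = np.array([3.0, 1e-3, -1e-7, 1e-11, -1e-15, -1.0e4, 5.0])  # placeholder coefficients
for T in (298.15, 1000.0, 2500.0):
    print(f"T = {T:7.2f} K  Cp = {nasa7_cp(a, T):8.2f}  H = {nasa7_enthalpy(a, T):12.1f}")
```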