The synthesis of complex organic molecules involves several stages, from ideation to execution, that demand time and effort from expert chemists. Here, we report a step toward a paradigm of chemical synthesis that relieves chemists from routine tasks, combining artificial intelligence–driven synthesis planning and a robotically controlled experimental platform. Synthetic routes are proposed through generalization of millions of published chemical reactions and validated in silico to maximize their likelihood of success. Additional implementation details are determined by expert chemists and recorded in reusable recipe files, which are executed by a modular continuous-flow platform that is automatically reconfigured by a robotic arm to set up the required unit operations and carry out the reaction. This strategy for computer-augmented chemical synthesis is demonstrated for 15 drug or drug-like substances.
Both the automated generation of reaction networks and the automated prediction of synthetic trees require, in one way or another, a definition of the possible transformations a molecule can undergo. One way of doing this is by using reaction templates. In view of the expanding number of known reactions, it has become increasingly difficult to envision all possible transformations that could occur in a studied system. Nonetheless, most reaction network generation tools rely on user-defined reaction templates. Not only does this limit the amount of chemistry that can be accounted for in the reaction networks, it also hinders widespread use of the tools by a broad audience. In retrosynthetic analysis, the quality of the analysis depends on what percentage of the known chemistry is accounted for. Using databases to identify templates is therefore crucial. For this purpose, an algorithm has been developed to extract reaction templates from various types of chemical databases. Some databases, such as the Kyoto Encyclopedia of Genes and Genomes and RMG, do not report an atom–atom mapping (AAM) for the reactions, which makes extracting a template non-trivial. If no mapping is available, it is calculated by the Reaction Decoder Tool (RDT). With a correct AAM, either calculated by RDT or specified, the algorithm consistently extracts a correct template for a wide variety of reactions, both elementary and non-elementary. The developed algorithm is a first step towards data-driven generation of synthetic trees or reaction networks, and towards greater accessibility for non-expert users.
Electronic supplementary material: The online version of this article (10.1186/s13321-018-0269-8) contains supplementary material, which is available to authorized users.
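To illustrate how an atom–atom mapping makes template extraction possible, the following is a minimal sketch and not the published algorithm. It assumes a toy representation in which each bond is a pair of atom-map numbers with a bond order, and it treats the reaction center as the set of mapped atoms whose bonding differs between reactants and products. The function names (`bond_table`, `reaction_center`, `extract_template`) and the data layout are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy representation: a bond is ((map_a, map_b), order).
Bond = Tuple[Tuple[int, int], float]


def bond_table(bonds: List[Bond]) -> Dict[FrozenSet[int], float]:
    """Map each unordered pair of atom-map numbers to its bond order."""
    return {frozenset(pair): order for pair, order in bonds}


def reaction_center(reactant_bonds: List[Bond],
                    product_bonds: List[Bond]) -> Set[int]:
    """Return the atom-map numbers whose bonding changes in the reaction."""
    before = bond_table(reactant_bonds)
    after = bond_table(product_bonds)
    changed: Set[int] = set()
    for pair in before.keys() | after.keys():
        # A missing bond counts as order 0, so made and broken bonds both show up.
        if before.get(pair, 0) != after.get(pair, 0):
            changed |= pair
    return changed


def extract_template(atoms: Dict[int, str],
                     reactant_bonds: List[Bond],
                     product_bonds: List[Bond]) -> dict:
    """Keep only the atoms and bonds that lie inside the reaction center."""
    center = reaction_center(reactant_bonds, product_bonds)

    def side(bonds: List[Bond]):
        table = bond_table(bonds)
        return sorted((tuple(sorted(pair)), order)
                      for pair, order in table.items() if pair <= center)

    return {
        "atoms": {i: atoms[i] for i in sorted(center)},
        "reactant_bonds": side(reactant_bonds),
        "product_bonds": side(product_bonds),
    }


# Toy example: bromoethane + hydroxide -> ethanol + bromide (hydrogens implicit).
atoms = {1: "C", 2: "C", 3: "Br", 4: "O"}
reactants = [((1, 2), 1), ((2, 3), 1)]
products = [((1, 2), 1), ((2, 4), 1)]
template = extract_template(atoms, reactants, products)
```

In this toy case the unchanged methyl carbon (map number 1) drops out of the template, leaving only the carbon, bromine, and oxygen involved in the substitution. A practical extractor would also need to capture neighboring atoms, charges, and stereochemistry, which this sketch ignores.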
Computer-aided synthesis has received much attention in recent years. It is a challenging topic in itself, due to the high dimensionality of chemical and reaction space. It becomes even more challenging when the aim is to suggest syntheses that can be performed in continuous flow. Though continuous flow offers many potential benefits, not all reactions are suited to continuous operation. In this work, three machine learning models have been developed to assess whether a given reaction may benefit from continuous operation, what the likelihood of success in continuous flow is for a certain set of reaction components (i.e., reactants, reagents, solvents, catalysts, and products) and, if the likelihood of success is low, which alternative reaction components can be considered. The first model uses an abstract version of a reaction template, obtained via Gaussian mixture modeling, to quantify its relative increase in publishing frequency in continuous flow, without relying on potentially ambiguously defined reaction templates. The second model is an artificial neural network that categorizes feasible and infeasible reaction components with a 75% success rate. A set of reaction components is considered feasible if there is an explicit reference to it being used in continuous synthesis in the database; all other sets of reaction components are considered infeasible. While several cases that are "infeasible" by this definition are classified as feasible by the neural network, further analysis shows that for many of these cases it is at least plausible that they are in fact feasible: they simply have not been tested to (dis)prove this. The final model suggests alternative continuous flow components with a top-1 accuracy of 95%. Combined, the three models offer a black-box evaluation of whether a reaction and a set of reaction components can be considered promising for continuous synthesis.
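The labeling rule and the top-1 accuracy metric mentioned above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the component names, the `flow_records` set standing in for the database, and the helper names `label_feasible` and `top_k_accuracy` are all hypothetical.

```python
from typing import FrozenSet, Iterable, List, Sequence, Set


def label_feasible(components: Iterable[str],
                   flow_records: Set[FrozenSet[str]]) -> bool:
    """Label a component set feasible only if the (hypothetical) database
    holds an explicit continuous-flow record for exactly that set."""
    return frozenset(components) in flow_records


def top_k_accuracy(ranked_suggestions: Sequence[List[str]],
                   truths: Sequence[str],
                   k: int = 1) -> float:
    """Fraction of cases where the true component is among the first k suggestions."""
    hits = sum(1 for ranked, truth in zip(ranked_suggestions, truths)
               if truth in ranked[:k])
    return hits / len(truths)


# Hypothetical database of component sets reported in continuous flow.
flow_records = {frozenset({"aryl bromide", "boronic acid", "Pd catalyst", "THF"})}

# Order does not matter: a component set is compared as a whole.
known = label_feasible(["THF", "Pd catalyst", "boronic acid", "aryl bromide"],
                       flow_records)
# Absent from the database, so labeled infeasible even though it is untested.
unknown = label_feasible(["aryl bromide", "boronic acid", "Pd catalyst", "DMF"],
                         flow_records)

# Two ranked lists of suggested alternative solvents and the recorded solvent.
suggestions = [["THF", "MeCN"], ["MeCN", "THF"]]
recorded = ["THF", "THF"]
top1 = top_k_accuracy(suggestions, recorded, k=1)
top2 = top_k_accuracy(suggestions, recorded, k=2)
```

The second call shows why some "infeasible" labels may be unreliable: a component set with no record is labeled infeasible by construction, whether or not it would work in practice.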