This study highlights new opportunities for optimal reaction route selection from large chemical databases brought about by the rapid digitalisation of chemical data.
Automated prediction of reaction impurities is useful in early‐stage reaction development, synthesis planning and optimization. Existing reaction predictors are geared towards main product prediction and are often black‐box, making it difficult to troubleshoot erroneous outcomes. This work presents an automated, interpretable impurity prediction workflow based on data mining large chemical reaction databases. A 14‐step workflow was implemented in Python and RDKit using Reaxys® data. Potential reactions between functional groups that share a reaction environment in the user‐supplied query species can be evaluated accurately by directly mining the Reaxys® database for similar, or ‘analogue’, reactions involving these functional groups. Reaction templates can then be extracted from the analogue reactions and applied to the relevant species in the original query to return impurities and transformations of interest. Three proof‐of‐concept case studies (paracetamol, agomelatine and lersivirine) were conducted, with the workflow correctly suggesting impurities within the top two outcomes. At all stages, suggested impurities can be traced back to the originating template and analogue reaction in the literature, allowing for closer inspection and user validation. Ultimately, because it is interpretable rather than purely black‐box, this work could be useful as a benchmark for more sophisticated algorithms or models.
The transition toward a circular and biobased chemical industry is needed to cut global CO₂ emissions and limit the chemical industry's overall impact on the environment. However, the development of circular chemical reaction systems is challenging, as it requires symbiotic sets of novel chemical reaction pathways and involves unconventional processing steps. We present a methodological pipeline for automated reaction network optimization. The tools can guide the development of circular processes at the reaction pathway level. Chemical big data combined with energetic assessment metrics and state-of-the-art decision-making has the potential to efficiently identify the most promising reaction systems. We mine large-scale chemical reaction data from the Reaxys database and automate the screening of pathways based on chemical rules. We then approximate thermodynamic properties for exergy calculations of the prescreened pathways and formulate the optimization problem as linear programming and mixed-integer linear programming problems. The methodological workflow is illustrated in a case study on the conversion of β-pinene to citral. Our results show that the tools are well suited to model circular process interactions within different environment scenarios.
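The abstract does not give the optimization model itself. As a rough orientation only, a reaction network problem of this kind is commonly written in the generic form below. Every symbol here is an assumption introduced for illustration and is not taken from the paper: $R$ is the set of prescreened candidate reactions, $C$ the set of compounds, $s_{ij}$ the stoichiometric coefficient of compound $i$ in reaction $j$, $x_j$ the extent of reaction $j$, $e_j$ an exergy-based cost per unit extent, $b_i$ the required net production of compound $i$ (negative for feedstock consumed), $y_j$ a binary selection variable, and $M$ a large constant.

```latex
\begin{aligned}
\min_{x,\,y}\quad & \sum_{j \in R} e_j\, x_j \\
\text{s.t.}\quad & \sum_{j \in R} s_{ij}\, x_j = b_i, && i \in C, \\
& 0 \le x_j \le M\, y_j, && j \in R, \\
& y_j \in \{0, 1\}, && j \in R.
\end{aligned}
```

Dropping the binary variables $y_j$ (and the bound $x_j \le M\, y_j$) leaves a linear program, while keeping them gives a mixed-integer linear program in which individual reactions are explicitly switched on or off.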
Automated prediction of reaction impurities can facilitate rapid early-stage reaction development, synthesis planning and optimization. Existing reaction predictors are geared towards main product prediction and are often black-box, making it difficult to troubleshoot erroneous outcomes. This work presents an automated, interpretable impurity prediction workflow based on data mining large chemical reaction databases. A 14-step workflow was implemented in Python and RDKit using Reaxys® data. Potential reactions between functional groups that share a reaction environment in the user-supplied query species can be evaluated accurately by directly mining the Reaxys® database for similar, or ‘analogue’, reactions involving these functional groups. Reaction templates can then be extracted from the analogue reactions and applied to the relevant species in the original query to return impurities and transformations of interest. Three proof-of-concept case studies based on active pharmaceutical ingredients (paracetamol, agomelatine and lersivirine) were conducted, with the workflow able to suggest the correct impurities within the top two outcomes. At all stages, suggested impurities can be traced back to the originating template and analogue reaction in the literature, allowing for closer inspection and user validation. Ultimately, because it is interpretable rather than purely black-box, this work could be useful as a benchmark for more sophisticated algorithms or models, and it illustrates the potential of chemical data in impurity prediction.
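The traceability described above (each suggested impurity linked back to its template and analogue reaction) can be pictured with a small bookkeeping sketch. This is not the authors' 14-step workflow and uses no RDKit or Reaxys calls. The `Suggestion` record, the `rank_impurities` helper, the placeholder identifiers, and the choice to rank by the number of distinct supporting analogue reactions are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """One suggested impurity together with its provenance (hypothetical record)."""
    impurity: str     # identifier or SMILES of the suggested impurity
    template: str     # reaction template that generated the suggestion
    analogue_id: str  # placeholder ID of the analogue reaction the template came from


def rank_impurities(suggestions):
    """Group suggestions by impurity and rank them.

    Ranking key (an assumption, not the paper's metric): impurities backed by
    more distinct analogue reactions come first; ties are broken alphabetically.
    Returns a list of (impurity, templates, analogue_ids) tuples.
    """
    analogues = defaultdict(set)
    templates = defaultdict(set)
    for s in suggestions:
        analogues[s.impurity].add(s.analogue_id)
        templates[s.impurity].add(s.template)
    ranked = sorted(analogues, key=lambda imp: (-len(analogues[imp]), imp))
    return [(imp, sorted(templates[imp]), sorted(analogues[imp])) for imp in ranked]


if __name__ == "__main__":
    # Placeholder data only; none of these identifiers come from the paper.
    demo = [
        Suggestion("impurity_A", "template_1", "rxn_001"),
        Suggestion("impurity_A", "template_1", "rxn_002"),
        Suggestion("impurity_B", "template_2", "rxn_003"),
    ]
    for impurity, tmpls, rxns in rank_impurities(demo):
        print(impurity, tmpls, rxns)
```

Keeping the template and analogue identifiers attached to every suggestion is what lets a user inspect why a given impurity was proposed, which is the property the abstract contrasts with black-box predictors.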