The traditional way to improve the yield of a Buchwald-Hartwig cross-coupling reaction is to change the reactants or the reaction conditions, but this approach suffers from problems such as harsh reaction conditions and complex synthetic routes. In 2018, Doyle and co-workers reported a random-forest-based yield prediction method in Science. However, each regression tree in a random forest predicts the average value of the target variable over the samples in a leaf node, which treats the features as equally important. We focus on the important feature information in order to obtain more accurate yield predictions. Therefore, it is of interest to apply advanced deep learning methods to the performance prediction of chemical reactions, for which less training data may be required.
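To make the leaf-averaging behaviour concrete, here is a minimal sketch of a depth-1 regression tree in plain Python. The data points, the single feature and the helper names (`fit_stump`, `predict`) are hypothetical and chosen only for illustration; this is not the model from the cited work. It shows that every sample falling into the same leaf receives the same prediction, namely the mean yield of the training samples in that leaf.

```python
from statistics import mean

# Hypothetical toy data: (feature value, yield in %) pairs, invented for illustration.
samples = [(0.1, 12.0), (0.2, 18.0), (0.3, 15.0),
           (0.7, 60.0), (0.8, 72.0), (0.9, 66.0)]


def sse(values):
    """Sum of squared errors of the values around their own mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(data):
    """Fit a depth-1 regression tree by choosing the threshold with minimal total SSE."""
    xs = sorted({x for x, _ in data})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2  # candidate split halfway between neighbours
        left = [y for x, y in data if x <= threshold]
        right = [y for x, y in data if x > threshold]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            # Each leaf stores only the mean target value of its samples.
            best = (score, threshold, mean(left), mean(right))
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict(stump, x):
    """Return the stored leaf mean for the leaf that x falls into."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


stump = fit_stump(samples)
print(predict(stump, 0.25), predict(stump, 0.85))
```

Any input on the same side of the learned threshold gets an identical prediction, regardless of how the individual samples in that leaf differ.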
Machine learning is increasingly popular for predicting chemical reaction performance. This study applies the CatBoost algorithm to build an intelligent prediction system for organic chemical reaction yields. Parameter analysis, convergence analysis, prediction accuracy analysis and generalization analysis are carried out. The internal relationship between reaction conditions and yield is then explored through feature importance and SHAP (SHapley Additive exPlanations). The results show that the proposed method has potential as a high-precision tool to assist the optimization of chemical reaction systems.
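As a rough illustration of the attribution idea behind SHAP, the sketch below computes exact Shapley values for a tiny hand-written function. Everything here is an assumption made for illustration: `toy_yield` is a hypothetical stand-in for a trained model (it is not CatBoost), the feature names and coefficients are invented, and absent features are replaced by a single baseline, which is a simplification of what SHAP libraries do in practice.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names, invented for illustration only.
FEATURES = ["ligand", "base", "additive"]


def toy_yield(x):
    """Hypothetical stand-in for a trained yield model (not the study's model)."""
    return (20.0 + 30.0 * x["ligand"] + 10.0 * x["base"]
            + 15.0 * x["ligand"] * x["additive"])


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_f = set(subset)
                # Marginal contribution of f when added to this coalition.
                total += weight * (value(without_f | {f}) - value(without_f))
        phi[f] = total
    return phi


x = {"ligand": 1.0, "base": 1.0, "additive": 1.0}
baseline = {"ligand": 0.0, "base": 0.0, "additive": 0.0}
phi = shapley_values(toy_yield, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split evenly between the two features involved. Exact enumeration grows exponentially with the number of features, so it is only practical for toy cases like this one.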