Although machine learning has a long history in chemical research for predicting molecular properties and biological activities, the quantitative modeling of reactivity has been approached only recently, and current models suggest that complex and specific parameterization is inevitable. In contrast, we report a simple machine learning model for predicting various reaction outcomes, such as yields and stereoselectivities. Because it is based solely on structural input, our model should be transferable to diverse problems involving organic molecules.
The signature quadratic form distance has been introduced as an adaptive similarity measure coping with flexible content representations of multimedia data. While this distance has shown high retrieval quality, its high computational complexity underscores the need for efficient search methods. Recent research has shown that a huge improvement in search efficiency is achieved when using metric indexing. In this paper, we analyze the applicability of Ptolemaic indexing to the signature quadratic form distance. We show that it is a Ptolemaic metric and present an application of Ptolemaic pivot tables to image databases, resolving queries nearly four times as fast as the state-of-the-art metric solution, and up to 300 times as fast as sequential scan.
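To make the two ingredients named above concrete, the following is a minimal sketch of the signature quadratic form distance and of the lower bound that Ptolemy's inequality yields from two pivots. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian similarity function, the `alpha` parameter, the (centroid, weight) signature layout, and all function names are hypothetical choices made here, and the pivot-table data structure itself is not shown.

```python
import math


def gaussian_similarity(a, b, alpha=1.0):
    """Assumed similarity function: exp(-alpha * squared Euclidean distance)."""
    squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-alpha * squared)


def sqfd(sig_q, sig_o, alpha=1.0):
    """Signature quadratic form distance between two feature signatures.

    Each signature is a list of (centroid, weight) pairs (assumed layout).
    The weights of the second signature are negated, the two signatures are
    concatenated, and the quadratic form w * A * w^T is evaluated, where A
    holds the pairwise similarities of all centroids.
    """
    centroids = [c for c, _ in sig_q] + [c for c, _ in sig_o]
    weights = [w for _, w in sig_q] + [-w for _, w in sig_o]
    total = 0.0
    for ci, wi in zip(centroids, weights):
        for cj, wj in zip(centroids, weights):
            total += wi * wj * gaussian_similarity(ci, cj, alpha)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))


def ptolemaic_lower_bound(d_qp, d_qs, d_op, d_os, d_ps):
    """Lower bound on d(q, o) from pivots p and s via Ptolemy's inequality.

    d_ps is the distance between the two pivots and must be non-zero.
    """
    return abs(d_qp * d_os - d_qs * d_op) / d_ps


# Toy signatures in a 2-D feature space (made-up values for illustration).
q = [((0.0, 0.0), 0.5), ((1.0, 0.0), 0.5)]
o = [((0.0, 1.0), 0.7), ((2.0, 2.0), 0.3)]
p = [((1.0, 1.0), 1.0)]
s = [((3.0, 0.0), 0.4), ((0.0, 3.0), 0.6)]

bound = ptolemaic_lower_bound(
    sqfd(q, p), sqfd(q, s), sqfd(o, p), sqfd(o, s), sqfd(p, s)
)
exact = sqfd(q, o)
```

In an index, the distances from database objects to the pivots would be precomputed, so `bound` can be evaluated cheaply at query time and an object can be discarded without computing `exact` whenever the bound already exceeds the search radius.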
Despite their enormous potential, machine learning methods have found only limited application in predicting reaction outcomes, as current models are often highly complex and, most importantly, are not transferable to different problem sets. Herein, we present the direct utilization of Lewis structures in a machine learning platform for diverse applications in organic chemistry. To this end, an input based on multiple fingerprint features (MFF) was developed as a universal molecular representation and applied to problem sets of increasing complexity. First, molecular properties across a diverse array of molecules were predicted accurately. Next, reaction outcomes such as stereoselectivities and yields were predicted for experimental data sets that had previously been evaluated using (complex) problem-oriented descriptor models. As a final application, a systematic high-throughput data set showed good correlation when using the MFF model, which suggests that this approach is general and ready for immediate adoption by chemists.