This review surveys molecular similarity and diversity. Key findings reported in recent investigations are selectively highlighted and summarized. Although this overview is mainly centered on chemoinformatics, applications in other areas (pharmaceutical and medicinal chemistry, combinatorial chemistry, chemical database management, etc.) are also introduced. The approaches used to define and describe the concepts of molecular similarity and diversity in the context of chemoinformatics are discussed in the first part of this review. The second and third parts describe and analyze the different methods and techniques. Finally, current applications and problems are enumerated and discussed in the last part.
Predictive modeling has become a practical research tool in homogeneous catalysis. It can help to pinpoint 'good regions' in the catalyst space, narrowing the search for the optimal catalyst for a given reaction. Just like any other new idea, in silico catalyst optimization is accepted by some researchers and met with skepticism by others. The basic requirements for good predictive models are a reliable set of initial experimental data, a method for generating and testing virtual catalyst libraries, and robust validation protocols. Once you have these, the key task is translating the catalysis problems into something that a computer can understand. In this tutorial review we explain in simple terms what predictive modeling actually is, why and when one should use it, and how it can be implemented.
The Hansen solubility parameter approach is revisited by implementing the thermodynamics of dissolution and mixing. Hansen's pragmatic approach has earned its spurs in predicting solvents for polymer solutions, but for molecular solutes improvements are needed. By going into the details of entropy and enthalpy, several corrections are suggested that make the methodology thermodynamically sound without losing its ease of use. The most important corrections include accounting for the solvent molecules’ size, the destruction of the solid's crystal structure, and the specificity of hydrogen‐bonding interactions, as well as opportunities to predict the solubility at extrapolated temperatures. When the original and the improved methods were tested on a large industrial dataset including solvent blends, fit quality improved from 0.89 to 0.97 and the percentage of correct predictions rose from 54 % to 78 %. Full Matlab scripts are included in the Supporting Information, allowing readers to implement these improvements on their own datasets.
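For orientation, the classic (uncorrected) Hansen criterion that this abstract takes as its starting point can be sketched as follows. This is only the textbook distance formula, not the thermodynamic corrections the authors propose, and it is not taken from their Matlab scripts; the function names and the example parameter values are illustrative assumptions.

```python
import math

def hansen_distance(solute, solvent):
    """Classic Hansen distance Ra between two (dD, dP, dH) parameter triples.

    The factor 4 on the dispersion term is part of the standard Hansen formula.
    """
    d_d = solute[0] - solvent[0]
    d_p = solute[1] - solvent[1]
    d_h = solute[2] - solvent[2]
    return math.sqrt(4.0 * d_d ** 2 + d_p ** 2 + d_h ** 2)

def relative_energy_difference(solute, solvent, interaction_radius):
    """RED = Ra / R0; values below 1 are conventionally read as 'likely to dissolve'."""
    return hansen_distance(solute, solvent) / interaction_radius

# Hypothetical parameter values (MPa^0.5), chosen only to make the arithmetic easy.
ra = hansen_distance((18.0, 10.0, 7.0), (16.0, 10.0, 7.0))
red = relative_energy_difference((18.0, 10.0, 7.0), (16.0, 10.0, 7.0), 8.0)
```

The corrections described in the abstract (solvent size, crystal-structure destruction, hydrogen-bond specificity) would modify this picture and are not represented here.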
This paper presents a new protocol based on 3D molecular descriptors using QM calculations for use in CoMFA-like 3D-QSSR. The new method was developed and then applied to predict catalytic selectivity in the asymmetric alkylation of aldehydes catalyzed by Zn-aminoalcohols. The molecular descriptors are obtained straightforwardly from the electronic charge density function, rho(r), and the molecular electrostatic potential (MEP) distributions. The chemically meaningful Molecular Shape Field (MSF) descriptor that accounts for the shape properties of the catalyst is defined from rho(r). Alignment independence was achieved by computing the product of the MSF and MEP values of pairs of points over a given distance range on a molecular isosurface and then selecting the product with the highest value. The new QSSR method demonstrated good predictive ability (q2 = 0.79) when full cross-validation procedures were carried out. Accurate predictions were made for a larger data set, although some deviations occurred in the predictions for catalytic systems with low enantiodiscrimination. Analysis of this QSSR model allows (1) the contribution of each functional group to enantioselectivity to be evaluated and (2) the molecular descriptors to be related to previously proposed stereochemical models for the reaction under study.
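The alignment-independent step described above (take products of field values at pairs of surface points within a distance range, keep the largest) can be illustrated with a small sketch. The abstract does not say exactly how the MSF and MEP values are paired, so the choice here, MSF at one point multiplied by MEP at the other in both orderings, is an assumption, as are the function name and the toy data.

```python
import math

def max_pair_product(points, msf, mep, d_min, d_max):
    """Largest MSF*MEP product over point pairs whose distance lies in [d_min, d_max).

    points: list of (x, y, z) coordinates on the isosurface
    msf, mep: field values sampled at those points (same order)
    Returns None when no pair falls in the distance range.
    """
    best = None
    for i in range(len(points)):
        for j in range(len(points)):
            if i == j:
                continue
            distance = math.dist(points[i], points[j])
            if d_min <= distance < d_max:
                product = msf[i] * mep[j]  # assumed pairing: MSF at i, MEP at j
                if best is None or product > best:
                    best = product
    return best

# Toy surface: three points on a line, with made-up field values.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
msf = [1.0, 2.0, 3.0]
mep = [0.5, -1.0, 2.0]

# One descriptor value per distance bin; no molecular alignment is needed
# because only inter-point distances enter the calculation.
descriptor = [max_pair_product(points, msf, mep, lo, lo + 1.0) for lo in (0.5, 1.5, 2.5)]
```

Because the result depends only on distances between surface points, rotating or translating the molecule leaves the descriptor unchanged, which is the property the abstract calls alignment independence.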
Theoretical chemistry Z 0350 Molecular Similarity and Diversity in Chemoinformatics: From Theory to Applications - [343 refs.]. - (MALDONADO, A. G.; DOUCET, J. P.; PETITJEAN, M.; FAN*, B.-T.; Mol. Diversity 10 (2006) 1, 39-79; Inst. Topol. Dyn. Syst., CNRS, Univ. Paris 7 - Denis Diderot, F-75005 Paris, Fr.; Eng.) - Lindner 16-274
We present a computer-based heuristic framework for designing libraries of homogeneous catalysts. In this approach, a set of given bidentate ligand-metal complexes is disassembled into key substructures ("building blocks"). These include metal atoms, ligating groups, backbone groups, and residue groups. The computer then rearranges these building blocks into a new library of virtual catalysts. We then tackle the practical problem of choosing a diverse subset of catalysts from this library for actual synthesis and testing. This is not trivial, since catalyst diversity itself is a vague concept. Thus, we first define and quantify this diversity as the difference between key structural parameters (descriptors) of the catalysts, for the specific reaction at hand. Subsequently, we propose a method for choosing diverse sets of catalysts based on catalyst backbone selection, using weighted D-optimal design. The computer selects catalysts with different backbones, where the difference is measured as a distance in the descriptors space. We show that choosing such a D-optimal subset of backbones gives more diversity than a simple random sampling. The results are demonstrated experimentally in the nickel-catalysed hydrocyanation of 3-pentenenitrile to adiponitrile. Finally, the connection between backbone diversity and catalyst diversity, and the implications towards in silico catalysis design are discussed.
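The idea of picking a D-optimal subset in descriptor space can be sketched as below. This is a plain, unweighted D-optimality criterion (maximize det(XᵀX) for a linear model with intercept) solved by exhaustive search, which is only practical for tiny candidate sets; the weighted design and the backbone descriptors used by the authors are not reproduced, and all names and data are hypothetical.

```python
from itertools import combinations

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det

def information_determinant(descriptor_rows):
    """det(X^T X) where X has an intercept column followed by the descriptors."""
    x = [[1.0] + list(row) for row in descriptor_rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    return determinant(xtx)

def d_optimal_subset(candidates, size):
    """Indices of the subset of given size that maximizes det(X^T X)."""
    best = max(
        combinations(range(len(candidates)), size),
        key=lambda idx: information_determinant([candidates[i] for i in idx]),
    )
    return list(best)

# Five hypothetical backbones described by two made-up descriptors each.
backbones = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (0.1, 0.1)]
chosen = d_optimal_subset(backbones, 3)
```

For these toy points the selection avoids the near-duplicate backbones and picks the three that span the descriptor space most widely. Realistic library sizes would need an exchange-type algorithm instead of exhaustive enumeration.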
Each oil reservoir can be characterized by a set of parameters such as temperature, pressure, oil composition, and brine salinity. In the context of chemical enhanced oil recovery (EOR), the selection of high-performance surfactants is a challenging and time-consuming task since it depends strongly on the reservoir’s conditions. The situation becomes even more complicated if the surfactant formulation is a blend of two or more surfactants. In the present work, we report quantitative structure–property relationships (QSPR) correlating surfactants’ structures and their composition in a mixture with optimal salinity (S opt), corresponding to minimal interfacial tension in the reference brine/surfactants/n-dodecane system, at T = 313 K and P = 0.1 MPa. Particular attention was paid to selected families of surfactants: α-olefin sulfonate (AOS), internal olefin sulfonate (IOS), alkyl ether sulfate (AES), and alkyl glyceryl ether sulfonate (AGES). The models were built and validated on a database containing S opt values for 75 surfactant formulations. Molecular structures of amphiphilic molecules were encoded by functional group count descriptors (FGCD), ISIDA substructural molecular fragment (SMF) descriptors, and CODESSA molecular descriptors (CMD). For mixtures, descriptors were calculated as linear combinations of descriptors of individual compounds weighted by their mass fractions in mixtures. Different machine-learning methods, namely support vector machine (SVM), partial least-squares (PLS) regression, and random subspace (RS), have been used for the modeling. Both global (on the entire database) and local (on individual families) models have been built. Models display reasonable accuracy (about 0.2 log S opt units), which is comparable with the experimental error of measured S opt. Our results show that the suggested approach can be successfully used to build predictive models for relatively small data sets of mixtures of chemical compounds.
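The mixture-descriptor rule stated above (a linear combination of component descriptors weighted by mass fraction) can be written out in a few lines. The function name, the input checks, and the example numbers are illustrative assumptions; the actual FGCD, SMF, and CMD descriptor values are not reproduced.

```python
def mixture_descriptors(component_descriptors, mass_fractions):
    """Mass-fraction-weighted average of per-component descriptor vectors.

    component_descriptors: one descriptor vector per surfactant in the blend
    mass_fractions: matching mass fractions, expected to sum to 1
    """
    if len(component_descriptors) != len(mass_fractions):
        raise ValueError("one mass fraction is required per component")
    if abs(sum(mass_fractions) - 1.0) > 1e-9:
        raise ValueError("mass fractions must sum to 1")
    n_descriptors = len(component_descriptors[0])
    return [
        sum(w * d[i] for w, d in zip(mass_fractions, component_descriptors))
        for i in range(n_descriptors)
    ]

# Hypothetical two-component blend with two made-up descriptors per surfactant.
blend = mixture_descriptors([[2.0, 0.0], [4.0, 10.0]], [0.25, 0.75])
```

The resulting vector has the same length as a single-compound descriptor vector, so the same regression method can be trained on pure surfactants and blends alike.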
We combine multicomponent reactions, catalytic performance studies and predictive modelling to find transfer hydrogenation catalysts. An initial set of 18 ruthenium-carbene complexes was synthesized and screened in the transfer hydrogenation of furfural to furfurol with isopropyl alcohol. The complexes gave varied yields, from 62% up to >99.9%, with no obvious structure/activity correlations. Control experiments proved that the carbene ligand remains coordinated to the ruthenium centre throughout the reaction. Deuterium-labelling studies showed a secondary isotope effect (kH:kD=1.5). Further mechanistic studies showed that this transfer hydrogenation follows the so-called monohydride pathway. Using these data, we built a predictive model for 13 of the catalysts, based on 2D and 3D molecular descriptors. We tested and validated the model using the remaining five catalysts (cross-validation, R2=0.913). Then, with this model, the conversion and selectivity were predicted for four completely new ruthenium-carbene complexes. These four catalysts were then synthesized and tested. The results were within 3% of the model’s predictions, demonstrating the validity and value of predictive modelling in catalyst optimization.