Here, the utility of Generative Topographic Maps (GTM) for data visualization, structure-activity modeling and database comparison is evaluated using subsets of the Directory of Useful Decoys (DUD). Unlike other popular dimensionality reduction approaches such as Principal Component Analysis, Sammon Mapping or Self-Organizing Maps, GTM has the great advantage of providing data probability distribution functions (PDFs), both in the high-dimensional space defined by molecular descriptors and in the 2D latent space. PDFs for the molecules of different activity classes were successfully used to build classification models in the framework of the Bayesian approach. Because PDFs are represented by a mixture of Gaussian functions, the Bhattacharyya kernel is proposed as a measure of the overlap of datasets, which leads to an elegant method for the global comparison of chemical libraries.
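The two uses of GTM-derived PDFs described above can be illustrated with a minimal sketch. It assumes that a trained GTM has already produced, for each molecule, a vector of responsibilities over the K nodes of the latent grid (each vector sums to 1). The function names `node_pdf`, `bhattacharyya_coefficient` and `class_posterior` are hypothetical, and the overlap is computed on the discrete node distributions rather than on the full Gaussian mixtures, which is a simplification and not necessarily the exact kernel used in the paper.

```python
import math


def node_pdf(responsibilities):
    """Sum per-molecule responsibilities into one discrete PDF over grid nodes.

    `responsibilities` is a list of rows, one row per molecule, each row
    holding K non-negative values that sum to 1 (assumed GTM output).
    """
    k = len(responsibilities[0])
    totals = [sum(row[j] for row in responsibilities) for j in range(k)]
    norm = sum(totals)
    return [t / norm for t in totals]


def bhattacharyya_coefficient(p, q):
    """Overlap of two discrete PDFs: 1 for identical, 0 for disjoint support."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def class_posterior(r_query, class_pdfs, priors):
    """Bayes rule on the grid: P(c | x) is proportional to
    P(c) * sum_k R_k(x) * P(k | c), normalized over the classes."""
    joint = [
        prior * sum(r * pk for r, pk in zip(r_query, pdf))
        for pdf, prior in zip(class_pdfs, priors)
    ]
    total = sum(joint)
    return [j / total for j in joint]


# Toy responsibilities on a 3-node grid (illustrative values, not real data).
actives = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]]
decoys = [[0.0, 0.1, 0.9], [0.0, 0.3, 0.7]]
pdf_actives = node_pdf(actives)
pdf_decoys = node_pdf(decoys)

overlap = bhattacharyya_coefficient(pdf_actives, pdf_decoys)
posterior = class_posterior([0.9, 0.1, 0.0], [pdf_actives, pdf_decoys], [0.5, 0.5])
```

In this toy case the two libraries occupy nearly separate regions of the map, so `overlap` is small, and a query molecule that projects onto the first node is assigned mostly to the first class.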
Several popular machine learning methods--Associative Neural Networks (ASNN), Support Vector Machines (SVM), k Nearest Neighbors (kNN), a modified version of partial least-squares analysis (PLSM), backpropagation neural networks (BPNN), and Multiple Linear Regression Analysis (MLR)--implemented in the ISIDA, NASAWIN, and VCCLAB software have been used to perform QSPR modeling of the melting points of a structurally diverse data set of 717 bromides of nitrogen-containing organic cations (FULL), including 126 pyridinium bromides (PYR), 384 imidazolium and benzoimidazolium bromides (IMZ), and 207 quaternary ammonium bromides (QUAT). Several types of descriptors were tested: E-state indices, counts of atoms determined for E-state atom types, molecular descriptors generated by the DRAGON program, and different types of substructural molecular fragments. The predictive ability of the models was analyzed using a 5-fold external cross-validation procedure in which every compound in the parent set was included in one of five test sets. Among the 16 types of developed structure-melting point models, the nonlinear SVM, ASNN, and BPNN techniques demonstrate slightly better performance than the other methods. For the full set, the accuracy of predictions does not change significantly with the type of descriptors. For the other sets, the performance of descriptors varies with the method and data set used. The root-mean-squared error (RMSE) of prediction calculated on independent test sets is in the range of 37.5-46.4 °C (FULL), 26.2-34.8 °C (PYR), 38.8-45.9 °C (IMZ), and 34.2-49.3 °C (QUAT). The moderate accuracy of predictions can be related to the quality of the experimental data used for obtaining the models as well as to the difficulty of accounting for the structural features of ionic liquids in the solid state (polymorphic effects, eutectics, glass formation).
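The validation protocol described above, in which each compound appears in exactly one of five test sets and the RMSE is computed from the held-out predictions, can be sketched as follows. This is a minimal illustration and not the ISIDA, NASAWIN, or VCCLAB implementation. The helper names and the `fit`/`predict` callables are hypothetical, and the mean-value predictor and the toy data stand in for a real descriptor-based model and real melting points.

```python
import math
import random


def five_fold_splits(n_items, n_folds=5, seed=0):
    """Yield (train, test) index lists; every index lands in exactly one test set."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for test in folds:
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def cross_validated_rmse(X, y, fit, predict, n_folds=5, seed=0):
    """Refit the model on each training part and pool the held-out predictions.

    "External" here means that everything that depends on the data
    (including any descriptor selection inside `fit`) sees only the
    training part of each split.
    """
    predictions = [None] * len(y)
    for train, test in five_fold_splits(len(y), n_folds, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            predictions[i] = predict(model, X[i])
    return rmse(y, predictions)


# Hypothetical baseline: always predict the mean of the training values.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


# Toy data (illustrative values only, not taken from the study).
X_toy = [[float(i)] for i in range(20)]
y_toy = [50.0 + 3.0 * i for i in range(20)]
baseline_rmse = cross_validated_rmse(X_toy, y_toy, fit_mean, predict_mean)
```

Any regression method can be plugged in through `fit` and `predict`; the mean baseline simply gives a reference error that a useful model should beat.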
Computers in chemistry V 0380 Exhaustive QSPR Studies of a Large Diverse Set of Ionic Liquids: How Accurately Can We Predict Melting Points? -(VARNEK*, A.; KIREEVA, N.; TETKO, I. V.; BASKIN, I. I.; SOLOV'EV, V. P.; J. Chem. Inf. Model. (J. Chem. Inf. Comput. Sci.) 47 (2007) 3, 1111-1122; Lab. Infochim., Univ. Louis Pasteur, F-67000 Strasbourg, Fr.; Eng.) -Lindner 34-204
In this paper, we associate the applicability domain (AD) of QSAR/QSPR models with the area of the input (descriptor) space in which the density of training data points exceeds a certain threshold. It could be shown that the predictive performance of the models (built on the training set) is better for test compounds inside the high-density area than for those outside it. Instead of searching for a decision surface separating high- and low-density areas in the input space, the one-class classification 1-SVM approach looks for a hyperplane in the associated feature space. Unlike other AD definitions reported in the literature, this approach: (i) is purely "data-based", i.e. it assigns the same AD to all models built on the same training set, (ii) provides results that depend only on the initial pool of descriptors generated for the training set, (iii) can be used with a very large number of descriptors, as well as in the framework of structured kernel-based approaches, e.g., chemical graph kernels. The developed approach has been applied to improve the performance of QSPR models for the stability constants of complexes of organic ligands with alkaline-earth metals in water.
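The density-threshold idea behind this AD definition can be sketched with a plain Gaussian kernel density estimate. This is a deliberately simplified stand-in and not the 1-SVM itself: it thresholds the density directly in descriptor space, whereas 1-SVM finds a separating hyperplane in the kernel feature space. The function names, the `bandwidth` value, and the `outlier_fraction` parameter (loosely analogous to the fraction of training points a one-class model is allowed to leave outside) are all assumptions made for illustration.

```python
import math


def gaussian_density(x, training, bandwidth):
    """Unnormalized Gaussian kernel density of point x given the training set.

    The normalization constant is omitted because only the comparison
    with a threshold computed in the same way matters here.
    """
    total = 0.0
    for t in training:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, t))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / len(training)


def fit_domain(training, bandwidth=1.0, outlier_fraction=0.1):
    """Set the density threshold so that roughly `outlier_fraction` of the
    training points fall below it (i.e. outside the domain)."""
    scores = sorted(gaussian_density(x, training, bandwidth) for x in training)
    cut = int(outlier_fraction * len(scores))
    return {"training": training, "bandwidth": bandwidth, "threshold": scores[cut]}


def in_domain(x, domain):
    """True if x lies in the high-density area defined by the training set."""
    density = gaussian_density(x, domain["training"], domain["bandwidth"])
    return density >= domain["threshold"]


# Toy 2D descriptor vectors clustered near the origin (illustrative only).
training_set = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
domain = fit_domain(training_set, bandwidth=0.5, outlier_fraction=0.1)

inside = in_domain((0.2, 0.2), domain)     # centre of the training cloud
outside = in_domain((10.0, 10.0), domain)  # far from all training points
```

Because the domain depends only on the training descriptors, the same `domain` object would be reused for every model built on that training set, which mirrors property (i) above.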
Chemoinformatics / In silico design / Complexation / Extraction
Summary. Chemoinformatics approaches open new opportunities for the computer-aided design of new efficient metal binders. Here, we demonstrate the performance of the ISIDA and COMET software tools in predicting stability constants (log K) of metal ion/organic ligand complexes in solution and in designing in silico new molecules possessing desirable properties. Predictive models for log K of lanthanide complexation in water have been developed. Some new uranyl binders based on monoamides and on phosphoryl-containing podands were suggested theoretically, then synthesized and tested experimentally. Reasonable agreement between experimental uranyl distribution coefficients and theoretically predicted values has been observed.
The organic electrolytes of most current commercial rechargeable Li-ion batteries (LiBs) are flammable, toxic, and have limited electrochemical energy windows. All-solid-state battery technology promises improved safety, cycling performance, and electrochemical stability, as well as the possibility of device miniaturization, and it enables a number of breakthrough technologies towards the development of new high power and energy density microbatteries for electronics with low processing cost, solid oxide fuel cells, electrochromic devices, etc. Currently, rational materials design is attracting significant attention, which has resulted in a strong demand for methodologies that can accelerate the design of materials with tailored properties; cheminformatics can be considered an efficient tool in this respect. This study focused on several aspects: (i) identification of the parameters responsible for high Li-ion conductivity in garnet-structured oxides; (ii) development of quantitative models to elucidate composition-structure-Li ionic conductivity relationships, taking into account the experimental details of sample preparation; (iii) circumscription of the materials space of solid garnet-type electrolytes that is attractive for virtual screening. Several candidate compounds have been recommended for synthesis as potential solid-state electrolyte materials.