Support vector machines (SVMs) were used to develop QSAR models that correlate molecular structures with their toxicities and bioactivities. The performance and predictive ability of SVM were investigated and compared with those of other methods, namely multiple linear regression (MLR) and radial basis function neural networks (RBFNNs). Two data sets were evaluated. The first involves the development of a QSAR model for predicting the toxicities of 153 phenols; the second deals with a QSAR model relating the structures of 85 cyclooxygenase 2 (COX-2) inhibitors to their activities. For each application, the molecular structures were described using either physicochemical parameters or molecular descriptors. In both cases, the predictive ability of the SVM model was comparable or superior to that of the MLR and RBFNN models. The results indicate that SVM can serve as a powerful alternative modeling tool for QSAR studies.
The least squares support vector machine (LSSVM), a novel machine learning algorithm, was used for the first time to develop quantitative and classification models as a potential screening mechanism for a novel series of 1,4-dihydropyridine calcium channel antagonists. Each compound was represented by calculated structural descriptors that encode constitutional, topological, geometrical, electrostatic, and quantum-chemical features. The heuristic method was then used to search the descriptor space and select the descriptors responsible for activity. Quantitative modeling resulted in a nonlinear, seven-descriptor LSSVM model with a mean-square error of 0.2593, a predicted correlation coefficient (R²) of 0.8696, and a cross-validated correlation coefficient (R²cv) of 0.8167. The best classification results were also obtained with LSSVM: 91.1% of predictions were correct under leave-one-out cross-validation. This paper provides a new and effective method for drug design and screening.
The 13C NMR chemical shift of sp3 carbon atoms situated in the α position relative to the double bond in acyclic alkenes was estimated with multilayer feedforward artificial neural networks (ANNs) and multilinear regression (MLR), using as structural descriptors a topo-stereochemical code that characterizes the environment of the resonating carbon atom. The predictive ability of the two models was tested by the leave-20%-out cross-validation method. The neural model provides better results than the MLR model in both calibration and cross-validation, demonstrating that a nonlinear relationship exists between the structural descriptors and the investigated 13C NMR chemical shift and that the neural model is capable of capturing this relationship in a simple and effective way. A comparison between a general model for the estimation of the 13C NMR chemical shift and the ANN model indicates that general models are outperformed by more specific ones, so one possible way to improve the predictions is to develop environment-specific models. The approach proposed in this paper can be used in automated spectra interpretation or computer-assisted structure elucidation to constrain the number of possible candidates generated from the experimental spectra.
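The leave-20%-out procedure mentioned above can be illustrated with a minimal sketch. It assumes the data are split at random into five disjoint folds, each held out once, and it uses a one-descriptor least-squares line as a stand-in for the MLR model; the function names, the random split, and the single descriptor are assumptions made for illustration and are not taken from the paper.

```python
import random


def leave_20_percent_out_folds(n_samples, seed=0):
    """Split sample indices into five disjoint folds of roughly 20% each."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::5] for k in range(5)]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Hypothetical stand-in for the regression model; a real study would
    fit a multi-descriptor model here.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validated_rmse(xs, ys, seed=0):
    """Root-mean-square residual over all held-out predictions."""
    residuals = []
    for fold in leave_20_percent_out_folds(len(xs), seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        # Predict only the samples that the model has not seen.
        residuals.extend(ys[i] - (intercept + slope * xs[i]) for i in fold)
    return (sum(r * r for r in residuals) / len(residuals)) ** 0.5
```

Every sample is predicted exactly once by a model that was calibrated without it, so the resulting error estimates prediction rather than calibration quality.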
A new, simple algorithm for ring perception is reported. It finds the smallest set of smallest rings (SSSR) directly from a minimal set of data, a connection table, without any classification of rings or accessory extraction procedures. Its application to some complex ring systems is presented.
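The abstract does not describe the algorithm itself, so the sketch below does not attempt to reproduce it. It only shows how the number of rings that any SSSR must contain follows from a connection table, using the standard cyclomatic number (bonds minus atoms plus connected components). The bond-list representation and the function name are assumptions made for illustration.

```python
def sssr_size(num_atoms, bonds):
    """Number of rings in any SSSR: bonds - atoms + connected components.

    `bonds` is a list of (atom_index, atom_index) pairs with each bond
    listed once; this representation is an assumption of the sketch.
    """
    parent = list(range(num_atoms))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    components = num_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(bonds) - num_atoms + components


# Carbon skeleton of naphthalene: two six-membered rings sharing bond 4-5.
naphthalene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
               (4, 6), (6, 7), (7, 8), (8, 9), (9, 5)]
print(sssr_size(10, naphthalene))
```

The count tells a ring-perception routine when to stop, but choosing which rings belong to the set still requires a search over the molecular graph.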
The 13C NMR chemical shift of sp2 carbon atoms in acyclic alkenes was estimated with multilayer feedforward artificial neural networks (ANNs) and multilinear regression (MLR), using as structural descriptors a 12-component vector encoding the environment of the resonating carbon atom. The neural network model provides better results than the MLR model calibrated with the same data. The predictive ability of both models was tested by the leave-20%-out (L20%O) cross-validation method, which demonstrated the superior performance of the neural model. The number of neurons in the hidden layer was varied between 2 and 7, and three activation functions were tested: the hyperbolic tangent or a bell-shaped function for the hidden layer, and a linear or a hyperbolic tangent function for the output layer. All four combinations of activation functions give similar results in the calibration of the ANN model. In prediction, a linear output function performs better than a hyperbolic tangent one, although from a statistical point of view no particular combination can be preferred over the others. For the ANNs with four neurons in the hidden layer, the standard deviation for calibration ranges between 0.59 and 0.63 ppm, while for prediction it lies between 0.89 and 1.07 ppm. We propose a parallel use of the four ANNs for the prediction of unknown shifts, because the mean of the four predictions exhibits fewer outliers and lower residuals. The present model is compared with three additive schemes for the calculation of sp2 13C NMR chemical shifts, and the statistical analysis of the results demonstrates that the ANN model gives better predictions than the classical ones.
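The parallel use of four networks proposed above amounts to averaging their predictions for each carbon atom. A minimal sketch follows, assuming each model's output is available as a list of predicted shifts in ppm; the function names and the outlier threshold parameter are hypothetical and not taken from the paper.

```python
def ensemble_mean(predictions_per_model):
    """Average, atom by atom, the shifts predicted by several models."""
    n_models = len(predictions_per_model)
    return [sum(values) / n_models for values in zip(*predictions_per_model)]


def count_outliers(predicted, observed, threshold_ppm):
    """Count predictions whose absolute residual exceeds the threshold.

    The threshold is a free parameter of this sketch, not a value
    reported in the abstract.
    """
    return sum(abs(p - o) > threshold_ppm
               for p, o in zip(predicted, observed))
```

Averaging tends to cancel errors that the individual models do not share, which is consistent with the smaller number of outliers reported for the mean prediction.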