A combinatorial quantitative structure-activity relationship (Combi-QSAR) approach has been developed and applied to a data set of 98 ambergris fragrance compounds with complex stereochemistry. The Combi-QSAR approach explores all possible combinations of independent descriptor collections and individual correlation methods to obtain statistically significant models with high internal (training set) and external (test set) accuracy. Seven descriptor collections were generated with the commercially available programs MOE, CoMFA, CoMMA, Dragon, VolSurf, and MolconnZ; we also included chirality topological descriptors recently developed in our laboratory (… Tropsha, A. J. Chem. Inf. Comput. Sci. 2001, 41, 147-158). CoMMA descriptors were used in combination with MOE descriptors, and MolconnZ descriptors were used in combination with chirality descriptors. Each descriptor collection was combined individually with four correlation methods, namely k-nearest neighbors (kNN) classification, support vector machines (SVM), decision trees, and binary QSAR, giving rise to 28 different types of QSAR models. Multiple diverse and representative training and test sets were generated by several divisions of the original data set into two subsets. Each model with a high leave-one-out cross-validated correct classification rate for the training set was subjected to extensive internal and external validation to avoid overfitting and to ensure reliable predictive power. Two validation techniques were employed: randomization of the target property (in this case, odor intensity), also known as the Y-randomization test, and assessment of external prediction accuracy using test sets. We demonstrate that not every combination of data modeling technique and descriptor collection yields a validated and predictive QSAR model.
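The internal validation described above (leave-one-out correct classification rate plus Y-randomization) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain Euclidean kNN with majority voting, takes the correct classification rate (CCR) to be the mean of per-class accuracies (one common definition; plain accuracy is the alternative), and omits the descriptor selection and optimization of k that kNN QSAR typically performs. The function names `knn_predict`, `ccr`, `loo_ccr`, and `y_randomization` are hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean)."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    # Break voting ties deterministically by taking the smallest label.
    best = max(votes.values())
    return min(label for label, n in votes.items() if n == best)


def ccr(y_true, y_pred):
    """Correct classification rate, assumed here to be the mean per-class accuracy."""
    rates = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        rates.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(rates) / len(rates)


def loo_ccr(X, y, k):
    """Leave-one-out CCR: predict each compound from all the others."""
    preds = []
    for i in range(len(X)):
        train_X = list(X[:i]) + list(X[i + 1:])
        train_y = list(y[:i]) + list(y[i + 1:])
        preds.append(knn_predict(train_X, train_y, X[i], k))
    return ccr(list(y), preds)


def y_randomization(X, y, k, n_trials=10, seed=0):
    """LOO CCR after shuffling the class labels; high scores signal chance correlation."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        y_shuffled = list(y)
        rng.shuffle(y_shuffled)  # same class counts, broken structure-activity link
        scores.append(loo_ccr(X, y_shuffled, k))
    return scores
```

A model would be trusted only if its real LOO CCR clearly exceeds the CCRs from the shuffled runs; external validation on held-out test sets is a separate step not shown here.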
kNN classification in combination with CoMFA descriptors was found to be the best QSAR approach overall, since predictive models with correct classification rates of 0.7 or higher for both training and test sets were obtained for all divisions of the ambergris data set into training and test sets. Many predictive QSAR models were also obtained by combining kNN classification with other descriptor collections. Combinatorial QSAR affords automation, computational efficiency, and a higher probability of identifying significant QSAR models for experimental data sets than traditional approaches that rely on a single QSAR method.
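The model-selection logic summarized above can be expressed as a grid over (descriptor collection, method) pairs together with an acceptance test. This is a hypothetical sketch: `model_grid` and `is_predictive` are illustrative names, the descriptor labels passed in are placeholders, and each (train, test) pair is assumed to be the best model found for one training/test division. With seven collections and four methods the grid has 28 entries, matching the count in the abstract.

```python
from itertools import product

# The four correlation methods named in the abstract.
METHODS = ["kNN", "SVM", "decision tree", "binary QSAR"]


def model_grid(descriptor_sets, methods=METHODS):
    """All (descriptor collection, method) pairs that would be built and validated."""
    return list(product(descriptor_sets, methods))


def is_predictive(split_results, threshold=0.7):
    """True if every division reaches CCR >= threshold on both training and test sets.

    split_results: list of (train_ccr, test_ccr) tuples, one per training/test division.
    An empty list is treated as not predictive.
    """
    return bool(split_results) and all(
        train >= threshold and test >= threshold for train, test in split_results
    )
```

Requiring the threshold on every division, rather than on a single lucky split, is what distinguishes a robust descriptor/method combination from one that happens to fit one partition of the data.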
Computers in chemistry: Combinatorial QSAR of Ambergris Fragrance Compounds (Kovatcheva, A.; Golbraikh, A.; Oloff, S.; Xiao, Y.-D.; Zheng, W.; Wolschann, P.; Buchbauer, G.; Tropsha, A.*; J. Chem. Inf.)
Shape descriptors used in 3D QSAR studies naturally account for chirality; however, for flexible and structurally diverse molecules such studies require extensive conformational searching and alignment. We present QSAR modeling studies of two datasets of fragrance compounds with complex stereochemistry using simple, alignment-free, chirality-sensitive descriptors developed in our laboratories. In the first investigation, 44 α-campholenic derivatives with sandalwood odor were represented as derivatives of several common structural templates, with substituents numbered according to their relative spatial positions in the molecules. Both molecular and substituent descriptors were used as independent variables in multiple linear regression (MLR) calculations, and the best model was characterized by a training set q² of 0.79 and an external test set r² of 0.95. In the second study, several types of chirality descriptors were employed in combinatorial QSAR modeling of 98 ambergris fragrance compounds. Among the 28 possible combinations of seven descriptor types and four statistical modeling techniques, k-nearest neighbors (kNN) classification with CoMFA descriptors was initially found to generate the best models, with internal and external accuracies of 76% and 89%, respectively. The same dataset was then studied using novel atom-pair chirality descriptors (cAP). The cAP descriptors are based on a modified definition of atomic chirality in which the seniority of the substituents is defined by their relative partial charges: higher partial charges correspond to higher seniority. The resulting models had higher predictive power than those developed with CoMFA descriptors; the best model was characterized by internal and external accuracies of 82% and 94%, respectively.
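For reference, the training-set q² quoted for the MLR models is conventionally the leave-one-out cross-validated analogue of R². Assuming that standard definition (the abstract does not spell it out), it can be written as below; values near 1 mean the held-out predictions track the observed activities far better than simply predicting the training-set mean.

```latex
% y_i: observed activity of training compound i; \bar{y}: mean over the n training compounds
% \hat{y}_{(i)}: prediction for compound i from the MLR model refit without compound i
q^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```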
The success of the modeling studies discussed in this paper, which used simple alignment-free chirality descriptors, suggests that these descriptors should be applied broadly in QSAR studies of datasets in which stereochemistry plays an important role in defining compound activity.
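The charge-based seniority rule behind the cAP descriptors can be illustrated for a single four-coordinate center. This is a loose sketch under explicit assumptions, not the published cAP algorithm: neighbors are ranked by partial charge (highest first), handedness is read from the sign of a scalar triple product, the "R"/"S" labels are borrowed by analogy with CIP nomenclature, and tied charges are treated as giving no label. How such labels are folded into atom-pair counts is not shown. `charge_chirality` and its helpers are hypothetical names.

```python
from typing import Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(p: Vec, q: Vec) -> Vec:
    """Vector from q to p."""
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _triple(a: Vec, b: Vec, c: Vec) -> float:
    """Scalar triple product a . (b x c)."""
    cx = b[1] * c[2] - b[2] * c[1]
    cy = b[2] * c[0] - b[0] * c[2]
    cz = b[0] * c[1] - b[1] * c[0]
    return a[0] * cx + a[1] * cy + a[2] * cz


def charge_chirality(center: Vec,
                     neighbors: Sequence[Vec],
                     charges: Sequence[float]) -> Optional[str]:
    """Label a four-coordinate center 'R' or 'S' using partial-charge seniority.

    Assumption: higher partial charge means higher seniority. Returns None when
    charges tie or the three most senior neighbors are coplanar with the center.
    """
    if len(neighbors) != 4 or len(charges) != 4:
        raise ValueError("expected exactly four neighbors and four charges")
    if len(set(charges)) < 4:
        return None  # assumption: tied charges give no chirality label
    order = sorted(range(4), key=lambda i: charges[i], reverse=True)
    a, b, c = (_sub(neighbors[i], center) for i in order[:3])
    v = _triple(a, b, c)
    if abs(v) < 1e-9:
        return None
    # With the least senior neighbor pointing away from the viewer, a -> b -> c
    # running clockwise gives a negative triple product; call that 'R' by analogy.
    return "R" if v < 0 else "S"
```

Swapping the charges of any two of the three most senior neighbors flips the sign of the triple product and therefore the label, which is the property that makes such a descriptor sensitive to stereochemistry without any molecular alignment.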