Little attention has been given to the selection of trial descriptor sets when designing a QSAR analysis, even though a great number of descriptor classes, and often a greater number of descriptors within a given class, are now available. This paper reports an effort to explore interrelationships between QSAR models and descriptor sets. Zhou and co-workers (Zhou et al., Nano Lett. 2008, 8 (3), 859-865) designed, synthesized, and tested a combinatorial library of 80 surface-modified (that is, decorated) multi-walled carbon nanotubes for their composite nanotoxicity using six endpoints, all based on a common 0 to 100 activity scale. All six endpoints for the 29 most nanotoxic decorated nanotubes were incorporated as the training set for this study. The study reported here includes trial descriptor sets for all possible combinations of the MOE, VolSurf, and 4D-fingerprints (4D-FP) descriptor classes, both including and excluding explicit spatial contributions from the nanotube. Optimized QSAR models were constructed from these multiple trial descriptor sets. It was found that (a) both the form and quality of the best QSAR models for each of the endpoints are distinct and (b) some endpoints are quite dependent upon 4D-FP descriptors of the entire nanotube-decorator complex. However, other endpoints yielded equally good models using only decorator descriptors, with and without the decorator-only 4D-FP descriptors. Lastly, and most importantly, the quality, significance, and interpretation of a QSAR model were found to be critically dependent on the trial descriptor sets used within a given QSAR endpoint study.
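The enumeration of trial descriptor sets described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how the sets were generated, so the function name `trial_descriptor_sets`, the dictionary layout, and the choice to cross every non-empty combination of classes with the nanotube on/off flag are all hypothetical. The study itself may have applied the nanotube option to only some classes.

```python
from itertools import combinations

# Descriptor classes named in the abstract.
DESCRIPTOR_CLASSES = ("MOE", "VolSurf", "4D-FP")


def trial_descriptor_sets(classes=DESCRIPTOR_CLASSES):
    """Enumerate every non-empty combination of descriptor classes,
    each with and without explicit nanotube contributions.

    Assumption: a full cross of class subsets and the nanotube flag;
    the original study may have used a different design.
    """
    sets = []
    for size in range(1, len(classes) + 1):
        for combo in combinations(classes, size):
            for include_nanotube in (False, True):
                sets.append(
                    {"classes": combo, "include_nanotube": include_nanotube}
                )
    return sets


if __name__ == "__main__":
    for trial in trial_descriptor_sets():
        tube = "with nanotube" if trial["include_nanotube"] else "decorator only"
        print(" + ".join(trial["classes"]), "|", tube)
```

Under this assumption, three classes give seven non-empty combinations, and the nanotube flag doubles that to fourteen candidate sets, each of which would then be passed to whatever model-building step is in use.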
Hepatotoxicity, drug-induced liver injury, and competitive Cytochrome P-450 (CYP) isozyme binding are serious problems associated with drug use. It would be favorable to avoid or to understand potential CYP inhibition at the developmental stages. However, current in silico CYP prediction models and publicly available prediction servers can provide only yes/no classification results for just one or a few CYP enzymes. In this study, we utilized a rule-based C5.0 algorithm with different descriptors, including PaDEL, Mold2, and PubChem fingerprints, to construct rule-based inhibition prediction models for five major CYP enzymes (CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4) that account for 90% of drug oxidation or hydrolysis. We also developed a rational sampling algorithm for the selection of compounds in the training data set, to enhance the performance of these CYP prediction models. The optimized models include several improved features. First, the final models significantly outperformed all of the currently available models. Second, the final models can also be used for rapid virtual screening of a large set of compounds due to their ruleset-based nature. Moreover, such rule-based prediction models can provide rulesets for structural features related to the five major CYP enzymes. The five most significant rules for CYP inhibition were identified for each CYP enzyme and are discussed. An example was chosen for each of the five CYP enzymes to demonstrate how rule-based models can be used to gain insights into structural features that correspond with CYP inhibition. A newer version of the freely accessible CYP prediction server, CypRules, is presented here as a result of the aforementioned improvements.
CypRules is freely accessible at http://cyprules.cmdm.tw/, and the models, descriptor files, and program files for all compounds are publicly available at http://cyprules.cmdm.tw/sources/sources.rar.
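To make the idea of ruleset-based screening concrete, the following Python sketch shows how a small set of if-then rules over binary structural features might be applied to a compound. It is not the CypRules implementation and does not reproduce C5.0: the `Rule` class, the feature names `feature_a` and `feature_b`, the confidence values, and the "highest-confidence matching rule wins" policy are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """One if-then rule: all conditions must hold for the rule to fire."""
    conditions: dict   # feature name -> required boolean value (hypothetical)
    label: str         # predicted class, e.g. "inhibitor"
    confidence: float  # illustrative confidence, not taken from the paper


def rule_matches(rule, features):
    # A missing feature is treated as absent (False).
    return all(
        features.get(name, False) == required
        for name, required in rule.conditions.items()
    )


def classify(features, rules, default="non-inhibitor"):
    """Return the label of the highest-confidence matching rule.

    Assumption: a simplified decision policy; real C5.0 rulesets
    combine matching rules differently.
    """
    matching = [r for r in rules if rule_matches(r, features)]
    if not matching:
        return default
    return max(matching, key=lambda r: r.confidence).label


if __name__ == "__main__":
    # Placeholder rules; they carry no chemical meaning.
    rules = [
        Rule({"feature_a": True}, "inhibitor", 0.90),
        Rule({"feature_a": True, "feature_b": True}, "non-inhibitor", 0.95),
    ]
    compound = {"feature_a": True, "feature_b": False}
    print(classify(compound, rules))
```

Because each prediction only requires checking a handful of boolean conditions, this style of model is cheap to apply to large compound sets, and the rule that fired can be read directly as a statement about structural features.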