2003
DOI: 10.1021/ci025626i

Assessing Model Fit by Cross-Validation

Abstract: When QSAR models are fitted, it is important to validate any fitted model: to check that it is plausible that its predictions will carry over to fresh data not used in the model-fitting exercise. There are two standard ways of doing this: using a separate hold-out test sample, or the computationally much more burdensome leave-one-out cross-validation, in which the entire pool of available compounds is used both to fit the model and to assess its validity. We show by theoretical argument and empirical study of a lar…
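
As a concrete illustration of the hold-out scheme the abstract contrasts with leave-one-out cross-validation, here is a minimal sketch in Python. The synthetic descriptor data, the plain linear model, and the 70/30 split are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                                 # descriptors: 100 compounds, 5 features
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=100)  # simulated activities

# Reserve 30% of the compounds as a separate hold-out test sample.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("hold-out R^2:", r2_score(y_test, model.predict(X_test)))
```

The cost the paper highlights is visible here: the 30 held-out compounds contribute nothing to the fit, which is exactly the waste that cross-validation tries to avoid.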


Cited by 661 publications (477 citation statements). References 30 publications (32 reference statements).
“…The use of a large portion of the data for checking the model fit seems a waste of valuable and often costly information. In those cases where external validation is not possible, alternatives for predictive validation are of interest, for example, methods such as cross-validation 18 and Y-randomization. Cross-validation is a common technique where a number of modified data sets are created by deleting, in each case, one or a small group of compounds from the data.…”
Section: Measure of Predictivity (mentioning)
confidence: 99%
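
A minimal sketch of the delete-one-compound procedure this excerpt describes, computing the cross-validated q² from the PRESS statistic. The data, model, and library choices are illustrative assumptions, not taken from the cited work:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))                      # a small pool of compounds
y = X @ rng.normal(size=4) + rng.normal(scale=0.3, size=40)

# Delete one compound at a time, refit on the rest, predict the deleted one.
y_loo = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())

press = np.sum((y - y_loo) ** 2)                  # predictive residual sum of squares
q2 = 1 - press / np.sum((y - y.mean()) ** 2)      # cross-validated q^2
print("LOO q^2:", round(q2, 3))
```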
“…[27] One also needs to check the ability of QSAR models to provide competent predictions on 'similar' data sets via validation on out-of-sample test sets. [28][29][30][31][32] For a relatively small sample, i.e., a small collection of compounds, this is done by following a leave-one-out (LOO) cross-validation method. For data sets with a large number of compounds, a more computationally economical way is to do a k-fold cross-validation: split the data set randomly into k (decided in advance) equal subsets, take each subset in turn as the test set, fit the model on the remaining compounds, and use it to obtain predictions for the held-out subset.…”
Section: Statistical Methods for QSAR Model Development and Validation (mentioning)
confidence: 99%
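
The k-fold scheme described above can be sketched as follows; scikit-learn, the synthetic data, and k = 5 are assumptions made purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.5, size=200)

k = 5
preds = np.empty_like(y)
# Split randomly into k subsets; each subset serves once as the test set
# while the remaining compounds form the training set.
for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    preds[test_idx] = model.predict(X[test_idx])

q2 = 1 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"{k}-fold q^2:", round(q2, 3))
```

Each compound is predicted exactly once, by a model that never saw it during fitting, but only k models are fitted instead of one per compound as in LOO.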
“…Essentially, this method ends up using information from the held-out compound/split subset to predict the activity of those very samples. This naïve cross-validation procedure causes synthetic inflation of the cross-validated q², and hence overstates the predictive ability of the model [29][30][31][32] (Figure 3). A two-step approach (referred to in Figure 3 as 'Two-deep CV') helps avoid this tricky situation.…”
Section: Statistical Methods for QSAR Model Development and Validation (mentioning)
confidence: 99%
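
A sketch of why the naive procedure inflates q² and how the two-deep approach avoids it: if descriptor selection sees all compounds before the folds are split, information about the held-out samples leaks into the model. The example below contrasts the two on pure-noise data; all names and data are illustrative assumptions, and "two-deep" is implemented here simply as selection repeated inside each training fold:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 500))        # many noise descriptors
y = rng.normal(size=50)               # pure-noise activities: nothing to predict

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Naive: descriptor selection sees the held-out compounds before splitting.
X_sel = SelectKBest(f_regression, k=10).fit_transform(X, y)
y_naive = cross_val_predict(LinearRegression(), X_sel, y, cv=cv)

# Two-deep: the selection step is refitted on each training fold only.
pipe = Pipeline([("select", SelectKBest(f_regression, k=10)),
                 ("fit", LinearRegression())])
y_deep = cross_val_predict(pipe, X, y, cv=cv)

q2 = lambda yhat: 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
print("naive q^2:   ", round(q2(y_naive), 3))   # typically spuriously high
print("two-deep q^2:", round(q2(y_deep), 3))    # near or below zero, as it should be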
“…The stability of the models was tested by cross-validation with two and five groups (Table 1). As described previously, 26 the cross-validation procedure provides a reliable picture of the predictivity of QSAR models. All the statistical values obtained from our current CoMFA and CoMSIA models … (Table 1).…”
Section: Statistics of CoMFA and CoMSIA Models (mentioning)
confidence: 99%