2003
DOI: 10.1021/ci025626i

Assessing Model Fit by Cross-Validation

Abstract: When QSAR models are fitted, it is important to validate any fitted model: to check that it is plausible that its predictions will carry over to fresh data not used in the model-fitting exercise. There are two standard ways of doing this: using a separate hold-out test sample, or the computationally much more burdensome leave-one-out cross-validation, in which the entire pool of available compounds is used both to fit the model and to assess its validity. We show by theoretical argument and empirical study of a lar…
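
As a concrete illustration of the hold-out scheme the abstract contrasts with leave-one-out cross-validation, here is a minimal sketch in Python. The synthetic descriptor data, the plain linear model, and the 70/30 split are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                                 # descriptors: 100 compounds, 5 features
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=100)  # simulated activities

# Reserve 30% of the compounds as a separate hold-out test sample.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("hold-out R^2:", r2_score(y_test, model.predict(X_test)))
```

The cost the paper highlights is visible here: the 30 held-out compounds contribute nothing to the fit, which is exactly the waste that cross-validation tries to avoid.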


Cited by 661 publications (477 citation statements). References 30 publications (32 reference statements).
“…The use of a large portion of the data for checking the model fit seems a waste of valuable and often costly information. In those cases where external validation is not possible, alternatives for predictive validation are of interest, for example, methods such as cross-validation 18 and Y-randomization. Cross-validation is a common technique where a number of modified data sets are created by deleting, in each case, one or a small group of compounds from the data.…”
Section: Measure of Predictivity (mentioning)
confidence: 99%
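
A minimal sketch of the delete-one-compound procedure this excerpt describes, computing the cross-validated q² from the PRESS statistic. The data, model, and library choices are illustrative assumptions, not taken from the cited work:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))                      # a small pool of compounds
y = X @ rng.normal(size=4) + rng.normal(scale=0.3, size=40)

# Delete one compound at a time, refit on the rest, predict the deleted one.
y_loo = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())

press = np.sum((y - y_loo) ** 2)                  # predictive residual sum of squares
q2 = 1 - press / np.sum((y - y.mean()) ** 2)      # cross-validated q^2
print("LOO q^2:", round(q2, 3))
```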
“…[27] One also needs to check the ability of QSAR models to provide competent predictions on 'similar' data sets via validation on out-of-sample test sets. [28][29][30][31][32] For a relatively small sample, i.e., a small collection of compounds, this is done by following a leave-one-out (LOO) cross-validation method. For data sets with a large number of compounds, a more computationally economical way is to do a k-fold cross-validation: split the data set randomly into k (decided in advance) equal subsets, take each subset in turn as the test set, fit the model on the remaining compounds, and use it to obtain predictions for the held-out subset.…”
Section: Statistical Methods for QSAR Model Development and Validation (mentioning)
confidence: 99%
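
The k-fold scheme described above can be sketched as follows; scikit-learn, the synthetic data, and k = 5 are assumptions made purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.5, size=200)

k = 5
preds = np.empty_like(y)
# Split randomly into k subsets; each subset serves once as the test set
# while the remaining compounds form the training set.
for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    preds[test_idx] = model.predict(X[test_idx])

q2 = 1 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"{k}-fold q^2:", round(q2, 3))
```

Each compound is predicted exactly once, by a model that never saw it during fitting, but only k models are fitted instead of one per compound as in LOO.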
“…Essentially, this method ends up using information from the held-out compound/split subset to predict the activity of those very samples. This naïve cross-validation procedure causes synthetic inflation of the cross-validated q², and hence overstates the predictive ability of the model [29][30][31][32] (Figure 3). A two-step approach (referred to in Figure 3 as 'Two-deep CV') helps avoid this tricky situation.…”
Section: Statistical Methods for QSAR Model Development and Validation (mentioning)
confidence: 99%
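
A sketch of why the naive procedure inflates q² and how the two-deep approach avoids it: if descriptor selection sees all compounds before the folds are split, information about the held-out samples leaks into the model. The example below contrasts the two on pure-noise data; all names and data are illustrative assumptions, and "two-deep" is implemented here simply as selection repeated inside each training fold:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 500))        # many noise descriptors
y = rng.normal(size=50)               # pure-noise activities: nothing to predict

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Naive: descriptor selection sees the held-out compounds before splitting.
X_sel = SelectKBest(f_regression, k=10).fit_transform(X, y)
y_naive = cross_val_predict(LinearRegression(), X_sel, y, cv=cv)

# Two-deep: the selection step is refitted on each training fold only.
pipe = Pipeline([("select", SelectKBest(f_regression, k=10)),
                 ("fit", LinearRegression())])
y_deep = cross_val_predict(pipe, X, y, cv=cv)

q2 = lambda yhat: 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
print("naive q^2:   ", round(q2(y_naive), 3))   # typically spuriously high
print("two-deep q^2:", round(q2(y_deep), 3))    # near or below zero, as it should be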
“…The stability of the models was tested by cross-validation with two and five groups (Table 1). As described previously, 26 the cross-validation procedure provides a reliable picture of the predictivity of QSAR models. All the statistical values obtained from our current CoMFA and CoMSIA models … (Table 1).…”
Section: Statistics of CoMFA and CoMSIA Models (mentioning)
confidence: 99%