2004
DOI: 10.1093/bioinformatics/btg419
Is cross-validation valid for small-sample microarray classification?

Abstract: An extensive simulation study has been performed comparing cross-validation, resubstitution and bootstrap estimation for three popular classification rules (linear discriminant analysis, 3-nearest-neighbor and decision trees, i.e. CART), using both synthetic and real breast-cancer patient data. Comparison is via the distribution of differences between the estimated and true errors. Various statistics for the deviation distribution have been computed: mean (for estimator bias), variance (for estimator precision), root…
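The comparison protocol the abstract describes (repeatedly drawing small samples, computing the difference between each error estimate and the true error, and summarizing that deviation distribution by its mean and variance) can be sketched as follows. This is a minimal illustration only: it uses a nearest-mean classifier as a simple stand-in for the paper's LDA rule, leave-one-out CV as the cross-validation estimator, and synthetic Gaussian data; all function names and parameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_mean_fit(X, y):
    # Train: store the two class means (a crude stand-in for LDA).
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def nearest_mean_predict(model, X):
    # Predict the class whose mean is closer in Euclidean distance.
    m0, m1 = model
    d0 = ((X - m0) ** 2).sum(axis=1)
    d1 = ((X - m1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def sample(n, dim=2, delta=1.0):
    # Balanced two-class Gaussian data; class-1 mean shifted by delta per axis.
    y = rng.permutation(np.repeat([0, 1], n // 2))
    X = rng.normal(0.0, 1.0, (n, dim)) + delta * y[:, None]
    return X, y

def loo_cv_error(X, y):
    # Leave-one-out cross-validation error estimate.
    n = len(y)
    errs = 0
    for i in range(n):
        mask = np.arange(n) != i
        model = nearest_mean_fit(X[mask], y[mask])
        errs += nearest_mean_predict(model, X[i:i + 1])[0] != y[i]
    return errs / n

deviations_cv, deviations_resub = [], []
X_big, y_big = sample(20000)      # large sample approximates the true error
for _ in range(200):              # 200 small-sample datasets of size n = 20
    X, y = sample(20)
    model = nearest_mean_fit(X, y)
    true_err = np.mean(nearest_mean_predict(model, X_big) != y_big)
    resub_err = np.mean(nearest_mean_predict(model, X) != y)
    deviations_resub.append(resub_err - true_err)
    deviations_cv.append(loo_cv_error(X, y) - true_err)

# Resubstitution is optimistically biased (negative mean deviation), while
# cross-validation is nearly unbiased but exhibits larger variance.
print("resub  bias=%.3f var=%.4f" % (np.mean(deviations_resub), np.var(deviations_resub)))
print("loo-cv bias=%.3f var=%.4f" % (np.mean(deviations_cv), np.var(deviations_cv)))
```

The mean of the deviations estimates bias and their variance estimates precision, which is exactly the pair of statistics the abstract lists.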

Cited by 535 publications (356 citation statements)
References 9 publications
“…In addition, this method is well-known to have low bias. On the other hand, methods such as k-fold CV and bootstrap resampling techniques have been asserted to have smaller variance (see, e.g., Efron (1983); Efron and Tibshirani (1997)) and be more appropriate for microarray analysis in many cases (Braga-Neto and Dougherty (2004)). For instance, with 10-fold CV, the estimated error rates should be unbiased for a training set of size .9n (rather than of size n) although sensitivity to the training set may be smaller than with n-fold CV.…”
Section: Discussion
Confidence: 99%
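The ".9n" remark in the statement above follows from simple fold-size arithmetic: with k folds, each surrogate classifier is trained on roughly (k-1)/k of the n samples, so the k-fold CV estimate targets the error of a rule trained on that reduced size rather than on the full n. A small sketch, assuming standard even splitting (`fold_sizes` is a hypothetical helper, not from the cited work):

```python
def fold_sizes(n, k):
    # Sizes of the k held-out folds, split as evenly as possible.
    base, extra = divmod(n, k)
    return [base + 1 if i < extra else base for i in range(k)]

n, k = 100, 10
held_out = fold_sizes(n, k)
train_sizes = [n - h for h in held_out]
print(held_out)      # every fold holds out n/k samples
print(train_sizes)   # every surrogate trains on 0.9 * n = 90 samples
```

With n-fold (leave-one-out) CV the training size is n - 1, which is why its bias is even smaller, at the cost of surrogate classifiers that are highly correlated with one another.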
“…Consequently, the prediction rule is different each time and therefore not the same as the prediction rule developed on the entire sample whose performance one actually wants to evaluate. The instability is even worse in small sample settings (Braga-Neto and Dougherty, 2004). Thus, splitting the original sample in many ways is a first step in the right direction, but is not an independent validation, which is the only way to evaluate the performances of the prediction rule developed from the entire sample.…”
Section: Box 1. A Critical View of Microarray Vocabulary
Confidence: 99%
“…In order to guarantee that the present results are valid, we use fivefold cross-validation (5-CV) [39] to evaluate classification accuracy. The feature data (115 positive, 85 negative) are divided into five groups at random; each time, one group is chosen as the testing set and the other four groups form the training set. Looping five times in this way, we compute the average classification accuracy.…”
Section: Classification Tested on DDSM Database
Confidence: 99%
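The 5-CV procedure quoted above can be sketched directly in Python. The real DDSM mammography features are not available here, so the data below are synthetic stand-ins with the same class counts (115 positive, 85 negative), and `five_fold_cv_accuracy` together with the nearest-mean rule are hypothetical names invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def five_fold_cv_accuracy(X, y, fit, predict, k=5):
    # Shuffle indices, split into k groups, hold each group out once as the
    # testing set, train on the remaining groups, and average the accuracies.
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        accs.append(np.mean(predict(model, X[test]) == y[test]))
    return float(np.mean(accs))

# Stand-in data mirroring the cited setup: 115 positive, 85 negative cases,
# with synthetic 10-dimensional features (class-1 mean shifted by 1 per axis).
y = np.concatenate([np.ones(115, int), np.zeros(85, int)])
X = rng.normal(0.0, 1.0, (200, 10)) + y[:, None]

# A simple nearest-mean classifier as the illustrative prediction rule.
fit = lambda X, y: (X[y == 0].mean(axis=0), X[y == 1].mean(axis=0))
def predict(model, X):
    m0, m1 = model
    return (((X - m1) ** 2).sum(1) < ((X - m0) ** 2).sum(1)).astype(int)

acc = five_fold_cv_accuracy(X, y, fit, predict)
print("5-fold CV accuracy: %.3f" % acc)
```

Note that, as the earlier citation statements caution, each of the five surrogate classifiers is trained on only 80% of the data, so the averaged accuracy estimates the performance of a rule trained on 0.8n samples rather than on the full sample.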