2010
DOI: 10.1109/tpami.2009.187

Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation

Abstract: In the machine learning field, the performance of a classifier is usually measured in terms of prediction error. In most real-world problems, the error cannot be exactly calculated and it must be estimated. Therefore, it is important to choose an appropriate estimator of the error. This paper analyzes the statistical properties, bias and variance, of the k-fold cross-validation classification error estimator (k-cv). Our main contribution is a novel theoretical decomposition of the variance of the k…

Cited by 1,425 publications (675 citation statements)
References 9 publications
“…Each set is used exactly once as the test set while the remaining data is used as the training set (Rodríguez et al, 2010). Based on the fivefold cross-validation method to examine the prediction models produced by the above algorithms, the whole database was randomly divided into ten distinct parts because of the amount of test data is 10% of the whole database (Lin et al, 2006).…”
Section: Evaluation Of the Predictive Performance
mentioning
confidence: 99%
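
The partitioning described in this citation statement, where each fold serves exactly once as the test set, can be sketched as follows. This is a minimal illustration and not code from any of the cited papers; the function name `k_fold_splits`, the seed, and the toy sizes (10 samples, 5 folds) are assumptions chosen for the example.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # The training set is the union of all other folds.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical toy setting: 10 samples split into 5 folds.
splits = list(k_fold_splits(n_samples=10, k=5))

# Count how often each sample lands in a test set.
test_counts = [0] * 10
for train, test in splits:
    for idx in test:
        test_counts[idx] += 1
```

After the loop, every entry of `test_counts` is 1, which is the "used exactly once as the test set" property the statement refers to.
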
“…Therefore, an optimal process based on a genetic algorithm (GA) is used to identify the best parameter values. This optimization uses the accuracy of the training dataset as the fitness function, and applies K-fold cross-validation 21 to analyze the variable generalization ability of each generation. The program flow of the GA used in the proposed method is shown in Fig.…”
Section: Optimization Of Svm Parameters
mentioning
confidence: 99%
“…The repeated r times k-cv consists of estimating the error as the average of r k-cv estimations with different random partitions into folds. This method considerably reduces the variance of the error estimation [41].…”
Section: Multi-dimensional Classification Evaluation
mentioning
confidence: 99%
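
The repeated k-cv procedure quoted above, averaging r k-cv estimates obtained from different random partitions, might look like the sketch below. Everything specific here is an assumption made for illustration: the toy threshold classifier `fit_threshold`, the synthetic one-dimensional data, and the helper names `kcv_error` and `repeated_kcv_error` do not come from the cited papers.

```python
import random
from statistics import mean

def fit_threshold(xs, ys):
    """Toy classifier: threshold halfway between the two class means.
    Assumes both classes appear in every training set."""
    mean0 = mean([x for x, y in zip(xs, ys) if y == 0])
    mean1 = mean([x for x, y in zip(xs, ys) if y == 1])
    return (mean0 + mean1) / 2

def kcv_error(xs, ys, k, rng):
    """One k-fold cross-validation estimate of the classification error."""
    indices = list(range(len(xs)))
    rng.shuffle(indices)  # a fresh random partition on every call
    folds = [indices[i::k] for i in range(k)]
    mistakes = 0
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        threshold = fit_threshold([xs[t] for t in train], [ys[t] for t in train])
        # Predict class 1 when the value exceeds the threshold.
        mistakes += sum(int(xs[t] > threshold) != ys[t] for t in test)
    return mistakes / len(xs)

def repeated_kcv_error(xs, ys, k, r, seed=0):
    """Average of r k-cv estimates, each on a different random partition."""
    rng = random.Random(seed)
    estimates = [kcv_error(xs, ys, k, rng) for _ in range(r)]
    return mean(estimates), estimates

# Hypothetical synthetic data: two overlapping Gaussian classes, 20 samples each.
data_rng = random.Random(42)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(20)]
xs += [data_rng.gauss(1.5, 1.0) for _ in range(20)]
ys = [0] * 20 + [1] * 20

avg_error, estimates = repeated_kcv_error(xs, ys, k=5, r=20)
```

The individual values in `estimates` differ only because of the partition, so their spread gives a rough sense of the partition-induced variability that the averaging is meant to smooth out.
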
“…The classification accuracies, which have been estimated via 20 runs of 5-fold non-stratified cross validation (20×5cv) [41], are shown in Table 6. The results of using the four different feature sets (unigrams, unigrams + bigrams, PoS, and the ASOMO features) in conjunction with the three different learning approaches (multiple uni-dimensional, Cartesian class variable, and multi-dimensional classifiers) in a supervised framework are shown.…”
mentioning
confidence: 99%