2010
DOI: 10.1093/bioinformatics/btq037

Small-sample precision of ROC-related estimates

Contact: edward@mail.ece.tamu.edu

Citation Types: 2 supporting, 143 mentioning, 0 contrasting

Year Published: 2010–2022


Cited by 268 publications (153 citation statements)
References 16 publications
“…Hanczar et al [8] show that the variance in ROC curves computed over small data sets can significantly impact scientific conclusions. Bouckaert and Frank [9] study the consistency of statistical tests on individual data sets and recommend a corrected t-test [10] across ten iterations of ten-fold cross-validation as the least sensitive to the order of the data.…”
Section: Introduction (mentioning)
confidence: 99%
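The "corrected t-test [10]" recommended in the excerpt above for ten iterations of ten-fold cross-validation is, in Weka and in common practice, the Nadeau–Bengio corrected resampled t-test. A minimal stdlib sketch under that assumption (the function name is ours):

```python
import math
import statistics

def corrected_resampled_t(diffs, n_train, n_test):
    """Corrected resampled t-statistic (Nadeau & Bengio style).

    diffs: per-run score differences between two classifiers,
    e.g. the 100 values from ten iterations of ten-fold CV.
    The naive t-test is overconfident because the training sets of
    different runs overlap; the correction inflates the variance
    term by the test/train size ratio n_test / n_train.
    """
    k = len(diffs)
    mean = statistics.mean(diffs)
    var = statistics.variance(diffs)  # sample variance (ddof = 1)
    return mean / math.sqrt((1.0 / k + n_test / n_train) * var)
```

The resulting statistic is compared against a t distribution with k − 1 degrees of freedom; for 10×10-fold cross-validation, k = 100 and n_test/n_train = 1/9.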
“…For C4.5, the optimised parameter was the confidence factor (CF) that configures its error-based pruning method, varying it within its boundary values [0, 0.5], in steps of 0.05. For CART, the optimised parameter was the number of folds in the cross-validation procedure that is executed within the cost-complexity pruning method (ranging within [2, 20]). For REPTree, the optimised parameter was the size of the 1.00…”
Section: Methods (mentioning)
confidence: 99%
“…The machine learning community most often uses the AUC statistic for model comparison, even though this practice has recently been questioned based upon new research that shows that AUC is quite noisy as a performance measure for classification [20] and has some other significant problems in model comparison [21,23].…”
Section: AUC (mentioning)
confidence: 99%
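The AUC the excerpt calls "quite noisy" for small samples (the subject of the Hanczar et al. paper this report covers) is the Mann–Whitney statistic, which makes the small-sample issue easy to see: with few positives and negatives the estimate can only take a handful of discrete values. A minimal sketch (function name ours):

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the probability that a randomly chosen positive is
    scored above a randomly chosen negative; ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# With 3 positives and 3 negatives there are only 9 pairs, so the
# estimate moves in coarse steps: one swapped pair shifts it by 1/9.
```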
“…ROC analysis is now an integral part of the evaluation of machine learning algorithms (Bradley, 1997). Whereas ROC curves are widely (and rightly so) considered useful, both theoretical and practical shortcomings of the AUC have been pointed out (Hilden, 1991; Adams & Hand, 1999; Bengio, Mariéthoz, & Keller, 2005; Webb & Ting, 2005; Lobo et al, 2008; Hand, 2009; Hanczar, Hua, Sima, Weinstein, Bittner, & Dougherty, 2010; Hand & Anagnostopoulos, 2013; Parker, 2013). A particular problem of the AUC is that it can be incoherent, in the sense that it assumes different cost distributions for different classifiers (Hand, 2009).…”
Section: Area Under the ROC Curve (AUC) (mentioning)
confidence: 99%