2004
DOI: 10.1093/bioinformatics/btg419
Is cross-validation valid for small-sample microarray classification?

Abstract: An extensive simulation study has been performed comparing cross-validation, resubstitution and bootstrap estimation for three popular classification rules (linear discriminant analysis, 3-nearest-neighbor and decision trees, i.e. CART), using both synthetic and real breast-cancer patient data. Comparison is via the distribution of differences between the estimated and true errors. Various statistics for the deviation distribution have been computed: mean (for estimator bias), variance (for estimator precision), root…
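The comparison protocol the abstract describes (repeatedly drawing small samples, computing the difference between each error estimate and the true error, and summarizing that deviation distribution by its mean and variance) can be sketched as follows. This is a minimal illustration only: it uses a nearest-mean classifier as a simple stand-in for the paper's LDA rule, leave-one-out CV as the cross-validation estimator, and synthetic Gaussian data; all function names and parameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_mean_fit(X, y):
    # Train: store the two class means (a crude stand-in for LDA).
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def nearest_mean_predict(model, X):
    # Predict the class whose mean is closer in Euclidean distance.
    m0, m1 = model
    d0 = ((X - m0) ** 2).sum(axis=1)
    d1 = ((X - m1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def sample(n, dim=2, delta=1.0):
    # Balanced two-class Gaussian data; class-1 mean shifted by delta per axis.
    y = rng.permutation(np.repeat([0, 1], n // 2))
    X = rng.normal(0.0, 1.0, (n, dim)) + delta * y[:, None]
    return X, y

def loo_cv_error(X, y):
    # Leave-one-out cross-validation error estimate.
    n = len(y)
    errs = 0
    for i in range(n):
        mask = np.arange(n) != i
        model = nearest_mean_fit(X[mask], y[mask])
        errs += nearest_mean_predict(model, X[i:i + 1])[0] != y[i]
    return errs / n

deviations_cv, deviations_resub = [], []
X_big, y_big = sample(20000)      # large sample approximates the true error
for _ in range(200):              # 200 small-sample datasets of size n = 20
    X, y = sample(20)
    model = nearest_mean_fit(X, y)
    true_err = np.mean(nearest_mean_predict(model, X_big) != y_big)
    resub_err = np.mean(nearest_mean_predict(model, X) != y)
    deviations_resub.append(resub_err - true_err)
    deviations_cv.append(loo_cv_error(X, y) - true_err)

# Resubstitution is optimistically biased (negative mean deviation), while
# cross-validation is nearly unbiased but exhibits larger variance.
print("resub  bias=%.3f var=%.4f" % (np.mean(deviations_resub), np.var(deviations_resub)))
print("loo-cv bias=%.3f var=%.4f" % (np.mean(deviations_cv), np.var(deviations_cv)))
```

The mean of the deviations estimates bias and their variance estimates precision, which is exactly the pair of statistics the abstract lists.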

Cited by 535 publications (356 citation statements)
References 9 publications
“…In addition, this method is well-known to have low bias. On the other hand, methods such as k-fold CV and bootstrap resampling techniques have been asserted to have smaller variance (see, e.g., Efron (1983); Efron and Tibshirani (1997)) and be more appropriate for microarray analysis in many cases (Braga-Neto and Dougherty (2004)). For instance, with 10-fold CV, the estimated error rates should be unbiased for a training set of size .9n (rather than of size n) although sensitivity to the training set may be smaller than with n-fold CV.…”
Section: Discussion
Confidence: 99%
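The ".9n" remark in the statement above follows from simple fold-size arithmetic: with k folds, each surrogate classifier is trained on roughly (k-1)/k of the n samples, so the k-fold CV estimate targets the error of a rule trained on that reduced size rather than on the full n. A small sketch, assuming standard even splitting (`fold_sizes` is a hypothetical helper, not from the cited work):

```python
def fold_sizes(n, k):
    # Sizes of the k held-out folds, split as evenly as possible.
    base, extra = divmod(n, k)
    return [base + 1 if i < extra else base for i in range(k)]

n, k = 100, 10
held_out = fold_sizes(n, k)
train_sizes = [n - h for h in held_out]
print(held_out)      # every fold holds out n/k samples
print(train_sizes)   # every surrogate trains on 0.9 * n = 90 samples
```

With n-fold (leave-one-out) CV the training size is n - 1, which is why its bias is even smaller, at the cost of surrogate classifiers that are highly correlated with one another.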
“…Consequently, the prediction rule is different each time and therefore not the same as the prediction rule developed on the entire sample whose performance one actually wants to evaluate. The instability is even worse in small sample settings (Braga-Neto and Dougherty, 2004). Thus, splitting the original sample in many ways is a first step in the right direction, but is not an independent validation, which is the only way to evaluate the performances of the prediction rule developed from the entire sample.…”
Section: Box 1. A Critical View of Microarray Vocabulary
Confidence: 99%
“…In order to guarantee that the present results are valid, we use fivefold cross-validation (5-CV) [39] to evaluate classification accuracy. The feature data (115 positive, 85 negative) are divided into five groups at random; each time, one group is chosen as the testing set and the other four groups form the training set. Looping five times in this way, we compute the average classification accuracy.…”
Section: Classification Tested on DDSM Database
Confidence: 99%
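The 5-CV procedure quoted above can be sketched directly in Python. The real DDSM mammography features are not available here, so the data below are synthetic stand-ins with the same class counts (115 positive, 85 negative), and `five_fold_cv_accuracy` together with the nearest-mean rule are hypothetical names invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def five_fold_cv_accuracy(X, y, fit, predict, k=5):
    # Shuffle indices, split into k groups, hold each group out once as the
    # testing set, train on the remaining groups, and average the accuracies.
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        accs.append(np.mean(predict(model, X[test]) == y[test]))
    return float(np.mean(accs))

# Stand-in data mirroring the cited setup: 115 positive, 85 negative cases,
# with synthetic 10-dimensional features (class-1 mean shifted by 1 per axis).
y = np.concatenate([np.ones(115, int), np.zeros(85, int)])
X = rng.normal(0.0, 1.0, (200, 10)) + y[:, None]

# A simple nearest-mean classifier as the illustrative prediction rule.
fit = lambda X, y: (X[y == 0].mean(axis=0), X[y == 1].mean(axis=0))
def predict(model, X):
    m0, m1 = model
    return (((X - m1) ** 2).sum(1) < ((X - m0) ** 2).sum(1)).astype(int)

acc = five_fold_cv_accuracy(X, y, fit, predict)
print("5-fold CV accuracy: %.3f" % acc)
```

Note that, as the earlier citation statements caution, each of the five surrogate classifiers is trained on only 80% of the data, so the averaged accuracy estimates the performance of a rule trained on 0.8n samples rather than on the full sample.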