2002
DOI: 10.1073/pnas.102102699

Selection bias in gene extraction on the basis of microarray gene-expression data

Abstract: In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leav…

Citations: cited by 1,227 publications (949 citation statements)
References: 25 publications
“…Thus, the features were ranked 10 times, each time independent of the hold-out samples for that given trial. This ensured that the estimated generalization performance is unbiased by spurious features that might explain the training class labels but might not generalize to the population (Ambroise and McLachlan, 2002).…”
Section: Machine Learning Methods and Analysis (mentioning)
confidence: 99%
“…But, if it is calculated within the feature selection process, there is a selection bias in it when it is used as an estimate of the prediction error (Ambroise and McLachlan, 2002). External cross-validation should be undertaken subsequent to the feature selection process to correct for this selection bias.…”
Section: Principal Component Of Cortical Thickness As a Feature For P (mentioning)
confidence: 99%
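
The selection bias described in the statements above can be illustrated with a small sketch. It is not the procedure from any of the cited papers: the synthetic pure-noise data, the mean-difference feature ranking, the nearest-centroid classifier, and the helper names (`top_k_features`, `predict_nearest_centroid`, `loo_error`) are all assumptions chosen for brevity. The sketch compares a leave-one-out error computed after selecting features on all samples (internal, biased) with one where selection is repeated inside each fold without the held-out sample (external).

```python
import random


def top_k_features(X, y, k):
    """Rank features by absolute difference of class means; return top-k indices."""
    n_feat = len(X[0])
    scores = []
    for j in range(n_feat):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(n_feat), key=lambda j: -scores[j])[:k]


def predict_nearest_centroid(X_train, y_train, x, feats):
    """Assign x to the class whose centroid (on the chosen features) is closest."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in feats]
        dist = sum((x[j] - c) ** 2 for j, c in zip(feats, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_error(X, y, k, external):
    """Leave-one-out error rate.

    external=False: features are chosen once using ALL samples (biased).
    external=True:  features are re-chosen in every fold, without the
                    held-out sample (external cross-validation).
    """
    fixed = None if external else top_k_features(X, y, k)
    errors = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = top_k_features(X_tr, y_tr, k) if external else fixed
        errors += predict_nearest_centroid(X_tr, y_tr, X[i], feats) != y[i]
    return errors / len(X)


# Hypothetical data: 20 samples, 200 "genes" of pure noise, labels carry no signal.
random.seed(0)
n, p, k = 20, 200, 10
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [0] * (n // 2) + [1] * (n // 2)

internal_err = loo_error(X, y, k, external=False)
external_err = loo_error(X, y, k, external=True)
print(f"internal (selection outside CV): {internal_err:.2f}")
print(f"external (selection inside CV):  {external_err:.2f}")
```

Because the labels are unrelated to the features, an honest error estimate should sit near chance; the internal estimate tends to look optimistic because the selected features were chosen with knowledge of every held-out sample.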
“…However, these tools often present errors related to gene selection bias (Ambroise and McLachlan, 2002) that can lead to severe underestimations of prediction errors. We used (Medina et al, 2007), a tool unique in its kind, which implements a cross-validation strategy that renders unbiased error estimations.…”
Section: Identification Of a Molecular Prognosis Signature For Thyroi (mentioning)
confidence: 99%