2011
DOI: 10.1186/1755-8794-4-31

Optimally splitting cases for training and testing high dimensional classifiers

Abstract: Background: We consider the problem of designing a study to develop a predictive classifier from high dimensional data. A common study design is to split the sample into a training set and an independent test set, where the former is used to develop the classifier and the latter to evaluate its performance. In this paper we address the question of what proportion of the samples should be devoted to the training set. How does this proportion impact the mean squared error (MSE) of the prediction accuracy estimate?…
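To make the abstract's question concrete, here is a toy bias-variance calculation in Python. It is an illustration under stated assumptions, not the method of Dobbin & Simon (2011): it treats the accuracy of a classifier trained on all n cases as the quantity being estimated, assumes a made-up learning curve (`learning_curve`, with hypothetical parameters `a_inf`, `b`, `c`), and treats test-set errors as independent, so the hold-out estimate is binomial. Under those assumptions the MSE is the binomial variance plus the squared bias from training on fewer than n cases.

```python
"""Toy illustration of how the training/test split proportion trades bias
against variance in a hold-out accuracy estimate.

Assumptions (hypothetical, not taken from the paper):
  * the target is the accuracy of a classifier trained on all n samples;
  * expected accuracy after training on m samples follows the made-up
    learning curve acc(m) = a_inf - b * m**(-c);
  * the n_test held-out cases are classified independently, so the
    test-set accuracy estimate is Binomial(n_test, acc(m)) / n_test.
"""
from typing import Tuple


def learning_curve(m: int, a_inf: float = 0.90, b: float = 0.8, c: float = 0.5) -> float:
    """Hypothetical expected accuracy of a classifier trained on m samples."""
    return a_inf - b * m ** (-c)


def holdout_mse(n: int, n_train: int) -> float:
    """MSE of the hold-out accuracy estimate relative to acc(n).

    MSE = variance + bias^2, where
      variance = acc(n_train) * (1 - acc(n_train)) / n_test
      bias     = acc(n_train) - acc(n)
    """
    n_test = n - n_train
    if n_train < 1 or n_test < 1:
        raise ValueError("need at least one training and one test case")
    acc_train = learning_curve(n_train)
    variance = acc_train * (1.0 - acc_train) / n_test
    bias = acc_train - learning_curve(n)
    return variance + bias ** 2


def best_split(n: int) -> Tuple[int, float]:
    """Return (n_train, mse) minimising the toy MSE over all valid splits."""
    return min(((m, holdout_mse(n, m)) for m in range(1, n)), key=lambda t: t[1])


if __name__ == "__main__":
    for n in (50, 100, 200):
        n_train, mse = best_split(n)
        print(f"n={n}: train on {n_train} ({n_train / n:.0%}), toy MSE={mse:.5f}")
```

Moving cases into the training set reduces the bias term but leaves fewer test cases, which inflates the variance term; where the minimum falls depends on n and on the assumed learning curve, so the printed proportions are artefacts of the made-up parameters rather than recommendations.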

Cited by 283 publications (172 citation statements)
References 22 publications

“…However, the question whether the frequently applied rule-of-thumb of using 2/3 of the data for training and the remaining 1/3 for testing purposes (Cios et al., 2007; Dobbin & Simon, 2011) makes sense and whether its applicability depends on the sample size often remains an open question. Because of this, we stated two research questions: First, does the popular 2/3 rule-of-thumb splitting criterion used in out-of-sample tests generally make sense?…”
Section: Discussion (mentioning)
confidence: 99%
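The 2/3 rule of thumb quoted above amounts to a single random partition of the cases. A minimal sketch, assuming cases are identified by integer indices; `train_test_split_indices` is a hypothetical helper written for illustration, not an API from any of the cited works (scikit-learn's `train_test_split` offers comparable behaviour).

```python
import random
from typing import List, Optional, Tuple


def train_test_split_indices(
    n: int, train_fraction: float = 2 / 3, seed: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Randomly partition case indices 0..n-1 into disjoint training and test sets.

    train_fraction=2/3 reproduces the 2/3-1/3 rule of thumb; the number of
    training cases is rounded to the nearest integer.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    n_train = round(n * train_fraction)
    if n_train < 1 or n_train > n - 1:
        raise ValueError("split leaves the training or test set empty")
    indices = list(range(n))
    # A dedicated Random instance makes the split reproducible for a given seed.
    random.Random(seed).shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


if __name__ == "__main__":
    train, test = train_test_split_indices(90, seed=1)
    print(len(train), len(test))  # 60 30
```

Passing `train_fraction=0.5` gives the equally sized alternative mentioned in the next quotation; the sketch does not say which choice is better.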
“…Thus, these sub-samples are often referred to as the training and test samples. However, how could someone decide between, say, splitting the data sample into equally sized sub-samples and the oftentimes used 2/3 rule-of-thumb (Cios, Pedrycz, Swiniarski, & Kurgan, 2007; Dobbin & Simon, 2011) of using 2/3 of the data for training and the remaining 1/3 for testing purposes? Thus, we state our two research questions as follows:…”
Section: ª Conferência da Associação Portuguesa de Sistemas de Inf… (mentioning)
confidence: 99%
“…A training set that has about 70% of the dataset objects, and a learning and test set has about 30% of the dataset objects [44]. The training set is used by FS approaches to achieve features reduction.…”
Section: B. Evaluation Methods (mentioning)
confidence: 99%
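For the roughly 70%/30% split in the quotation above, one common refinement is to stratify by class label so that both sets keep the original class balance. Stratification is not mentioned in the quoted text and is added here as an assumption; `stratified_split` is a hypothetical helper, and the class labels in the usage example are made up.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable], train_fraction: float = 0.7, seed: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Split case indices so each class keeps roughly the same train/test ratio."""
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)  # group case indices by class label
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for idx in by_class.values():
        rng.shuffle(idx)  # shuffle within each class independently
        k = round(len(idx) * train_fraction)  # per-class training count
        train.extend(idx[:k])
        test.extend(idx[k:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    labels = ["tumour"] * 70 + ["normal"] * 30  # hypothetical class labels
    train, test = stratified_split(labels, seed=0)
    print(len(train), len(test))  # 70 30
```

Because rounding happens per class, the overall training fraction can drift slightly from 0.7 when classes are small, and a class with a single case ends up entirely in the training set.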
“…The number of rib fractures was recorded from radiologists' reports, and the fractures themselves could have been misclassified, given that radiographic diagnosis of rib fracture may be imperfect. 43 Lastly, because the random assignment of participants to the derivation or validation set may have improved the performance of our model, 29,30 future external validation could show different classification performance.…”
Section: Limitations (mentioning)
confidence: 99%