2008
DOI: 10.1186/1471-2105-9-360

The C1C2: A framework for simultaneous model selection and assessment

Abstract: Background: There has been recent concern regarding the inability of predictive modeling approaches to generalize to new data. Some of the problems can be attributed to improper methods for model selection and assessment. Here, we have addressed this issue by introducing a novel and general framework, the C1C2, for simultaneous model selection and assessment. The framework relies on a partitioning of the data in order to separate model choice from model assessment in terms of the data used. Since the number of…
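The core idea of keeping model choice and model assessment on disjoint data can be illustrated with a generic nested split. The sketch below is a minimal illustration under our own assumptions (random round-robin folds and a hypothetical helper named `nested_partition`); it is not the C1C2's actual partitioning scheme, which the truncated abstract does not describe.

```python
import random


def nested_partition(n, n_outer, n_inner, seed=0):
    """Hypothetical helper: split indices 0..n-1 into outer assessment folds and,
    for each outer fold, split the remaining indices into inner folds that are
    used only for model choice. n_inner should not exceed the size of that
    remaining (selection) part.

    Returns a list of (assessment, inner_folds) pairs, one per outer fold.
    """
    if not 2 <= n_outer <= n:
        raise ValueError("need 2 <= n_outer <= n")
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    # Round-robin assignment gives outer folds whose sizes differ by at most one.
    outer = [sorted(idx[i::n_outer]) for i in range(n_outer)]
    splits = []
    for assessment in outer:
        held_out = set(assessment)
        # Model choice may only ever see observations outside the assessment fold.
        selection = [i for i in idx if i not in held_out]
        inner = [sorted(selection[j::n_inner]) for j in range(n_inner)]
        splits.append((assessment, inner))
    return splits


if __name__ == "__main__":
    for assessment, inner in nested_partition(10, n_outer=5, n_inner=4):
        print("assess on", assessment, "| choose model using", inner)
```

Every observation lands in exactly one assessment fold, and no assessment observation ever appears in the inner folds of its own outer split, which is the property the framework's partitioning is meant to guarantee.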


Cited by 12 publications (16 citation statements: 2 supporting, 14 mentioning, 0 contrasting)
References 37 publications
“…One way to address this difficulty is to use a double loop cross‐validation technique, where all of the variable selection is carried out in the “inner” loop and the derived model then tested against the left‐out test set in the “outer” loop of the procedure. This has been shown by Stone [13], Eklund et al. [14], and Freyhult et al. [15]…”
Section: Introduction (supporting)
confidence: 52%
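A double-loop cross-validation in which variable selection happens entirely inside the inner loop can be sketched as follows. This is an illustrative sketch under our own assumptions, not the procedure of Stone, Eklund et al., Freyhult et al., or the C1C2: the model is k-nearest-neighbor regression with a fixed k = 3, variable selection is an exhaustive search over feature subsets scored by inner-loop mean squared error, and all function names are hypothetical.

```python
import random
from itertools import combinations


def knn_predict(train_x, train_y, x, k, features):
    """Mean response of the k training points nearest to x, with Euclidean
    distance computed on the selected feature indices only."""
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in features), y)
        for row, y in zip(train_x, train_y)
    )
    return sum(y for _, y in dists[:k]) / k


def make_folds(n, n_folds, rng):
    """Randomly split indices 0..n-1 into n_folds disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cv_mse(xs, ys, features, k, folds):
    """Cross-validated mean squared error for one candidate variable subset."""
    sq_err = 0.0
    for test in folds:
        held = set(test)
        tx = [xs[i] for i in range(len(xs)) if i not in held]
        ty = [ys[i] for i in range(len(xs)) if i not in held]
        for i in test:
            sq_err += (knn_predict(tx, ty, xs[i], k, features) - ys[i]) ** 2
    return sq_err / len(xs)


def double_loop_cv(xs, ys, k=3, n_outer=5, n_inner=4, seed=0):
    """Outer loop: assessment. Inner loop: exhaustive variable selection."""
    rng = random.Random(seed)
    p = len(xs[0])
    candidates = [c for r in range(1, p + 1) for c in combinations(range(p), r)]
    sq_err, chosen = 0.0, []
    for test in make_folds(len(xs), n_outer, rng):
        held = set(test)
        tx = [xs[i] for i in range(len(xs)) if i not in held]
        ty = [ys[i] for i in range(len(xs)) if i not in held]
        # Inner loop: variable selection sees only the outer training part,
        # and every candidate subset is scored on the same inner folds.
        inner = make_folds(len(tx), n_inner, rng)
        best = min(candidates, key=lambda f: cv_mse(tx, ty, f, k, inner))
        chosen.append(best)
        # Outer loop: assess the selected model on the left-out fold.
        for i in test:
            sq_err += (knn_predict(tx, ty, xs[i], k, best) - ys[i]) ** 2
    return sq_err / len(xs), chosen
```

The returned error is computed only from predictions on outer test folds that played no part in choosing the variables; the list of chosen subsets shows how stable the selection is across outer folds.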
“…These results align very well with those presented in [17], where different variable selection and regression coefficient shrinkage methods were applied to the prostate data. The results presented in Figure 4(b) show clearly what is previously known (see [18,19]) about the importance of the six variables in the truncated Selwood data set.…”
Section: Applications of SimSel to Real Data Sets (supporting)
confidence: 75%
“…The dependent variable is the anti-filarial activity of the molecules measured in vitro. Both data sets have been extensively studied by others (see, for instance, respectively, [17] and [18,19], and references therein) and are well characterized in terms of which independent variables are relevant for modelling the response variable. Since SimSel was derived under the assumption that $(H^{T}H)^{-1}$ exists and the Selwood data contain more variables than observations, we used only six independent variables from the Selwood data: LOGP, SUM_F, and MOFI_Y known to be important and ATCH10, PEAX_Y, and S8_1DX known not to be important (see [16] for details about the variables and [18,19] for discussions about important variables in the Selwood data).…”
Section: Applications of SimSel to Real Data Sets (mentioning)
confidence: 99%
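Why having more variables than observations rules out $(H^{T}H)^{-1}$ follows from a short rank argument. This assumes, since the excerpt does not define it, that $H$ is the $n \times p$ matrix of independent variables ($n$ observations, $p$ variables), as in ordinary least squares, where the estimate $\hat\beta = (H^{T}H)^{-1}H^{T}y$ needs the inverse.

$$
\begin{aligned}
\operatorname{rank}(H^{T}H) &= \operatorname{rank}(H) \le \min(n, p),\\
p > n \;&\Longrightarrow\; \operatorname{rank}(H^{T}H) \le n < p
\;\Longrightarrow\; H^{T}H \in \mathbb{R}^{p \times p} \text{ is singular, so } (H^{T}H)^{-1} \text{ does not exist.}
\end{aligned}
$$

Keeping only six columns gives $p = 6$, so the inverse can exist provided $n \ge 6$ and the six columns are linearly independent.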
“…Equation 14 is not needed for regression, and eq 13 is changed so that there is only a single k.) The parameters were estimated (15), where $N_k(x)$ is the neighborhood defined by the k closest observations in the training set. We here use the Euclidean distance to determine "closeness".…”
Section: Methods (mentioning)
confidence: 99%
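The excerpt's equation 15 is not reproduced here, so the sketch below assumes the standard k-nearest-neighbor regression estimate: the prediction at $x$ is the average response over $N_k(x)$, the $k$ training observations closest to $x$ in Euclidean distance. The function name `knn_regress` is hypothetical and the example data are made up for illustration.

```python
import math


def knn_regress(train_x, train_y, x, k):
    """Estimate the response at x as the mean of y over N_k(x), the k training
    observations closest to x in Euclidean distance."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must lie between 1 and the training-set size")
    # math.dist (Python 3.8+) returns the Euclidean distance between two points.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    neighborhood = order[:k]  # indices of the observations in N_k(x)
    return sum(train_y[i] for i in neighborhood) / k


train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0]
print(knn_regress(train_x, train_y, (0.0, 0.0), k=2))  # 1.5
```

With $k = 2$ the two nearest points to the origin are at distances 0 and 1, so the estimate is $(1 + 2)/2 = 1.5$; a single $k$ is the only tuning parameter, matching the excerpt's remark that regression needs only one $k$.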