2022
DOI: 10.3390/math10142538
Impact of Regressand Stratification in Dataset Shift Caused by Cross-Validation

Abstract: Data that have not been modeled cannot be correctly predicted. Under this assumption, this research studies how k-fold cross-validation can introduce dataset shift in regression problems. Such a shift means that the data distributions in the training and test sets differ, which deteriorates the estimate of model performance. Even though stratification of the output variable is widely used in classification to reduce the impact of dataset shift induced by cross-validation, its use…
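The abstract contrasts plain k-fold splitting with stratification of the regressand. One common way to stratify a continuous target — a minimal sketch, not the paper's specific procedure — is to sort samples by the target value, cut them into quantile bins, and deal each bin's samples across the folds, so every fold sees the full range of the output variable. The function name and the bin/fold counts below are illustrative assumptions:

```python
import random

def stratified_regression_folds(y, k=5, n_bins=5, seed=0):
    """Assign a fold index to each sample so folds mirror y's distribution.

    Sort samples by the regressand, cut the sorted order into quantile
    bins, then deal each bin's samples across the k folds round-robin.
    This keeps the training/test y-distributions similar, mitigating the
    dataset shift that plain k-fold splitting can introduce.
    """
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])
    folds = [0] * len(y)
    bin_size = max(1, len(y) // n_bins)
    for start in range(0, len(order), bin_size):
        chunk = order[start:start + bin_size]
        rng.shuffle(chunk)  # break ties randomly within a bin
        for j, idx in enumerate(chunk):
            folds[idx] = j % k
    return folds

# 100 samples with a uniformly spread target
y = [i * 0.1 for i in range(100)]
folds = stratified_regression_folds(y, k=5)
counts = [folds.count(f) for f in range(5)]
print(counts)  # every fold receives the same number of samples
```

Because each quantile bin contributes equally to every fold, the per-fold target distributions stay close to the full-data distribution, which is the effect the stratification is after.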

Cited by 2 publications (2 citation statements)
References 40 publications
“…Even with simulated data, determining the ideal bias threshold and the precise accuracy of these models remains unattainable. The aforementioned constraints have been extensively reviewed by Sáez & Romero-Béjar (2022) and West et al. (2020). The only available option is selecting the model that presents better results through a combined validation approach.…”
Section: Model Validation
confidence: 99%
“…Cross-validation is a resampling method used to evaluate machine learning models, and K-fold means that a given dataset is split into K separate folds. K-1 folds are used to train the model, and the remaining fold is used to validate; an individual estimate is then obtained by averaging the results of the K evaluations [28]. The model can be trained and validated on each fold of the data, increasing the model's fitness.…”
Section: Adaptive Ensemble Learning Framework For Renewable Energy Fo…
confidence: 99%
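The k-fold procedure the citing paper describes can be sketched in a few lines; the helper names and the toy mean-predictor scorer below are illustrative assumptions, not anything from the cited works:

```python
def k_fold_indices(n, k):
    """Split sample indices 0..n-1 into k contiguous folds."""
    fold_size, rem = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        stop = start + fold_size + (1 if f < rem else 0)
        folds.append(list(range(start, stop)))
        start = stop
    return folds

def cross_validate(data, k, train_and_score):
    """Average a score over k train/validate rounds.

    In each round, k-1 folds form the training set and the held-out
    fold is the validation set; the final estimate is the mean of the
    k per-fold scores.
    """
    folds = k_fold_indices(len(data), k)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        valid = [data[i] for i in held_out]
        scores.append(train_and_score(train, valid))
    return sum(scores) / k

# toy scorer: the "model" predicts the training mean; the score is the
# negative absolute gap between training and validation means
score = cross_validate(
    list(range(10)), 5,
    lambda tr, va: -abs(sum(tr) / len(tr) - sum(va) / len(va)),
)
print(score)
```

Note that with contiguous folds over ordered data (as here), the training and validation means can differ substantially fold to fold — a small-scale illustration of exactly the kind of distribution mismatch the paper under discussion studies.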