2022
DOI: 10.3390/math10142538
Impact of Regressand Stratification in Dataset Shift Caused by Cross-Validation

Abstract: Data that have not been modeled cannot be correctly predicted. Under this assumption, this research studies how k-fold cross-validation can introduce dataset shift in regression problems. Such a shift means that the data distributions in the training and test sets differ, which deteriorates the estimate of model performance. Even though stratification of the output variable is widely used in classification to reduce the impact of dataset shift induced by cross-validation, its use…
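The abstract contrasts plain k-fold splitting with stratification of the regressand. One common way to stratify a continuous target — a minimal sketch, not the paper's specific procedure — is to sort samples by the target value, cut them into quantile bins, and deal each bin's samples across the folds, so every fold sees the full range of the output variable. The function name and the bin/fold counts below are illustrative assumptions:

```python
import random

def stratified_regression_folds(y, k=5, n_bins=5, seed=0):
    """Assign a fold index to each sample so folds mirror y's distribution.

    Sort samples by the regressand, cut the sorted order into quantile
    bins, then deal each bin's samples across the k folds round-robin.
    This keeps the training/test y-distributions similar, mitigating the
    dataset shift that plain k-fold splitting can introduce.
    """
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])
    folds = [0] * len(y)
    bin_size = max(1, len(y) // n_bins)
    for start in range(0, len(order), bin_size):
        chunk = order[start:start + bin_size]
        rng.shuffle(chunk)  # break ties randomly within a bin
        for j, idx in enumerate(chunk):
            folds[idx] = j % k
    return folds

# 100 samples with a uniformly spread target
y = [i * 0.1 for i in range(100)]
folds = stratified_regression_folds(y, k=5)
counts = [folds.count(f) for f in range(5)]
print(counts)  # every fold receives the same number of samples
```

Because each quantile bin contributes equally to every fold, the per-fold target distributions stay close to the full-data distribution, which is the effect the stratification is after.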

Cited by 2 publications (2 citation statements)
References 40 publications
“…Even with simulated data, determining the ideal bias threshold and the precise accuracy of these models remains unattainable. The aforementioned constraints have been extensively reviewed by Sáez & Romero-Béjar (2022) and West et al. (2020). The only available option is selecting the model that presents better results through a combined validation approach.…”
Section: Model Validation
confidence: 99%
“…Cross-validation is a resampling method used to evaluate machine learning models, and K-fold means that a given dataset is split into K separate folds. K-1 folds are used to train the model, and the remaining fold is used to validate; an individual estimate is then obtained by averaging the results of the K evaluations [28]. The model can be trained and validated on each fold of the data, increasing the model's fitness.…”
Section: Adaptive Ensemble Learning Framework For Renewable Energy Fo…
confidence: 99%
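The k-fold procedure the citing paper describes can be sketched in a few lines; the helper names and the toy mean-predictor scorer below are illustrative assumptions, not anything from the cited works:

```python
def k_fold_indices(n, k):
    """Split sample indices 0..n-1 into k contiguous folds."""
    fold_size, rem = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        stop = start + fold_size + (1 if f < rem else 0)
        folds.append(list(range(start, stop)))
        start = stop
    return folds

def cross_validate(data, k, train_and_score):
    """Average a score over k train/validate rounds.

    In each round, k-1 folds form the training set and the held-out
    fold is the validation set; the final estimate is the mean of the
    k per-fold scores.
    """
    folds = k_fold_indices(len(data), k)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        valid = [data[i] for i in held_out]
        scores.append(train_and_score(train, valid))
    return sum(scores) / k

# toy scorer: the "model" predicts the training mean; the score is the
# negative absolute gap between training and validation means
score = cross_validate(
    list(range(10)), 5,
    lambda tr, va: -abs(sum(tr) / len(tr) - sum(va) / len(va)),
)
print(score)
```

Note that with contiguous folds over ordered data (as here), the training and validation means can differ substantially fold to fold — a small-scale illustration of exactly the kind of distribution mismatch the paper under discussion studies.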