Comparing quantile regression methods for probabilistic forecasting of NO2 pollution levels

Vasseur, Sebastien Pérez; Aznarte, José Luis

doi:10.1038/s41598-021-90063-3

Cited by 10 publications

(5 citation statements)

References 21 publications


“…Logarithmic transformations for the dependent variable and quantile-based probabilistic models could be applied in the correction of the heteroscedasticity (O’Sullivan et al. 2016; Tofallis 2009; Vasseur and Aznarte 2021). Furthermore, our DEML could not directly deal with missing values.…”

Section: Discussion (mentioning)

confidence: 99%

“…The biased prediction was expected because of the high variations in the retrieved PM10 and PM2.5 concentrations, especially in a certain season like summer when variable atmospheric conditions (Fratianni and Acquaotta 2017) and certain transient air pollution events such as Saharan dust appear frequently in Italy (Mallone et al. 2011). Logarithmic transformations for the dependent variable and quantile-based probabilistic models could be applied in the correction of the heteroscedasticity (O'Sullivan et al. 2016; Tofallis 2009; Vasseur and Aznarte 2021). Furthermore, our DEML could not directly deal with missing values.…”

Section: Environmental Health Perspectives (mentioning)

confidence: 99%


Deep Ensemble Machine Learning Framework for the Estimation of PM2.5 Concentrations

et al. 2022

Environ Health Perspect


Background: Accurate estimation of historical PM2.5 (particulate matter with an aerodynamic diameter of less than 2.5 μm) is critical and essential for environmental health risk assessment. Objectives: The aim of this study was to develop a multiple-level stacked ensemble machine learning framework for improving the estimation of the daily ground-level PM2.5 concentrations. Methods: An innovative deep ensemble machine learning framework (DEML) was developed to estimate the daily PM2.5 concentrations. The framework has a three-stage structure: At the first stage, four base models [gradient boosting machine (GBM), support vector machine (SVM), random forest (RF), and eXtreme gradient boosting (XGBoost)] were used to generate a new data set of PM2.5 concentrations for training the next-stage learners. At the second stage, three meta-models [RF, XGBoost, and Generalized Linear Model (GLM)] were used to estimate PM2.5 concentrations using a combination of the original data set and the predictions from the first-stage models. At the third stage, a nonnegative least squares (NNLS) algorithm was employed to obtain the optimal weights for PM2.5 estimation. We took the data from 133 monitoring stations in Italy as an example to implement the DEML to predict daily PM2.5 at each grid cell from 2015 to 2019 across Italy. We evaluated the model performance by performing 10-fold cross-validation (CV) and compared it with five benchmark algorithms [GBM, SVM, RF, XGBoost, and Super Learner (SL)]. Results: The results revealed that the prediction performance of DEML [coefficient of determination (R²) and root mean square error] was superior to that of every benchmark model (with R² of 0.51, 0.76, 0.83, 0.70, and 0.83 for GBM, SVM, RF, XGBoost, and SL approach, respectively). DEML displayed reliable performance in capturing the spatiotemporal variations of PM2.5 in Italy. Discussion: The proposed DEML framework achieved an outstanding performance in PM2.5 estimation, which could be used as a tool for more accurate environmental exposure assessment.
https://doi.org/10.1289/EHP9752
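
The third stage described in the abstract above, combining model outputs with nonnegative weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `nnls_projected_gradient` and the toy data are hypothetical, and a simple projected-gradient loop stands in for a dedicated NNLS solver so that the example needs only the Python standard library.

```python
def nnls_projected_gradient(preds, target, steps=5000):
    """Approximate argmin_w ||P w - y||^2 subject to w >= 0.

    preds:  list of rows, one row per sample, one column per model.
    target: observed values, one per sample.
    Assumes at least one prediction is nonzero.
    """
    n_models = len(preds[0])
    # Step size 1 / trace(P^T P), which never exceeds 1 / (largest eigenvalue),
    # so the iteration cannot diverge.
    trace = sum(v * v for row in preds for v in row)
    step = 1.0 / trace
    w = [0.0] * n_models
    for _ in range(steps):
        # Residuals of the current weighted combination.
        residuals = [sum(wj * pj for wj, pj in zip(w, row)) - y
                     for row, y in zip(preds, target)]
        # Gradient of 0.5 * ||P w - y||^2 is P^T (P w - y).
        grad = [sum(r * row[j] for r, row in zip(residuals, preds))
                for j in range(n_models)]
        # Gradient step, then projection onto the nonnegative orthant.
        w = [max(0.0, wj - step * gj) for wj, gj in zip(w, grad)]
    return w


# Toy data (hypothetical): model 0 reproduces the target, model 1 is reversed.
toy_preds = [[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]]
toy_target = [1.0, 2.0, 3.0, 4.0]
toy_weights = nnls_projected_gradient(toy_preds, toy_target)
```

In this toy case essentially all the weight goes to the model that matches the target. The nonnegativity constraint keeps a stacked ensemble from assigning negative weights to correlated models.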


“…We chose Gradient Boosting because it is fast (Natras et al., 2022a), performs well on structured input data even for relatively small data sets (Duan et al., 2020), and has proven to be a powerful method in many data science competitions (Chen & Guestrin, 2016). Moreover, Vasseur and Aznarte (2021) compared the performance of 10 ML algorithms with quantile loss for predicting NO2 pollution and found that Gradient Boosting outperformed the other models with better results for all metrics examined.…”

Section: Methods (mentioning)

confidence: 99%
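
The "quantile loss" mentioned in the statement above is commonly called the pinball loss. The following is a minimal sketch of it under that standard definition. The function names are hypothetical and none of this is taken from either paper's code.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss for a quantile level tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff > 0) costs tau per unit;
        # over-prediction costs (1 - tau) per unit.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def empirical_quantile_by_loss(values, tau):
    """Return the element of `values` that minimises the pinball loss
    when used as a constant prediction for every element."""
    return min(values,
               key=lambda q: pinball_loss(values, [q] * len(values), tau))
```

Minimising this loss over a constant prediction recovers an empirical quantile of the data. That is why a model trained with it at, say, tau = 0.9 estimates the conditional 90th percentile rather than the mean.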

Uncertainty Quantification for Machine Learning‐Based Ionosphere and Space Weather Forecasting: Ensemble, Bayesian Neural Network, and Quantile Gradient Boosting

Natras, Soja, Schmidt 2023

Space Weather


Machine learning (ML) has been increasingly applied to space weather and ionosphere problems in recent years, with the goal of improving modeling and forecasting capabilities through a data‐driven modeling approach of nonlinear relationships. However, little work has been done to quantify the uncertainty of the results, lacking an indication of how confident and reliable the results of an ML system are. In this paper, we implement and analyze several uncertainty quantification approaches for an ML‐based model to forecast Vertical Total Electron Content (VTEC) 1‐day ahead and corresponding uncertainties with 95% confidence intervals (CI): (a) Super‐Ensemble of ML‐based VTEC models (SE), (b) Gradient Tree Boosting with quantile loss function (Quantile Gradient Boosting, QGB), (c) Bayesian neural network (BNN), and (d) BNN including data uncertainty (BNN + D). Techniques that consider only model parameter uncertainties (a and c) predict narrow CI and over‐optimistic results, whereas accounting for both model parameter and data uncertainties with the BNN + D approach leads to a wider CI and the most realistic uncertainty quantification of the VTEC forecast. However, the BNN + D approach suffers from a high computational burden, while the QGB approach is the most computationally efficient solution with slightly less realistic uncertainties. The QGB CI are determined to a large extent from space weather indices, as revealed by the feature analysis. They exhibit variations related to daytime/nighttime, solar irradiance, geomagnetic activity, and post‐sunset low‐latitude ionosphere enhancement.


“…Natural gradient boosting (NGBoost) is a recent method that uses boosting models for computing probabilistic predictions in regression problems [12,16,53]…”

Section: Natural Gradient Boosting (mentioning)

confidence: 99%

Deep neural networks for the quantile estimation of regional renewable energy production

2022


Wind and solar energy forecasting have become crucial for the inclusion of renewable energy in electrical power systems. Although most works have focused on point prediction, it is currently becoming important to also estimate the forecast uncertainty. With regard to forecasting methods, deep neural networks have shown good performance in many fields. However, the use of these networks for comparative studies of probabilistic forecasts of renewable energies, especially for regional forecasts, has not yet received much attention. The aim of this article is to study the performance of deep networks for estimating multiple conditional quantiles on regional renewable electricity production and compare them with widely used quantile regression methods such as the linear, support vector quantile regression, gradient boosting quantile regression, natural gradient boosting and quantile regression forest methods. A grid of numerical weather prediction variables covers the region of interest. These variables act as the predictors of the regional model. In addition to quantiles, prediction intervals are also constructed, and the models are evaluated using different metrics. These prediction intervals are further improved through an adapted conformalized quantile regression methodology. Overall, the results show that deep networks are the best performing method for both solar and wind energy regions, producing narrow prediction intervals with good coverage.
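
The abstract above refers to an adapted conformalized quantile regression methodology without describing the adaptation. As a rough illustration only, the sketch below shows the standard split-conformal variant: a margin is computed on a held-out calibration set and then used to widen (or shrink) predicted quantile intervals. The function names and the assumption of a separate calibration set are hypothetical, not taken from the paper.

```python
import math


def cqr_margin(y_cal, lower_cal, upper_cal, alpha):
    """Conformal margin from calibration targets and predicted intervals.

    alpha is the target miscoverage rate, e.g. 0.1 for 90% intervals.
    """
    # Conformity score: distance by which each observation falls outside
    # its predicted interval (negative when it lies inside).
    scores = sorted(max(lo - y, y - hi)
                    for y, lo, hi in zip(y_cal, lower_cal, upper_cal))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points to certify this coverage level.
        return math.inf
    return scores[rank - 1]


def conformalize(lower, upper, margin):
    """Apply the margin symmetrically to new predicted intervals."""
    return [(lo - margin, hi + margin) for lo, hi in zip(lower, upper)]
```

A positive margin means the raw quantile intervals were too narrow on the calibration data. A negative margin means they were wider than needed and can be tightened.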

