2019
DOI: 10.48550/arXiv.1910.03225
Preprint

NGBoost: Natural Gradient Boosting for Probabilistic Prediction

Abstract: We present Natural Gradient Boosting (NGBoost), an algorithm which brings probabilistic prediction capability to gradient boosting in a generic way. Predictive uncertainty estimation is crucial in many applications, such as healthcare and weather forecasting. Probabilistic prediction, in which the model outputs a full probability distribution over the entire outcome space, is a natural way to quantify those uncertainties. Gradient Boosting Machines have been widely successful in prediction t…
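The abstract describes returning a full predictive distribution per input rather than a single point estimate. As a minimal sketch of that workflow, the snippet below uses the open-source ngboost Python package; the synthetic data and hyperparameter values are illustrative assumptions, not settings from the paper.

```python
# Minimal sketch: probabilistic regression with NGBoost.
# Assumes the open-source `ngboost` package (pip install ngboost);
# data and settings here are illustrative, not from the paper.
import numpy as np
from ngboost import NGBRegressor
from ngboost.distns import Normal

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)  # noisy targets

ngb = NGBRegressor(Dist=Normal, n_estimators=500, learning_rate=0.01)
ngb.fit(X, y)

# Instead of a single number, NGBoost returns a distribution per input.
dist = ngb.pred_dist(X[:5])
print(dist.loc)    # predicted means
print(dist.scale)  # predicted standard deviations
```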

Cited by 12 publications (17 citation statements) | References 14 publications

“…Our aim is to learn a data-driven mapping from the set of all x_i's to the set of all y_i's, to be able to predict y_new for a new, unseen observation x_new. To solve this regression problem, we employ the NGBoost algorithm (Duan et al 2019). Unlike most commonly used ML algorithms and libraries such as Random Forests (Breiman 2001), Randomized Trees (Geurts et al 2006), XGBoost (Chen & Guestrin 2016) and Gradient Boosting Machines (Ke et al 2017), NGBoost enables us to easily work in a probabilistic setting, and corresponding to every input galaxy SED, output both a measure of the central tendency (i.e.…”
Section: Proposed Methods
confidence: 99%
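This statement contrasts point predictors with NGBoost's distributional output. A hedged sketch of extracting both a central tendency and an uncertainty interval from a fitted model (continuing the `ngb` model from the snippet above; the 95% interval formula assumes the Gaussian output distribution used later in this section):

```python
# Sketch: central tendency plus an uncertainty band from the fitted
# NGBRegressor `ngb` above. Assumes a Normal output distribution, so a
# ~95% interval is mean +/- 1.96 * std. All names follow that snippet.
import numpy as np

dist = ngb.pred_dist(X[:5])
mean = dist.loc                      # central tendency per input
std = dist.scale                     # predictive standard deviation
lower, upper = mean - 1.96 * std, mean + 1.96 * std
print(np.column_stack([mean, lower, upper]))
```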
“…We make the assumption that samples are drawn from Gaussian distributions. Our loss function is the negative log-likelihood (Duan et al 2019).…”
Section: Proposed Methods
confidence: 99%
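For concreteness, the Gaussian negative log-likelihood used as the training loss here can be evaluated directly; a small scipy-based sketch with illustrative values:

```python
# Sketch: the per-example Gaussian negative log-likelihood,
# NLL = -log N(y | mu, sigma^2). All values are illustrative.
import numpy as np
from scipy.stats import norm

y = np.array([1.2, -0.4, 0.9])       # observed targets
mu = np.array([1.0, 0.0, 1.0])       # predicted means
sigma = np.array([0.5, 0.6, 0.4])    # predicted standard deviations

nll = -norm.logpdf(y, loc=mu, scale=sigma)
print(nll, nll.mean())               # per-example NLL and its average
```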
“…Network), and a decision-tree based model using natural gradient boosting (NGBoost) assuming a Gaussian output distribution [24]. Hyperparameters are provided in the appendix.…”
Section: Methods
confidence: 99%
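The cited paper's hyperparameters live in its appendix and are not reproduced here. As a purely illustrative sketch of what an NGBoost configuration with a Gaussian output distribution and decision-tree base learners looks like in the reference implementation (every value below is a placeholder assumption, not the cited paper's setting):

```python
# Illustrative only: an NGBoost configuration with a Gaussian output
# distribution. These hyperparameter values are placeholder assumptions;
# the cited paper's actual settings are in its appendix.
from ngboost import NGBRegressor
from ngboost.distns import Normal
from sklearn.tree import DecisionTreeRegressor

base = DecisionTreeRegressor(criterion="friedman_mse", max_depth=3)
ngb = NGBRegressor(
    Dist=Normal,          # Gaussian output distribution
    Base=base,            # decision-tree weak learner
    n_estimators=500,
    learning_rate=0.01,
    minibatch_frac=1.0,   # fraction of data sampled per boosting round
)
```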
“…Our experiments use datasets from the UCI Machine Learning Repository, and follow the same protocol as NGBoost (Hernández-Lobato and Adams, 2015; Duan et al, 2019). For all datasets, we hold out a random 10% of the examples as a test set.…”
Section: Methods
confidence: 99%
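The 10% holdout protocol described above is straightforward to reproduce; a minimal scikit-learn sketch, with placeholder arrays standing in for a UCI dataset:

```python
# Sketch of the evaluation protocol above: hold out a random 10% of
# examples as the test set. X, y are placeholders for a UCI dataset.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))            # placeholder features
y = rng.normal(size=1000)                 # placeholder targets

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42
)
print(X_train.shape, X_test.shape)        # (900, 8) (100, 8)
```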
“…These implementations can train models with hundreds of trees using millions of training examples in a matter of minutes. NGBoost (Duan et al, 2019) generalized the natural gradient as the direction of steepest ascent in Riemannian space, and applied it to boosting to enable probabilistic prediction for regression tasks. Natural gradient boosting shows promising performance improvements on small datasets due to better training dynamics, but it suffers from a training-speed overhead, especially on large datasets.…”
Section: Introduction
confidence: 99%
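The natural gradient mentioned here has a standard closed form for the log score; as a brief recap (notation is illustrative, following the usual Fisher-information formulation rather than any one paper's exact symbols):

```latex
% Natural gradient as steepest ascent in the Riemannian geometry induced
% by the Fisher information (log-score case; notation is illustrative).
% For a loss S(theta, y) over distribution parameters theta:
\[
  \tilde{\nabla}_\theta \, \mathcal{S}(\theta, y)
  \;=\;
  \mathcal{I}(\theta)^{-1} \, \nabla_\theta \, \mathcal{S}(\theta, y),
  \qquad
  \mathcal{I}(\theta) = \mathbb{E}_{y \sim P_\theta}\!\left[
    \nabla_\theta \log p_\theta(y) \, \nabla_\theta \log p_\theta(y)^{\top}
  \right].
\]
```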