We provide a detailed hands-on tutorial for the R add-on package mboost. The package implements boosting for optimizing general risk functions, using component-wise (penalized) least squares estimates as base-learners, to fit various kinds of generalized linear and generalized additive models to potentially high-dimensional data. We give a theoretical background and demonstrate how mboost can be used to fit interpretable models of different complexity. As an example, we use mboost throughout the tutorial to predict body fat based on anthropometric measurements.
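The idea of component-wise boosting with least squares base-learners can be illustrated with a minimal sketch. This is not mboost's implementation (mboost is an R package); it is a plain-Python illustration under simplifying assumptions: squared error loss, one unpenalized linear base-learner per covariate, and a fixed number of iterations. The function name `componentwise_l2_boost`, the parameters `n_iter` and `nu`, and the toy data are hypothetical and chosen only for this example.

```python
def componentwise_l2_boost(X, y, n_iter=100, nu=0.1):
    """Component-wise L2 boosting with simple linear base-learners.

    X is a list of covariate columns (each a list of floats), y a list of floats.
    Returns (intercept, coefs). Hypothetical helper for illustration only.
    """
    n = len(y)
    # Center each covariate so every base-learner is a no-intercept regression.
    means = [sum(col) / n for col in X]
    Xc = [[v - m for v in col] for col, m in zip(X, means)]
    offset = sum(y) / n          # start from the mean of the response
    fit = [offset] * n
    coefs = [0.0] * len(X)
    for _ in range(n_iter):
        # For squared error loss the negative gradient is the residual vector.
        resid = [yi - fi for yi, fi in zip(y, fit)]
        best_j, best_beta, best_rss = None, 0.0, float("inf")
        for j, col in enumerate(Xc):
            ss = sum(v * v for v in col)
            if ss == 0.0:        # constant covariate: nothing to fit
                continue
            beta = sum(v * r for v, r in zip(col, resid)) / ss
            rss = sum((r - beta * v) ** 2 for v, r in zip(col, resid))
            if rss < best_rss:
                best_j, best_beta, best_rss = j, beta, rss
        if best_j is None:
            break
        # Update only the best-fitting component, shrunken by the step length nu.
        coefs[best_j] += nu * best_beta
        fit = [f + nu * best_beta * v for f, v in zip(fit, Xc[best_j])]
    intercept = offset - sum(c * m for c, m in zip(coefs, means))
    return intercept, coefs


# Toy data (made up): y depends on the first covariate only.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
y = [3.0 + 2.0 * v for v in x0]
intercept, coefs = componentwise_l2_boost([x0, x1], y, n_iter=200, nu=0.1)
print(intercept, coefs)
```

Because only one component is updated per iteration, covariates that are never selected keep a coefficient of exactly zero, which is how this kind of procedure performs variable selection alongside fitting. In practice the number of iterations would be chosen by a stopping criterion such as cross-validation rather than fixed in advance.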
Many of the popular nonlinear time series models require an a priori choice of parametric functions which are assumed to be appropriate in specific applications. This approach is used mainly in financial applications, when sufficient knowledge is available about the nonlinear structure between the covariates and the response. One principal strategy to investigate a broader class of nonlinear time series is the Nonlinear Additive AutoRegressive (NAAR) model. The NAAR model estimates the lags of a time series as flexible functions in order to detect nonmonotone relationships between current observations and past values. We consider linear and additive models for identifying nonlinear relationships. A componentwise boosting algorithm is applied for simultaneous model fitting, variable selection, and model choice. Thus, by applying boosting to fit potentially nonlinear models, we address the major issues in time series modelling: lag selection and nonlinearity. By means of simulation we compare the outcomes of boosting to the outcomes obtained through alternative nonparametric methods. Boosting shows an overall strong performance in terms of precise estimation of highly nonlinear lag functions. The forecasting potential of boosting is examined on real data where the target variable is German industrial production (IP). In order to improve the model's forecasting quality we include additional exogenous variables. Thus we address the second major aspect of this paper, which concerns the issue of high dimensionality in models. Allowing additional inputs in the model extends the NAAR model to an even broader class of models, namely the NAARX model. We show that boosting can cope with large models which have many covariates compared to the number of observations.
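The model class described above can be written out as a rough sketch. The notation is an assumption made for illustration and is not taken from the paper: $p$ is a maximum lag order, $f_j$ are unknown smooth functions, $\varepsilon_t$ is an additive error term, and for the extended model $x_{k,t}$ denotes the $k$-th of $q$ exogenous series, with $r$ lags each.

```latex
% NAAR: each lag enters through its own flexible function
y_t = \sum_{j=1}^{p} f_j(y_{t-j}) + \varepsilon_t

% NAARX (one possible form): additional exogenous inputs enter additively
y_t = \sum_{j=1}^{p} f_j(y_{t-j})
      + \sum_{k=1}^{q} \sum_{l=1}^{r} g_{kl}(x_{k,t-l}) + \varepsilon_t
```

In this form, lag selection amounts to deciding which $f_j$ (and $g_{kl}$) are nonzero, and model choice to deciding whether a selected component is linear or nonlinear.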
Different studies provide a surprisingly large variety of controversial conclusions about the forecasting power of an indicator, even when it is supposed to forecast the same time series. In this study we aim to provide a thorough overview of linear forecasting techniques and draw conclusions useful for identifying the predictive relationship between leading indicators and time series. In a case study for Germany we forecast two possible representations of industrial production. Furthermore, we consider a large variety of time-varying specifications. In a horse race with nine leading indicators plus an AR benchmark model we demonstrate the variance of assessment across target variables and forecasting settings (50 per horizon). We show that it is nearly always possible to find situations in which one indicator proves to have better predictive power than another. Nevertheless, the freedom of choice can be useful to identify robust leading indicators.