We provide a detailed hands-on tutorial for the R add-on package mboost. The package implements boosting for optimizing general risk functions, utilizing component-wise (penalized) least squares estimates as base-learners for fitting various kinds of generalized linear and generalized additive models to potentially high-dimensional data. We give the theoretical background and demonstrate how mboost can be used to fit interpretable models of different complexity. As a running example, we use mboost to predict body fat based on anthropometric measurements throughout the tutorial.
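A minimal sketch of the kind of model the tutorial describes, assuming the mboost package and the bodyfat data shipped with the TH.data package (with DEXfat as the response, following the package's own examples):

```r
# Sketch: component-wise boosting of a linear model for body fat.
# Assumes the mboost and TH.data packages are installed.
library("mboost")
data("bodyfat", package = "TH.data")

# Boost a linear model with all anthropometric measurements as
# candidate covariates; each covariate gets its own base-learner,
# and only the best-fitting one is updated per iteration.
fit <- glmboost(DEXfat ~ ., data = bodyfat)

# Inspect which covariates were selected and their coefficients.
coef(fit, off2int = TRUE)

# The number of boosting iterations (mstop) is the main tuning
# parameter; cross-validated risk can be used to choose it.
cv <- cvrisk(fit)
fit[mstop(cv)]
```

Because boosting stops early, covariates whose base-learners are never selected drop out of the model, which is what makes the fits interpretable even with many candidate variables.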
Generalized additive models for location, scale and shape (GAMLSSs) are a popular semiparametric modelling approach that, in contrast with conventional generalized additive models, relates not only the expected mean but every distribution parameter (e.g. location, scale and shape) to a set of covariates. Current fitting procedures for GAMLSSs are infeasible for high-dimensional data set-ups and require variable selection based on (potentially problematic) information criteria. The present work describes a boosting algorithm for high-dimensional GAMLSSs that was developed to overcome these limitations. Specifically, the new algorithm was designed to allow the simultaneous estimation of predictor effects and variable selection. The algorithm proposed was applied to Munich rental guide data, which are used by landlords and tenants as a reference for the average rent of a flat depending on its characteristics and spatial features. The net rent predictions that resulted from the high-dimensional GAMLSSs were found to be highly competitive, and covariate-specific prediction intervals showed a major improvement over classical generalized additive models.
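A sketch of the boosted GAMLSS idea, assuming the gamboostLSS package; the data here are simulated for illustration, with both the mean and the standard deviation depending on covariates:

```r
# Sketch: boosting a Gaussian GAMLSS where mean (mu) and standard
# deviation (sigma) each get their own predictor. Assumes the
# gamboostLSS package is installed; data are simulated.
library("gamboostLSS")
set.seed(1)
n  <- 500
x1 <- runif(n)
x2 <- runif(n)
y  <- rnorm(n, mean = 2 * x1, sd = exp(0.5 * x2))
d  <- data.frame(y, x1, x2)

# One component-wise boosted linear model per distribution
# parameter; variable selection happens separately in each submodel.
fit <- glmboostLSS(y ~ x1 + x2, families = GaussianLSS(), data = d)

# Coefficients are returned per parameter (mu and sigma).
coef(fit, off2int = TRUE)
```

Modelling sigma explicitly is what yields the covariate-specific prediction intervals mentioned above: the interval width adapts to the covariates instead of being constant.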
Background: Modern biotechnologies often result in high-dimensional data sets with many more variables than observations (n ≪ p). These data sets pose new challenges to statistical analysis: variable selection becomes one of the most important tasks in this setting. Similar challenges arise in modern data sets from observational studies, e.g., in ecology, where flexible, non-linear models are fitted to high-dimensional data. We assess the recently proposed flexible framework for variable selection called stability selection. By the use of resampling procedures, stability selection adds finite-sample error control to high-dimensional variable selection procedures such as lasso or boosting. We consider the combination of boosting and stability selection and present results from a detailed simulation study that provide insights into the usefulness of this combination. We elaborate on the interpretation of the error bounds used and give guidance for practical data analysis.
Results: Stability selection with boosting was able to detect influential predictors in high-dimensional settings while controlling the given error bound in various simulation scenarios. We investigated the dependence on parameters such as the sample size, the number of truly influential variables, and the tuning parameters of the algorithm. We applied the method to phenotype measurements in patients with autism spectrum disorders, using a log-linear interaction model fitted by boosting; stability selection identified five differentially expressed amino acid pathways.
Conclusion: Stability selection is implemented in the freely available R package stabs (http://CRAN.R-project.org/package=stabs). It proved to work well in high-dimensional settings with more predictors than observations, for both linear and additive models. The original version of stability selection, which controls the per-family error rate, is quite conservative; this is much less the case for its improvement, complementary pairs stability selection. Nevertheless, care should be taken to specify the error bound appropriately.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0575-3) contains supplementary material, which is available to authorized users.
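A sketch of stability selection combined with boosting, assuming the stabs and mboost packages; the data are simulated (n = 100 observations, p = 200 covariates, of which three are truly influential):

```r
# Sketch: stability selection on a boosted linear model.
# Assumes the stabs and mboost packages are installed.
library("stabs")
library("mboost")
set.seed(1)
n <- 100; p <- 200
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("x", 1:p)))
y <- x[, 1] + x[, 2] + x[, 3] + rnorm(n)
d <- data.frame(y, x)

# Fit the boosting model, then run stability selection on it.
# cutoff is the selection-frequency threshold; PFER bounds the
# expected number of falsely selected variables (per-family
# error rate). Two of cutoff, q, and PFER must be specified.
mod  <- glmboost(y ~ ., data = d)
stab <- stabsel(mod, cutoff = 0.75, PFER = 1)

# Variables selected in at least 75% of the subsamples.
stab$selected
```

The PFER bound is the error control the abstract refers to: with PFER = 1, on average at most one noise variable is expected among the stably selected set.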
Master protocols have received growing interest in recent years. By assigning patients to specific substudies, they aim to focus and accelerate clinical development. Given their complexity, basket, umbrella, and platform designs have raised challenging regulatory and statistical questions, especially the control of multiplicity in confirmatory trials. In basket trials, regulatory assessment of the benefit/risk in pooled populations and the choice of the treatment indication are challenging. We provide here our perspectives on these topics. In master protocols, as long as the statistical hypotheses tested in the different substudies are independent, no supplementary adjustment for multiplicity over the different substudies should be required. Moreover, sharing a control arm within an umbrella or platform trial investigating different drugs would not require a correction of the type I error rate, although the chance of multiple false positive regulatory decisions should be recognized. In basket trials, pooling across substudies requires a rationale supporting the intended indication and should be preplanned. Assessment of the benefit/risk in pooled target populations can be complicated by differences in design or in efficacy/safety signals between the substudies. While trials governed by a master protocol can offer logistic and financial advantages, more experience is needed to gain a deeper insight into this novel framework.
Variable selection and model choice are of major concern in many statistical applications, especially in high-dimensional regression models. Boosting is a convenient statistical method that combines model fitting with intrinsic model selection. We investigate the impact of base-learner specification on the performance of boosting as a model selection procedure. We show that variable selection may be biased if the covariates are of different nature. Important examples are models combining continuous and categorical covariates, especially if the number of categories is large. In this case, least squares base-learners offer increased flexibility for the categorical covariate and lead to a preference for it even if it is noninformative. Similar difficulties arise when comparing linear and nonlinear base-learners for a continuous covariate: the additional flexibility in the nonlinear base-learner again yields a preference for the more complex modeling alternative. We investigate these problems from a theoretical perspective and suggest a framework for unbiased model selection based on a general class of penalized least squares base-learners. Making all base-learners comparable in terms of their degrees of freedom strongly reduces the selection bias observed in naive boosting specifications. The importance of unbiased model selection is demonstrated in simulations and an application to forest health models.
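A sketch of the equal-degrees-of-freedom idea in mboost notation, with simulated data for illustration; here z is a noninformative factor with many levels that a naive specification would tend to prefer:

```r
# Sketch: making base-learners comparable via equal degrees of
# freedom. Assumes the mboost package; data are simulated.
library("mboost")
set.seed(1)
n <- 200
x <- runif(n)
z <- factor(sample(1:10, n, replace = TRUE))  # noninformative
y <- 2 * x + rnorm(n)
d <- data.frame(y, x, z)

# Naive specification: the unpenalized base-learner for the
# 10-level factor has far more flexibility (df) than the linear
# term for x, biasing selection toward z.
naive <- gamboost(y ~ bols(x) + bols(z), data = d)

# Fair specification: ridge-penalize each base-learner down to
# one degree of freedom, so selection is driven by fit alone.
fair <- gamboost(y ~ bols(x, df = 1) + bols(z, df = 1), data = d)

# Which base-learner was chosen in each boosting iteration.
table(selected(fair))
```

The same device extends to comparing linear (bols) against smooth (bbs) base-learners for one covariate: giving both the same df lets boosting decide between a linear and a nonlinear effect without a built-in preference for the more complex one.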
The gathering of clinical data on fractures of dental restorations through prospective clinical trials is a labor- and time-consuming enterprise. Here, we propose an unconventional approach for collecting large datasets from which clinical information on indirect restorations can be retrospectively analyzed. The authors accessed the database of an industry-scale machining center in Germany and obtained information on 34,911 computer-aided design (CAD)/computer-aided manufacturing (CAM) all-ceramic posterior restorations. The fractures of bridges, crowns, onlays, and inlays fabricated from different all-ceramic systems over a period of 3.5 y were reported by dentists and entered in the database. Survival analyses and estimations of future life revealed differences in performance among ZrO2-based restorations, lithium disilicate, and leucite-reinforced glass-ceramics.