This paper surveys various shrinkage, smoothing and selection priors from a unifying perspective and shows how to combine them for Bayesian regularisation in the general class of structured additive regression models. As a common feature, all regularisation priors are conditionally Gaussian, given further parameters regularising model complexity. Hyperpriors for these parameters encourage shrinkage, smoothness or selection. It is shown that these regularisation (log-) priors can be interpreted as Bayesian analogues of several well-known frequentist penalty terms. Inference can be carried out with unified and computationally efficient MCMC schemes, estimating regularised regression coefficients and basis function coefficients simultaneously with complexity parameters and measuring uncertainty via corresponding marginal posteriors. For variable and function selection we discuss several variants of spike and slab priors which can also be cast into the framework of conditionally Gaussian priors. The performance of the Bayesian regularisation approaches is demonstrated in a hazard regression model and a high-dimensional geoadditive regression model.
Keywords: Conditionally Gaussian priors · lasso · MCMC · P-splines · Spike and slab prior · Structured additive regression
Basic concepts of Bayesian regularisation

In quite general terms, the notion of regularisation summarises approaches that make it possible to solve systems of equations Aβ ≈ a with respect to β when A is close to singular or even exactly singular. The purpose of regularisation is therefore to introduce additional assumptions that characterise useful solutions β. In statistical terms, a typical example is the linear regression model y = Xβ + ε, ε ∼ N(0, σ²I), where y denotes the vector of responses, β is a q-dimensional vector of regression coefficients associated with the covariates collected in the design matrix X, and ε is a vector of error terms. The classical least squares estimate minimises the squared L₂ norm ‖y − Xβ‖₂² by solving the normal equations XᵀXβ = Xᵀy with respect to β; hence A = XᵀX and a = Xᵀy in the general regularisation notation. If the dimension of β is large (possibly larger than the sample size) or if some columns of X are close to collinear, solving the normal equations becomes numerically unstable. A classical regularisation approach to overcome this difficulty is to add an L₂ (Tikhonov) regularisation penalty to the optimisation problem, yielding

min_β { ‖y − Xβ‖₂² + λ‖β‖₂² },

where λ > 0 is a regularisation parameter that determines the impact of the penalty term, leading to the penalised least squares estimate β̂ = (XᵀX + λI)⁻¹Xᵀy with covariance matrix Cov(β̂) = σ²(XᵀX + λI)⁻¹XᵀX(XᵀX + λI)⁻¹. In a statistical interpretation, Tikhonov regularisation corresponds to ridge estimation. For more general types
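To make the penalised least squares formulas above concrete, the following minimal sketch (in Python/NumPy, with a small simulated design matrix; both the language and the data are illustrative assumptions, not taken from the paper) computes the ridge estimate β̂ = (XᵀX + λI)⁻¹Xᵀy and its sandwich covariance for a fixed regularisation parameter λ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n observations, q covariates, with one near-collinear column
# so that the unpenalised normal equations are poorly conditioned.
n, q = 50, 10
X = rng.normal(size=(n, q))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)
beta_true = np.zeros(q)
beta_true[:3] = [2.0, -1.0, 0.5]
sigma = 1.0
y = X @ beta_true + sigma * rng.normal(size=n)

lam = 1.0  # regularisation parameter lambda > 0 (fixed here for illustration)

# Penalised least squares (ridge / Tikhonov) estimate:
# beta_hat = (X'X + lambda*I)^{-1} X'y
A = X.T @ X + lam * np.eye(q)
beta_hat = np.linalg.solve(A, X.T @ y)

# Sandwich covariance: sigma^2 (X'X + lambda*I)^{-1} X'X (X'X + lambda*I)^{-1}
A_inv = np.linalg.inv(A)
cov_beta_hat = sigma**2 * A_inv @ (X.T @ X) @ A_inv
```

In this sketch λ is simply fixed; in the Bayesian formulation developed in this paper, the corresponding complexity parameter is instead assigned a hyperprior and estimated jointly with the regression coefficients.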