2016
DOI: 10.1177/1471082x16642560
Regularized regression for categorical data

Abstract: In the last two decades, regularization techniques, in particular penalty-based methods, have become very popular in statistical modelling. Driven by technological developments, most approaches have been designed for high-dimensional problems with metric variables, whereas categorical data has largely been neglected. In recent years, however, it has become clear that regularization is also very promising when modelling categorical data. A specific trait of categorical data is that many parameters are typically…

Cited by 60 publications
(54 citation statements)
References 116 publications
“…On the other hand, regression obtained the best results for the training data set, which suggests that it may be overfitting the training data. As pointed out by an anonymous referee, a penalty term is commonly added to the error function as a regularization method when fitting a regression model to data, in order to prevent overfitting [40][41][42]. We did not add such a penalty in our analysis so that a fairer comparison between the methods could be made, given that regularization methods have not yet been proposed in the empirical similarity methodology.…”
Section: Discussion
confidence: 99%
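The penalty term the quoted passage refers to can be sketched in a few lines. This is a minimal, illustrative example (all names and data are made up, not taken from the cited paper) using the ridge (L2) penalty: the squared-error loss gains a term lam * ||beta||^2, which shrinks coefficients and discourages overfitting.

```python
import numpy as np

def penalized_loss(beta, X, y, lam):
    """Squared error plus the L2 penalty lam * ||beta||^2."""
    residual = y - X @ beta
    return residual @ residual + lam * (beta @ beta)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + 0.1 * rng.normal(size=50)

lam = 1.0
# Closed-form minimizers: ridge (penalized) vs ordinary least squares.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# The penalty shrinks the coefficient vector toward zero,
# so the ridge estimate has a smaller norm than the OLS estimate.
print(np.linalg.norm(beta_ridge), np.linalg.norm(beta_ols))
```

For lam = 0 the penalized loss reduces to the ordinary least-squares criterion; larger lam trades increased bias for reduced variance.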
“…On the other hand, regression obtained the best results for the training data set, which suggests that it may be overfitting the training data. As pointed out by an anonymous referee, a penalty item is commonly inserted into the error function as a regularization method when fitting a regression model to data, in order to prevent overfitting [40][41][42]. We do not added such a penalty in our analysis so that a fairer comparison between the methods can be made given that regularization methods have not been proposed yet in the empirical similarity methodology.…”
Section: Discussionmentioning
confidence: 99%
“…In addition, given the presence of a potentially large number of categorical regressors in the context of road safety studies, it would be beneficial to investigate possible approaches to variable selection. A promising avenue could be the incorporation into our framework of further regularizations as explained, for example, by Tutz and Gertheiss (2016).…”
Section: Discussion
confidence: 99%
“…Notwithstanding those results, Q λ could potentially be extended to incorporate further regularizations. Recently Tutz and Gertheiss (2016) reviewed the use of alternative penalizations (e.g. the fused and group lasso) to perform enhanced estimation and variable selection in models with categorical responses and predictors.…”
Section: Penalized Generalized Linear Model Representation
confidence: 99%
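The fused-lasso-type penalty that Tutz and Gertheiss (2016) review for ordinal categorical predictors can be illustrated in a short, assumed sketch (the function name and coefficient values here are hypothetical): adjacent category coefficients are penalized toward each other, so categories with similar effects are fused to a common value.

```python
import numpy as np

def fused_lasso_penalty(beta):
    """Sum of absolute differences of adjacent category coefficients,
    i.e. sum_j |beta_j - beta_{j-1}|, as used for ordinal factors."""
    return np.sum(np.abs(np.diff(beta)))

# Coefficients for a 5-level ordinal factor; levels 2 and 3 share an
# effect, so their adjacent difference contributes nothing to the penalty.
beta = np.array([0.0, 0.5, 0.5, 1.2, 2.0])
print(fused_lasso_penalty(beta))  # ≈ 2.0
```

For nominal (unordered) factors the same idea applies to all pairwise differences rather than only adjacent ones, and the group lasso variant instead penalizes the whole coefficient block of a factor, selecting or dropping the variable as a unit.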
“…Specification of a Bayesian linear regression model requires not only a model for the data, for example, the linear regression model (3) of Tutz and Gertheiss (2016), y = α + Σ_{j=1}^{p} X_j β_j + ε, ε ∼ N(0, σ² I),…”
Section: Bayesian Regularization Of Effects Of Categorical Covariates
confidence: 99%
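The connection between the Bayesian formulation quoted above and penalty-based regularization can be checked numerically. In this hedged sketch (all names and data are illustrative), a Gaussian prior β ∼ N(0, τ² I) on the coefficients of the model y = Xβ + ε, ε ∼ N(0, σ² I), yields a posterior mean that coincides with the ridge estimate with penalty weight λ = σ²/τ².

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 4
X = rng.normal(size=(n, p))
sigma, tau = 0.5, 1.0
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + sigma * rng.normal(size=n)

# Posterior mean under the conjugate Gaussian prior beta ~ N(0, tau^2 I):
# (X'X / sigma^2 + I / tau^2)^{-1} X'y / sigma^2
post_mean = np.linalg.solve(X.T @ X / sigma**2 + np.eye(p) / tau**2,
                            X.T @ y / sigma**2)

# Ridge estimate with lam = sigma^2 / tau^2: (X'X + lam I)^{-1} X'y
lam = sigma**2 / tau**2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print(np.allclose(post_mean, beta_ridge))  # True
```

This equivalence is why Gaussian priors are often described as the Bayesian counterpart of ridge regularization; sparsity-inducing priors (e.g. Laplace) play the analogous role for lasso-type penalties.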