2011
DOI: 10.1007/s11222-011-9279-3
The predictive Lasso

Abstract: We propose a shrinkage procedure for simultaneous variable selection and estimation in generalized linear models (GLMs) with an explicit predictive motivation. The procedure estimates the coefficients by minimizing the Kullback-Leibler divergence of a set of predictive distributions to the corresponding predictive distributions for the full model, subject to an l1 constraint on the coefficient vector. This results in selection of a parsimonious model with similar predictive performance to the full model. Thank…
Cited by 13 publications (25 citation statements)
References 20 publications
“…By predictive criterion functions, we are referring to functions of predicted/imputed values of the response. Our general approach is to solve the following optimization problem [21]:

$$\min_{\alpha} \; \sum_{i=1}^{n} D_i(M_{\mathrm{full}}, M_{\alpha}) + \lambda \sum_{j=1}^{p} |\alpha_j|,$$

where $\lambda > 0$ is a tuning parameter and $D_i(M_{\mathrm{full}}, M_{\alpha})$ is the Kullback-Leibler distance between the predictive distribution of the “full” model and that of a model parametrized by the vector $\alpha$, for subject $i$. When we talk about $M$ here, we are actually alluding to models for the joint distribution of the potential outcomes.…”
Section: Preliminaries (mentioning, confidence: 99%)
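To make the quoted objective concrete, here is a minimal sketch, not taken from either paper, that minimizes the penalized Kullback-Leibler criterion in the special case of a Gaussian linear model, where each $D_i$ has the closed form $(x_i^{\top}\hat\beta_{\mathrm{full}} - x_i^{\top}\alpha)^2 / (2\sigma^2)$. The simulated data, the plug-in `sigma2`, and the value of `lam` are illustrative assumptions.

```python
# Sketch: penalized-KL objective for a Gaussian linear model, minimized by
# proximal gradient (ISTA). Data and tuning values are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

beta_full = np.linalg.lstsq(X, y, rcond=None)[0]   # "full model" fit
sigma2 = np.mean((y - X @ beta_full) ** 2)          # plug-in noise variance
lam = 5.0                                           # tuning parameter lambda

def objective(alpha):
    """Sum_i KL(full predictive, alpha predictive) + lambda * ||alpha||_1."""
    kl = np.sum((X @ beta_full - X @ alpha) ** 2) / (2.0 * sigma2)
    return kl + lam * np.sum(np.abs(alpha))

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# ISTA: gradient step on the smooth KL term, then an L1 proximal step.
alpha = np.zeros(p)
step = sigma2 / np.linalg.norm(X, 2) ** 2           # 1 / Lipschitz constant
for _ in range(500):
    grad = X.T @ (X @ alpha - X @ beta_full) / sigma2
    alpha = soft_threshold(alpha - step * grad, step * lam)

print("objective:", objective(alpha))
print("nonzero coefficients:", np.flatnonzero(alpha))
```

Larger values of `lam` drive more coefficients exactly to zero while keeping the candidate model's predictive distribution close to the full model's, which is the trade-off the quoted passage describes.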
“…The Kullback-Leibler distance is generically defined as

$$D(f, g) = \int \log\!\left[\frac{g(x)}{f(x)}\right] f(x)\, dx,$$

where $f$ and $g$ represent densities of the data. While the optimization problem in (7) is written in a general form, Tran et al. [21] show that in the linear model case it corresponds to solving a weighted LASSO problem: minimize

$$\sum_{i=1}^{n} \left(\hat\beta_0 + \hat\beta_1 T_i + \hat\gamma^{\top} X_i - \beta_0 - \beta_1 T_i - \gamma^{\top} X_i\right)^2 \quad \text{subject to} \quad |\beta_0| + |\beta_1| + \sum_{j=1}^{p} |\gamma_j| \le t,$$

where $(\hat\beta_0, \hat\beta_1, \hat\gamma)$ are the least squares estimators of the regression coefficients in the usual linear model for the observed data. Implicitly, we are assuming in (8) that the objective function is evaluated at future design points identical to the observed data points.…”
Section: Preliminaries (mentioning, confidence: 99%)