We propose Ranking-Based Variable Selection (RBVS), a technique that aims to identify the important variables influencing the response in high-dimensional data. The RBVS algorithm uses subsampling to identify the set of covariates that appears non-spuriously at the top of a chosen variable ranking. We study the conditions under which such a set is unique and show that our procedure can successfully recover it from the data. Unlike many existing high-dimensional variable selection techniques, RBVS distinguishes, among all the relevant variables, between the important and the unimportant ones, and aims to recover only the former. Moreover, RBVS imposes no model restrictions on the relationship between the response and the covariates; it is therefore widely applicable, in both parametric and non-parametric contexts. We illustrate its good practical performance in a comparative simulation study. The RBVS algorithm is implemented in the publicly available R package rbvs.

parsimonious models are often more interpretable. Third, identifying the set of important variables can be the main goal of statistical analysis, which precedes further scientific investigations.

Our aim is to identify a subset of {X_1, …, X_p} which contributes to Y, under scenarios in which p is potentially much larger than n. To model this phenomenon, we work in a framework in which p diverges with n. Therefore, both p and the distribution of Z depend on n, and we work with a triangular array instead of a sequence. To facilitate interpretability, for each j, what the variable X_j represents does not change as p (and n) increase. Our framework includes, for instance, high-dimensional linear and non-linear regression models. Our proposal, termed Ranking-Based Variable Selection (RBVS), can in general be applied with any technique that allows the ranking of covariates according to their impact on the response.
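To fix ideas, the subsampling principle described above can be sketched as follows. This is our own minimal illustration, not the authors' exact algorithm: the function name `top_set_frequency`, the use of absolute marginal correlation as the importance measure, and the choice of subsample fraction are all assumptions made for the example; the set of size k appearing most frequently at the top of the ranking across subsamples is the candidate for the non-spurious set.

```python
import numpy as np
from collections import Counter

def top_set_frequency(X, y, k, B=100, subsample_frac=0.5, seed=0):
    """Count how often each k-subset of covariates occupies the top of a
    marginal-correlation ranking across B random subsamples (illustration only).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = int(n * subsample_frac)
    counts = Counter()
    for _ in range(B):
        idx = rng.choice(n, size=m, replace=False)
        Xs, ys = X[idx], y[idx]
        # Importance measure: absolute marginal correlation with the response.
        omega = np.abs([np.corrcoef(Xs[:, j], ys)[0, 1] for j in range(p)])
        ranking = np.argsort(-omega)  # indices sorted by decreasing importance
        counts[frozenset(ranking[:k])] += 1
    return counts

# Simulated example: only X_0 and X_1 drive the response.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.standard_normal(200)
counts = top_set_frequency(X, y, k=2)
best_set, freq = counts.most_common(1)[0]
```

With a strong signal, the pair {0, 1} dominates the top of the ranking in nearly every subsample, whereas any fixed set containing a noise variable appears there only sporadically; this frequency gap is what the subsampling exploits.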
Therefore, we do not impose any particular model structure on the relationship between Y and X_1, …, X_p; however, a chosen measure used to assess the importance of the covariates (either joint or marginal) may require some assumptions on the model. The main ingredient of the RBVS methodology is a variable ranking, defined as follows.

Definition 1.1. The variable ranking R_n = (R_{n1}, …, R_{np}) based on ω̂_1, …, ω̂_p is a permutation of {1, …, p} satisfying ω̂_{R_{n1}} ≥ … ≥ ω̂_{R_{np}}. Potential ties are broken uniformly at random.

A large number of measures can be used to construct variable rankings. In the linear model, the marginal correlation coefficient serves as an example of such a measure; it is the main component of Sure Independence Screening (SIS, Fan and Lv (2008)). Hall and Miller (2009a) consider the generalized correlation coefficient, which can capture (possibly) non-linear dependence between Y and the X_j's. Along the same lines, Fan et al. (2011) propose a procedure based on the magnitude of spline approximations of Y over each X_j, aiming to capture dependencies in non-parametric additive models. Fan and Song (2010) extend SIS ...
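The ranking of Definition 1.1 can be computed directly once the importance measures ω̂_1, …, ω̂_p are available. The sketch below is our own illustration (with 0-based indices for convenience); the random-jitter device for breaking ties is an implementation choice, not part of the definition, but it realizes the uniform tie-breaking the definition requires.

```python
import numpy as np

def variable_ranking(omega_hat, rng=None):
    """Return a permutation R_n ordering omega_hat in decreasing order,
    with exact ties broken uniformly at random (cf. Definition 1.1)."""
    rng = np.random.default_rng(rng)
    omega_hat = np.asarray(omega_hat, dtype=float)
    # lexsort uses the last key as primary: sort by -omega_hat (descending
    # omega_hat), and break exact ties with an independent uniform jitter.
    tiebreak = rng.random(omega_hat.size)
    return np.lexsort((tiebreak, -omega_hat))

omega_hat = [0.2, 0.9, 0.2, 0.5]
R_n = variable_ranking(omega_hat, rng=0)
# R_n[0] == 1 and R_n[1] == 3 deterministically (largest measures first);
# positions 2 and 3 hold the tied indices 0 and 2 in random order.
```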