2014
DOI: 10.1080/07350015.2013.863158

Feature Screening for Ultrahigh Dimensional Categorical Data With Applications

Abstract: Ultrahigh dimensional data with both categorical responses and categorical covariates are frequently encountered in the analysis of big data, for which feature screening has become an indispensable statistical tool. We propose a Pearson chi-square based feature screening procedure for categorical response with ultrahigh dimensional categorical covariates. The proposed procedure can be directly applied for detection of important interaction effects. We further show that the proposed procedure possesses screenin…

Cited by 64 publications (58 citation statements)
References 20 publications
“…() and for categorical variables in, for example, Huang et al. () and Ni and Fang (). For continuous data, the method in Li et al.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…One is the sufficient dimension reduction method (Li, 1991; Cook, 1994). The other way is covariate selection or feature screening, which has been developed for continuous variables in, for example, Li et al (2012) and for categorical variables in, for example, Huang et al (2014) and Ni and Fang (2016). For continuous data, the method in Li et al (2012) is based on the following marginal correlation measurement between Y(j) and X(ν), the νth component of X,…”
Section: Covariate Screening
Citation type: mentioning
confidence: 99%
“…When both response and feature variables are categorical, it is not difficult to use a test of independence statistic as marginal utility for feature screening. Huang et al [33] employed the Pearson χ²-test statistic for independence as a marginal utility for feature screening. They further established the sure screening procedure of their screening procedure under mild conditions.…”
Section: Model-free Feature Screening
Citation type: mentioning
confidence: 99%
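
The statement above describes using a Pearson chi-square independence statistic as a marginal utility for screening. The following is a minimal sketch of that general idea, not the procedure of Huang et al.: it computes the plain chi-square statistic for each categorical feature against the response and keeps the top-ranked features. The function names `chi2_utility` and `screen`, the simulated data, and the cutoff `d = 10` are all illustrative assumptions, and the sketch assumes every feature has the same number of categories so that raw statistics are comparable.

```python
import numpy as np


def chi2_utility(x, y):
    """Pearson chi-square statistic for independence of two categorical vectors."""
    # Recode observed categories as 0..k-1 so they can index a contingency table.
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)  # accumulate joint counts
    # Expected counts under independence: (row total * column total) / n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())


def screen(X, y, d):
    """Rank the columns of X by chi-square utility and return the top d indices."""
    scores = np.array([chi2_utility(X[:, j], y) for j in range(X.shape[1])])
    keep = np.argsort(-scores)[:d]  # largest statistic first
    return keep, scores


# Simulated example (hypothetical data): 200 three-level features, where the
# response copies feature 0 about 80% of the time and is random otherwise.
rng = np.random.default_rng(0)
n, p = 500, 200
X = rng.integers(0, 3, size=(n, p))
y = np.where(rng.random(n) < 0.8, X[:, 0], rng.integers(0, 3, size=n))

keep, scores = screen(X, y, d=10)
print("top-ranked feature indices:", keep)
```

When features have different numbers of categories, the raw statistics are not on a common scale, so ranking would need some adjustment (for example, converting each statistic to a p-value with the appropriate degrees of freedom); that step is left out of this sketch.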
“…Both computational speed and classification accuracy are also expected to be taken into account. For categorical features, statistical test (e.g., Chi-square test) [8, 9], information theory (e.g., information gain, mutual information, cross entropy) [10, 11, 12, 13], and Bayesian methods [14, 15] are usually used for feature screening, especially in the field of text classification. In this study, we propose a novel model-free feature screening method called weighted mean squared deviation (WMSD), which can be considered as a simplified version of Chi-square statistic and mutual information.…”
Section: Introduction
Citation type: mentioning
confidence: 99%