Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.
DOI: 10.1109/cvpr.2004.1315171
Feature selection for classifying high-dimensional numerical data

Abstract: Classifying high-dimensional numerical data is a very challenging problem. In high-dimensional feature spaces, the performance of supervised learning methods suffers from the curse of dimensionality, which degrades both classification accuracy and efficiency. To address this issue, we present an efficient feature selection method to facilitate classifying high-dimensional numerical data. Our method employs balanced information gain to measure the contribution of each feature (for data classification); and it ca…
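The abstract describes ranking features by an information-gain-style criterion. As a rough illustration only (the paper's exact "balanced information gain" formula is not given in this excerpt), a plain information-gain ranking over discretized features can be sketched as:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) of a sequence of discrete labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_bins, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one already-discretized feature.

    Numerical features would first be binned; the balanced variant used in
    the paper would additionally normalize this score (assumption).
    """
    n = len(labels)
    h_y_given_x = 0.0
    for v in set(feature_bins):
        subset = [y for x, y in zip(feature_bins, labels) if x == v]
        h_y_given_x += len(subset) / n * entropy(subset)
    return entropy(labels) - h_y_given_x
```

A perfectly predictive feature scores 1 bit on a balanced binary task, while an irrelevant one scores 0; selecting the top-scoring features then gives a reduced subspace for the classifier.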

Cited by 20 publications (5 citation statements)
References 13 publications
“…Overall, our proposed strategy allowed us to overcome issues of redundancy and irrelevancy of information that are commonly faced when handling high dimensional data which would have led to reduced efficiency and accuracy of ML models trained. 61 Our strategy and the incorporation of non-genetic factors identified 95 coding haplotypes and 5 nongenetic factors for training. Notably, the identified features displayed good predictive performance classifying the MTX response of 70 patients whose data had been placed aside as the unseen test dataset.…”
Section: Discussion
confidence: 99%
“…Symmetric uncertainty was developed by William et al. (1996); it is an entropy-based filter that assesses the pair-wise similarity between a dependent and an independent attribute irrespective of their probability distribution and interdependency (Wu and Zhang, 2004). It measures the information gain of the response random variable relative to the predictor; the lower the entropy, the greater the association in the data.…”
Section: Symmetric Uncertainty
confidence: 99%
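The statement above describes symmetric uncertainty as an entropy-based association measure. A minimal sketch of its usual definition, SU(X, Y) = 2·IG(X; Y) / (H(X) + H(Y)), assuming discrete (or pre-discretized) variables:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy of a sequence of discrete values, in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), normalized to [0, 1].

    Mutual information is computed via the joint entropy:
    IG(X; Y) = H(X) + H(Y) - H(X, Y).
    """
    h_x, h_y = entropy(x), entropy(y)
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    ig = h_x + h_y - h_xy
    denom = h_x + h_y
    return 2.0 * ig / denom if denom else 0.0
```

The normalization by H(X) + H(Y) is what makes the metric "universal" in the sense quoted above: identical variables score 1, independent ones score 0, regardless of the shape of either marginal distribution.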
“…In GCM selection for hydro-climatic studies, the information-entropy-based filter referred to as symmetrical uncertainty (SU) (Witten et al., 2005) has gained the attention of researchers due to its ability to select variables reliably and without bias. The technique was used to rank GCMs according to their degree of similarity with the observations over the entire time series; it has the advantage of providing a universal metric for the relationship between dependent and independent features irrespective of the shape of the underlying distributions (Wu and Zhang, 2004) and has been used in various studies (Ahmed et al., 2019b; Nashwan and Shahid, 2019; Pour et al., 2018; Salman et al., 2018).…”
Section: Introduction
confidence: 99%
“…Since the high dimensionality of the data has the potential to reduce the accuracy and efficiency of the model [25], researchers frequently reduce dimensionality through variable screening, selecting from the original feature set the subset of features that have the greatest impact on the target variable AECOPD, with the goals of improving the accuracy and interpretability of the model and reducing both the computational cost and the noise from irrelevant features. Multiple feature screening methods are available, each with its own set of benefits and drawbacks; it is challenging to adequately characterize all the correlations between variables with a single screening method, and each method measures relevance differently.…”
Section: Feature Selection
confidence: 99%
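The last statement motivates combining several screening methods because each measures relevance differently. One simple, hypothetical way to combine them is mean-rank aggregation; the function name and input format below are illustrative, not taken from any cited study:

```python
def aggregate_ranks(score_lists):
    """Combine feature rankings from several screening methods by mean rank.

    score_lists: a list of dicts, one per screening method, each mapping
    feature name -> score (higher = more relevant). Every method is assumed
    to score every feature. Returns features ordered best-first.
    """
    features = set().union(*score_lists)
    mean_rank = {}
    for f in features:
        ranks = []
        for scores in score_lists:
            # Rank 0 is the method's top feature.
            ordered = sorted(scores, key=scores.get, reverse=True)
            ranks.append(ordered.index(f))
        mean_rank[f] = sum(ranks) / len(ranks)
    return sorted(features, key=lambda f: mean_rank[f])
```

A feature ranked highly by every method ends up near the front, while a feature favored by only one method is pulled toward the middle, which dampens the idiosyncrasies of any single relevance measure.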