2012 11th International Conference on Machine Learning and Applications (ICMLA 2012)
DOI: 10.1109/icmla.2012.192

First Order Statistics Based Feature Selection: A Diverse and Powerful Family of Feature Selection Techniques

Abstract: Dimensionality reduction techniques have become a required step when working with bioinformatics datasets. Techniques such as feature selection are known not only to reduce computation time, but also to improve experimental results by removing redundant and irrelevant features (genes) from consideration in subsequent analysis. Univariate feature selection techniques in particular are well suited to the extreme dimensionality inherent in bioinformatics datasets (for example: …

Cited by 33 publications (15 citation statements)
References 14 publications
“…The ratio is defined as the difference between the mean value of that feature for the positive class instances and the mean value of that feature for the negative class instances, divided by the sum of the standard deviations of that feature for the positive and negative classes. The larger the S2N ratio, the more relevant a feature is to the dataset [15].…”
Section: B. Feature Selection Techniques
confidence: 99%
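
To make the quoted definition concrete, here is a minimal NumPy sketch of the S2N ranker; the names X, y, and s2n_scores are illustrative assumptions, not identifiers from the cited papers, and the epsilon term is added only to guard against zero-variance features.

    import numpy as np

    def s2n_scores(X, y):
        # Signal-to-noise per feature: |mu_pos - mu_neg| / (sigma_pos + sigma_neg)
        pos, neg = X[y == 1], X[y == 0]
        mu_diff = pos.mean(axis=0) - neg.mean(axis=0)
        sigma_sum = pos.std(axis=0) + neg.std(axis=0)
        return np.abs(mu_diff) / (sigma_sum + 1e-12)  # epsilon avoids division by zero

A larger score marks a feature whose class means are well separated relative to their spread, matching the relevance interpretation in the statement above.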
“…When using feature selection, a final feature subset size must be chosen. In our work we use 25 features, which, based on previous research, is a reasonable subset size [15].…”
Section: B. Feature Selection Techniques
confidence: 99%
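
As a brief illustration of that step, the following sketch keeps the 25 top-ranked features, reusing the s2n_scores helper from the previous example; the subset size comes from the quoted statement, while the variable names are assumptions.

    k = 25                                # subset size used in the citing paper
    scores = s2n_scores(X, y)
    top_k = np.argsort(scores)[::-1][:k]  # indices of the 25 highest-scoring features
    X_reduced = X[:, top_k]               # dataset restricted to the selected features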
“…Wrappers, unlike filter approaches, use classifiers when making a decision, and often the classifier used to calculate the score of a particular feature subset is the same one that will be used in the post-selection analysis. There are two main disadvantages to wrapper-based feature selection techniques: limited utility of the chosen features (the chosen features are specific to the learner used within the wrapper and thus not necessarily important to the problem itself) and slow computation time. For more information regarding the FOS techniques, refer to Khoshgoftaar et al. [11].…”
Section: B. Feature Selection Techniques
confidence: 99%
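
To show where the wrapper cost comes from, here is a hedged greedy forward-selection sketch, not the method of the cited papers: it cross-validates a fresh classifier for every candidate feature it considers, which is exactly what makes wrappers slow on high-dimensional data. scikit-learn and the GaussianNB learner are assumed stand-ins.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB

    def greedy_wrapper(X, y, k):
        # Forward selection: repeatedly add the feature whose inclusion
        # most improves cross-validated accuracy.
        selected, remaining = [], list(range(X.shape[1]))
        for _ in range(k):
            best_f, best_score = None, float("-inf")
            for f in remaining:  # one classifier evaluation per candidate feature
                score = cross_val_score(GaussianNB(), X[:, selected + [f]], y, cv=3).mean()
                if score > best_score:
                    best_f, best_score = f, score
            selected.append(best_f)
            remaining.remove(best_f)
        return selected

Selecting k features from d candidates trains on the order of k * d classifiers, versus a single scoring pass for a filter ranker, which is the computational gap the statement describes.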
“…Once the ranking is complete, the user can choose a subset of the top-performing features for use in subsequent analysis. In this work, we use four rankers from three different families of filter-based feature selection methods: two rankers, Information Gain (IG) and ReliefF (RF), are from the "Commonly-Used" family; one ranker, Area Under the ROC Curve (ROC), is from the "Threshold-Based Feature Selection" (TBFS) family; and Signal-to-Noise (S2N) is a ranking technique from the family of First-Order Statistics-based (FOS) methods [9]. We use rankers because filter- and wrapper-based subset selection techniques can be computationally prohibitive, particularly for datasets with a high number of features (genes), which are very common in the field of bioinformatics.…”
Section: Feature Selection Techniques
confidence: 99%
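
For contrast with the wrapper sketch above, a filter-style ranker scores each feature once, with no classifier in the loop. The sketch below uses mutual information as a stand-in for Information Gain (the two are closely related for discrete class labels); SelectKBest and mutual_info_classif are standard scikit-learn utilities, and the subset size of 25 is carried over from the earlier statement.

    import numpy as np
    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    # Score every feature independently, then keep the 25 best.
    selector = SelectKBest(score_func=mutual_info_classif, k=25)
    X_top = selector.fit_transform(X, y)
    ranked = np.argsort(selector.scores_)[::-1]  # feature indices, best first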