Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014) 2014
DOI: 10.1109/iri.2014.7051906

Classification performance of three approaches for combining data sampling and gene selection on bioinformatics data

Abstract: Bioinformatics datasets pose two major challenges to researchers and data-mining practitioners: class imbalance and high dimensionality. Class imbalance occurs when instances of one class vastly outnumber instances of the other class(es), and high dimensionality occurs when a dataset has many independent features (genes). Data sampling is often used to tackle the problem of class imbalance, and the problem of excessive features in the dataset may be alleviated through feature selection. In this work, we examin…

citations
Cited by 9 publications
(3 citation statements)
references
References 15 publications
“…The findings of this work are partially consistent with the results recently discussed in [21], where the effectiveness of combining RUS and feature selection is evaluated in conjunction with different classifiers and feature selection methods, but within less severe imbalance settings (min_pct > 10%). The beneficial impact of sampling-based approaches on high-dimensional bioinformatics datasets is also explored in [30]-[32]. In particular, [30] relies on both RUS and feature selection, and investigates the extent to which the order of these pre-processing operations impacts on the classification results.…”
Section: Discussion (mentioning)
confidence: 99%
“…The beneficial impact of sampling-based approaches on high-dimensional bioinformatics datasets is also explored in [30]-[32]. In particular, [30] relies on both RUS and feature selection, and investigates the extent to which the order of these pre-processing operations impacts on the classification results. As well, [31] exploits both RUS and feature selection and shows that using fully balanced data significantly improves the SVM performance in protein function prediction tasks.…”
Section: Discussion (mentioning)
confidence: 99%
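
The citation statements above refer to combining random undersampling (RUS) with feature selection, and to the order in which the two steps are applied. The sketch below is only an illustration of that idea, not the method evaluated in the paper: the function names are hypothetical, labels are assumed to be binary (0/1), and the feature ranker (absolute difference of class means) is a toy stand-in for whichever gene-selection technique the authors actually used.

```python
import random
from statistics import mean


def random_undersample(X, y, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    idx_by_class = {}
    for i, label in enumerate(y):
        idx_by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in idx_by_class.values())
    keep = []
    for idx in idx_by_class.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def top_k_features(X, y, k):
    """Toy filter ranker (assumption): score = |mean of class 1 - mean of class 0|."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def select_columns(X, cols):
    """Keep only the chosen feature columns."""
    return [[row[j] for j in cols] for row in X]


def sample_then_select(X, y, k, seed=0):
    """Order A: balance the classes first, then rank features on the sampled data."""
    Xs, ys = random_undersample(X, y, seed)
    cols = top_k_features(Xs, ys, k)
    return select_columns(Xs, cols), ys, cols


def select_then_sample(X, y, k, seed=0):
    """Order B: rank features on the full imbalanced data, then balance the classes."""
    cols = top_k_features(X, y, k)
    Xs, ys = random_undersample(select_columns(X, cols), y, seed)
    return Xs, ys, cols
```

Both orders return a balanced training set restricted to k features. They can differ in which features are chosen, because order A ranks features using only the rows that survive sampling, while order B ranks them using every row.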