Fifth International Conference on Hybrid Intelligent Systems (HIS'05) 2005
DOI: 10.1109/ichis.2005.23

An unsupervised learning approach to resolving the data imbalanced issue in supervised learning problems in functional genomics

Abstract: Learning from imbalanced data occurs very frequently in functional genomic applications. A ratio of one positive example to thousands of negative instances is common in scientific applications. Unfortunately, traditional machine learning treats the extremely rare positive instances as noise. The standard approach to this difficulty is to balance the training data by resampling it. However, this results in a high rate of false positive predictions. Hence, we propose preprocessing the majority instances by partitioning them into clusters. This gr…
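The abstract does not say which resampling scheme it means; random undersampling of the majority class is one common choice. The sketch below assumes binary labels and uses a hypothetical helper name, `random_undersample`. With one positive among 999 negatives, balancing keeps only a single negative, which shows how much majority-class data this kind of resampling discards.

```python
import random

def random_undersample(X, y, majority_label, seed=0):
    """Randomly drop majority-class instances until both classes are equal in size.

    X: list of feature vectors; y: list of binary labels (assumption).
    """
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]
    # Keep only as many majority instances as there are minority instances.
    kept_majority = rng.sample(majority, k=min(len(majority), len(minority)))
    keep = sorted(minority + kept_majority)  # preserve original order
    return [X[i] for i in keep], [y[i] for i in keep]
```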

Cited by 16 publications (5 citation statements)
References 10 publications (10 reference statements)
“…Class purity maximization [47] (CPM) selects two instances (one majority and one minority) as centers. By forming clusters, a classifier committee decides which instances are removed.…”
Section: Undersampling Methods (mentioning, confidence: 99%)
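The quoted description of CPM is brief, so the following is only a hedged sketch of a single partitioning step, not the authors' algorithm. Every (majority, minority) pair of instances is tried as a pair of centers, and each instance joins the nearer center under Euclidean distance. The pair whose majority-centered cluster holds the smallest share of minority instances wins, with ties broken toward larger clusters. The exhaustive search, the distance, the tie-break, and the names `partition`, `minority_share`, and `best_center_pair` are all assumptions. Any recursion, and the classifier committee that decides which instances are removed, are not shown.

```python
import math
from itertools import product

def partition(X, center_a, center_b):
    """Split instance indices by which of two centers is nearer (Euclidean)."""
    near_a, near_b = [], []
    for i, x in enumerate(X):
        if math.dist(x, center_a) <= math.dist(x, center_b):
            near_a.append(i)  # ties go to center_a
        else:
            near_b.append(i)
    return near_a, near_b

def minority_share(indices, y, minority_label):
    """Fraction of minority instances among the given indices."""
    return sum(y[i] == minority_label for i in indices) / len(indices)

def best_center_pair(X, y, minority_label):
    """Try every (majority, minority) center pair (hypothetical criterion) and
    return the pair whose majority-centered cluster is purest."""
    majority = [i for i, lab in enumerate(y) if lab != minority_label]
    minority = [i for i, lab in enumerate(y) if lab == minority_label]
    if not majority or not minority:
        raise ValueError("need at least one instance of each class")
    best = None
    for m, p in product(majority, minority):
        near_m, near_p = partition(X, X[m], X[p])
        # near_m is never empty: X[m] is at distance 0 from itself.
        key = (minority_share(near_m, y, minority_label), -len(near_m))
        if best is None or key < best[0]:
            best = (key, m, p, near_m, near_p)
    _, m, p, near_m, near_p = best
    return m, p, near_m, near_p
```

The search costs O(|majority| × |minority| × n) distance computations, which stays affordable only because the minority class is tiny; this is a property of the sketch, not a claim about the paper.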
“…Precision counts the true positives, how many examples are properly classified within the same cluster [48].…”
Section: Precision (mentioning, confidence: 99%)
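The quoted sentence gives only an informal reading of precision. The standard definition is TP / (TP + FP), the fraction of predicted positives that are truly positive. This is the quantity that drops when a classifier produces many false positives. The minimal sketch below uses a hypothetical `precision` helper; returning 0.0 when nothing is predicted positive is a convention chosen here, not something the source specifies.

```python
def precision(y_true, y_pred, positive=1):
    """Standard precision: TP / (TP + FP) over paired true/predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    # Convention (assumption): no positive predictions -> precision 0.0
    return tp / (tp + fp) if tp + fp else 0.0
```

For example, with one true positive among ten instances and three positive predictions of which one is correct, `precision` returns 1/3.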
“…Öztornaci et al [273] found that multiple machine learning models (SVM, MLP, Random Forest) benefit from the Synthetic Minority Oversampling Technique (SMOTE) in finding single nucleotide polymorphisms (SNPs). This data-imbalance issue has also been encountered in machine learning methods [274,275], while ensemble methods appear to be powerful [269]. Sun et al [276] applied the undersampling method together with a majority vote to address the imbalanced data distribution inherent in gene expression image annotation tasks.…”
Section: Class-imbalanced Data (mentioning, confidence: 99%)
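SMOTE, named in the quotation above, creates synthetic minority instances by interpolating between a minority instance and one of its k nearest minority neighbours. The sketch below is a simplified variant under stated assumptions: base instances are drawn at random rather than cycled through in a fixed proportion, distance is Euclidean, and `smote` is a hypothetical helper, not the API of any library.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by SMOTE-style interpolation (simplified).

    Each new point lies on the segment between a randomly chosen minority
    instance and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority instances
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Because every synthetic point lies between two real minority instances, SMOTE adds variety without simply duplicating examples, unlike random oversampling.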