An unsupervised learning approach to resolving the data imbalanced issue in supervised learning problems in functional genomics

Yoon, Ki-Hong; Kwek, Stephen

doi:10.1109/ichis.2005.23

Cited by 16 publications

(5 citation statements)

References 10 publications

(10 reference statements)

“…Class purity maximization [47] (CPM) selects two instances (one majority and one minority) as centers. By forming clusters, a classifier committee decides which instances are removed.…”

Section: Undersampling Methods (mentioning)

confidence: 99%

Customized Instance Random Undersampling to Increase Knowledge Management for Multiclass Imbalanced Data Classification

et al. 2022

Imbalanced data constitutes a challenge for knowledge management. This problem is even more complex in the presence of hybrid (numeric and categorical data) having missing values and multiple decision classes. Unfortunately, health-related information is often multiclass, hybrid, and imbalanced. This paper introduces a novel undersampling procedure that deals with multiclass hybrid data. We explore its impact on the performance of the recently proposed customized naïve associative classifier (CNAC). The experiments made, and the statistical analysis, show that the proposed method surpasses existing classifiers, with the advantage of being able to deal with multiclass, hybrid, and incomplete data with a low computational cost. In addition, our experiments showed that the CNAC benefits from data sampling; therefore, we recommend using the proposed undersampling procedure to balance data for CNAC.

“…Precision counts the true positives, how many examples are properly classified within the same cluster [48].…”

Section: Precision (mentioning)

confidence: 99%

Unsupervised Learning - A Systematic Literature Review

Dridi 2024

Preprint

Machine learning (ML) is a data-driven approach in which machines learn from the data without the involvement of humans. Several domains take advantage of mind-boggling applications of ML. There are three main learning problems in ML: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning involves the training of the model on a labelled dataset. Unsupervised learning involves the training of a model in an unlabeled dataset. The model learns on its own by learning the features of the training dataset. Based on that learning features, the model makes predictions on test data. Several unsupervised learning approaches and algorithms range from clustering, k-means to agglomerative, Principal component analysis, and Fuzzy C-means. Clustering involves the grouping of objects based on their similar features. The algorithms in clustering are categorized into two broad categories such as hierarchical clustering and partitional clustering.

“…Öztornaci et al [273] found that multiple machine learning models (SVM, MLP, Random Forest) benefit from the Synthetic Minority Oversampling Technique (SMOTE) in finding single nucleotide polymorphisms (SNPs). This data-imbalance issue has also been encountered in machine learning methods [274,275], while ensemble methods appear to be powerful [269]. Sun et al [276] applied the undersampling method together with a majority vote to address the imbalanced data distribution inherent in gene expression image annotation tasks.…”

Section: Class-imbalanced Data (mentioning)

confidence: 99%

Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models

Yue, Wang, Zhang et al. 2023

IJMS

The data explosion driven by advancements in genomic research, such as high-throughput sequencing techniques, is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in various fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning, since we expect a superhuman intelligence that explores beyond our knowledge to interpret the genome from deep learning. A powerful deep learning model should rely on the insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with proper deep learning-based architecture, and we remark on practical considerations of developing deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research and point out current challenges and potential research directions for future genomics applications. We believe the collaborative use of ever-growing diverse data and the fast iteration of deep learning models will continue to contribute to the future of genomics.
