Md. Rafi-Ur-Rashid scite author profile

Md. Rafi-Ur-Rashid

1 Publication

7 Citation Statements Received

43 Citation Statements Given


Affiliations

Bangladesh University, United International University

Publications


Breaking the Curse of Class Imbalance: Bangla Text Classification

Rafi-Ur-Rashid, Mahbub, Adnan

2022, ACM Trans. Asian Low-Resour. Lang. Inf. Process.


This article addresses the class imbalance issue in a low-resource language called Bengali. As a use-case, we choose one of the most fundamental NLP tasks, i.e., text classification, where we utilize three benchmark text corpora: fake-news dataset, sentiment analysis dataset, and song lyrics dataset. Each of them contains a critical class imbalance. We attempt to tackle the problem by applying several strategies that include data augmentation with synthetic samples via text and embedding generation in order to augment the proportion of the minority samples. Moreover, we apply ensembling of deep learning models by subsetting the majority samples. Additionally, we enforce the focal loss function for class-imbalanced data classification. We also apply the outlier detection technique, data resampling, and hidden feature extraction to improve the minority-f1 score. All of our experimentations are entirely focused on textual content analysis, which results in a more than 90% minority f1 score for each of the three tasks. It is an excellent outcome on such highly class-imbalanced datasets.
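The abstract mentions focal loss as one of the strategies for class-imbalanced classification. As a rough illustration only, the sketch below computes the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), for a single example. The abstract does not state which framework, variant, or hyperparameters the authors used, so the function name `binary_focal_loss` and the values `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration, not the paper's settings.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one example (illustrative sketch, not the paper's code).

    p     : predicted probability of the positive class, in (0, 1)
    y     : true label, 1 (positive) or 0 (negative)
    gamma : focusing parameter; larger values down-weight easy examples more
    alpha : weight applied to the positive class (1 - alpha for the negative)
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)        # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p         # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct prediction contributes far less loss than a
# confidently wrong one, so training concentrates on hard examples.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(easy < hard)
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is one way to see what the focusing term adds.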


