2022
DOI: 10.1145/3511601

Breaking the Curse of Class Imbalance: Bangla Text Classification

Abstract: This article addresses the class imbalance issue in a low-resource language called Bengali. As a use-case, we choose one of the most fundamental NLP tasks, i.e., text classification, where we utilize three benchmark text corpora: fake-news dataset, sentiment analysis dataset, and song lyrics dataset. Each of them contains a critical class imbalance. We attempt to tackle the problem by applying several strategies that include data augmentation with synthetic samples via text and embedding generation in order to…

Cited by 4 publications (6 citation statements)
References 64 publications (43 reference statements)

“…Similarly, different low-resource languages suffer from the problem of class imbalance including the Bengali language. [74] tries to resolve the problem by applying word embedding strategies as one of their methods in the sentiment classification task. Three DL models were used for the experiment and evaluation.…”
Section: B Word-embedding (mentioning)
confidence: 99%
“…This omission is often attributed to the utilization of online open-source data repositories, where the availability of already generated and preprocessed data obviates the need for explicit preprocessing steps. Studies like [51], [67], [62], [74], highlight the critical role of stemming and lemmatization, particularly in low-resource sentiment analysis scenarios. The absence of adequate tools for stemming and lemmatization in certain languages within NLTK necessitates the development of tailored algorithms for these specific languages.…”
Section: Transfer Learning (mentioning)
confidence: 99%