Breaking the Curse of Class Imbalance: Bangla Text Classification

Rafi-Ur-Rashid, Md.; Mahbub, M.; Adnan, Muhammad Abdullah

doi:10.1145/3511601

Cited by 4 publications (6 citation statements)

References 64 publications (43 reference statements)

“…Similarly, different low-resource languages suffer from the problem of class imbalance including the Bengali language. [74] tries to resolve the problem by applying word embedding strategies as one of their methods in the sentiment classification task. Three DL models were used for the experiment and evaluation.…”

Section: B. Word-embedding (mentioning)

“…This omission is often attributed to the utilization of online open-source data repositories, where the availability of already generated and preprocessed data obviates the need for explicit preprocessing steps. Studies like [51], [67], [62], [74] highlight the critical role of stemming and lemmatization, particularly in low-resource sentiment analysis scenarios. The absence of adequate tools for stemming and lemmatization in certain languages within NLTK necessitates the development of tailored algorithms for these specific languages.…”

Section: Transfer Learning (mentioning)
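Since the excerpt notes that NLTK lacks stemming and lemmatization support for some languages, the following is a minimal sketch of the kind of tailored rule-based stemmer it alludes to: a longest-match suffix stripper for Bangla. The suffix list `SUFFIXES`, the guard `MIN_STEM_LEN`, and the function `strip_suffix` are hypothetical and chosen only for illustration; they are not taken from any cited study, and a usable stemmer would need a much larger, linguistically validated rule set.

```python
# Hypothetical suffix list for illustration only (plural and classifier
# markers); not a validated Bangla stemming rule set.
SUFFIXES = sorted(["গুলো", "গুলি", "দের", "টি", "টা", "কে", "রা"],
                  key=len, reverse=True)  # try longest suffixes first

# Hypothetical guard: never leave a stem shorter than this many code points.
MIN_STEM_LEN = 2


def strip_suffix(word: str) -> str:
    """Remove the longest matching suffix if the remaining stem is long enough."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


for w in ["বইগুলি", "ছেলেরা", "ভাত"]:
    print(w, "->", strip_suffix(w))
```

Naive suffix stripping over-stems: this sketch would also cut the final রা from ধারা, where রা belongs to the root rather than being a plural marker. Lemmatization avoids this kind of error by mapping words to dictionary forms, at the cost of requiring a lexicon.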

“…Nonetheless, challenges persist, particularly in accurately representing all languages within tokenizer frameworks due to variations in language characters and other linguistic factors.
Data cleaning: [95], [35], [96], [49], [62], [65], [58], [68], [67], [89], [74], [75], [76], [77], [88], [94], [100], [103], [104], [107], [80], [106], [81], [108], [109], [82], [53], [110], [111], [83], [92], [113], [114], [115], [116], [117]
Stemming: [51], [67], [104], [108],…”

Section: Transfer Learning (mentioning)
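Data cleaning is the preprocessing step with the longest list of citing studies in the excerpt above. The following is a minimal sketch of one common cleaning pass for Bangla social-media text, assuming that URLs, emoji, Latin characters, and ASCII punctuation should all be discarded. The regular expressions and the `clean` function are illustrative assumptions, not the procedure of any cited study.

```python
import re

# Assumption: URLs carry no sentiment signal and are dropped entirely.
URL_RE = re.compile(r"https?://\S+")

# Keep the Bengali Unicode block (U+0980-U+09FF), the danda and double danda
# (U+0964, U+0965, which live in the Devanagari block) and whitespace; every
# other character (emoji, Latin letters, ASCII punctuation) becomes a space.
# Simplification: this also strips ZWJ/ZWNJ (U+200C/U+200D).
NON_BANGLA_RE = re.compile(r"[^\u0980-\u09FF\u0964\u0965\s]")

SPACE_RE = re.compile(r"\s+")


def clean(text: str) -> str:
    """Remove URLs and non-Bangla characters, then normalize whitespace."""
    text = URL_RE.sub(" ", text)
    text = NON_BANGLA_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


print(clean("দারুণ সিনেমা!!! https://example.com 😀 ভালো।"))
```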

“…Lemmatization: [62], [74], [104], [108]
Part-of-speech (POS): [95], [67], [76], [108]
Tokenization: [47], [48], [51], [67], [74], [94], [100], [104],…”

Section: Transfer Learning (mentioning)
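Tokenization also appears in the excerpt above. A minimal sketch of a regex tokenizer for Bangla, assuming words are whitespace-delimited and treating the danda (।, U+0964), the double danda (॥, U+0965), and a few ASCII punctuation marks as separate tokens. It deliberately avoids `\w+`: in Python's `re`, `\w` does not match Bengali combining vowel signs such as ি (U+09BF), so `\w+` would split words in the middle. `TOKEN_RE` and `tokenize` are hypothetical names.

```python
import re

# A word is a run of characters that are neither whitespace nor listed
# punctuation; each punctuation mark (danda, double danda, , ! ? ; :)
# becomes a token of its own.
TOKEN_RE = re.compile(r"[^\s।॥,!?;:]+|[।॥,!?;:]")


def tokenize(text: str) -> list[str]:
    """Split Bangla text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


print(tokenize("আমি ভাত খাই। তুমি?"))
```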

“…Additionally, one-hot encoding is mentioned in a few like [35] and [74], suggesting its presence in certain studies despite [its] various limitations. Embedding techniques, including both traditional embeddings and contextual embeddings, emerge as popular choices, with numerous studies highlighting their effectiveness.…”

Section: Transfer Learning (mentioning)
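The excerpt contrasts one-hot encoding with embeddings. A minimal sketch of one-hot encoding over a toy vocabulary makes its main limitation concrete: each vector is as long as the vocabulary, and distinct words are always orthogonal, so ভালো ("good") and চমৎকার ("excellent") get a similarity of zero. The helper names `build_vocab`, `one_hot`, and `dot` and the toy tokens are illustrative assumptions.

```python
def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Assign each distinct token an index in order of first appearance."""
    vocab: dict[str, int] = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def one_hot(token: str, vocab: dict[str, int]) -> list[int]:
    """Return a vector of length len(vocab) with a single 1 at the token's index."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec


def dot(u: list[int], v: list[int]) -> int:
    """Dot product: 0 for any two distinct one-hot vectors."""
    return sum(a * b for a, b in zip(u, v))


vocab = build_vocab(["ভালো", "চমৎকার", "খারাপ", "ভালো"])  # good, excellent, bad, good
print(len(vocab))  # vector length equals vocabulary size
print(dot(one_hot("ভালো", vocab), one_hot("চমৎকার", vocab)))  # near-synonyms, yet 0
```

Dense embeddings replace these sparse, mutually orthogonal vectors with low-dimensional vectors learned from data, so related words can end up close together.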

Sentiment Analysis in Low-Resource Settings: A Comprehensive Review of Approaches, Languages, and Data Sources

Aliyu, Sarlan, Usman Danyaro, et al. (2024). IEEE Access.

The field of low-resource sentiment analysis has seen significant developments in recent years. This systematic literature review (SLR) evaluates the approaches and data sources used in deep-learning-based sentiment analysis for low-resource languages. The primary aim is to identify suitable approaches for future sentiment analysis in low-resource settings. We examine the languages, models, and data sources used across studies, with the goal of informing the design of effective approaches. Our emphasis lies in critically evaluating the approaches and datasets used, to identify areas where further research is needed. This study adds to the existing body of literature reviews, covering multilingual low-resource sentiment analysis research published from 2018 to 2023. The findings indicate that transfer learning is the most frequently used approach, followed by word-embedding learning and machine translation systems. Social media is the most common source of data, followed by product, movie, and hotel reviews. There has been a significant surge in the adoption of pre-trained transformers, indicating growing interest within the natural language processing (NLP) community in exploring the potential of these models for low-resource languages. This trend is largely attributed to the novelty of these models and to their being non-labour-intensive. However, the scarcity of annotated datasets for such languages remains a major hurdle. Finally, these findings are relevant to any researcher working on low-resource multilingual sentiment analysis. The study also introduces a conceptual framework for performing sentiment analysis in low-resource settings, providing a resource for future researchers.

Sentiment Analysis in Low-Resource Settings: A Comprehensive Review of Approaches, Languages, and Data Sources

Aliyu, Sarlan, Usman Danyaro, et al. (2024). IEEE Access.

Class overlap handling methods in imbalanced domain: A comprehensive survey

Kumar, Singh, Shankar Yadav (2024). Multimed Tools Appl.

A Comprehensive Roadmap on Bangla Text-based Sentiment Analysis

Shammi, Das, Chakraborty, et al. (2023). ACM Trans. Asian Low-Resour. Lang. Inf. Process.

The growing ease of Internet access has shifted the dissemination of information toward electronic media, and the use of online, or more specifically ‘digital’, text has grown rapidly. ‘Bangla’, the seventh most spoken language globally, is no exception: Bangla-language communication on the Internet expresses individuals' feelings across a wide range of contexts. These large volumes of data from diverse sources have drawn the interest of researchers in the Natural Language Processing domain. Despite the language's relatively complex structure, only a small amount of annotated data and a limited number of frameworks and approaches exist. This lack of resources has left many stones unturned in this diverse, emotion-rich, and widely spoken language. To help fill this gap, this article aims to provide a generalized working procedure for the domain. To do so, existing research on sentiment analysis of Bangla text has been collected, evaluated, and summarized, and the techniques used for preprocessing, feature extraction, and classification have been identified and discussed. On this basis, the article sketches a tentative blueprint for sentiment analysis of Bangla text. It also discusses existing corpora for regional languages such as Tamil, Urdu, and Hindi, as well as English, and compares the methodologies used to extract emotional content from Bangla with those used for other languages. This comparison can help determine promising directions for exploring Bangla in greater depth. Moreover, the work presents a generalized framework to guide aspiring researchers in choosing data and methodologies according to their interests.
