Multi-class sentiment analysis of urdu text using multilingual BERT

Khan, Lal; Amjad, Ammar; Ashraf, Noman; Chang, Hsien-Tsung

doi:10.1038/s41598-022-09381-9

Cited by 53 publications

(44 citation statements)

References 52 publications

“…Model A, is a rule based machine learning model for Urdu sentiment analysis using support vector machine, Naïve Bayesian, Adabbost, MLP, LR and RF along with deep learning model using CNN-1D, LSTM, Bi-LSTM, GRU and Bi-GRU techniques [9]. Model B: is the representation of Umair et al, [10], that classically execute different machine learning based classification models such as support vector machine and naïve Bayesian algorithm on Roman Urdu text to test its accuracy.…”

Section: Model A: Khan Et Al 2021 Which Is Abbreviated As (mentioning)

confidence: 99%

A machine learning approach for Urdu text sentiment analysis

Akhtar

Rehman

2023

Mehran Univ. res. j. eng. technol.

Product evaluations, ratings, and other sorts of online expressions have risen in popularity as a result of the emergence of social networking sites and blogs. Sentiment analysis has emerged as a new area of study for computational linguists as a result of this rapidly expanding data set. From around a decade ago, this has been a topic of discussion for English speakers. However, the scientific community completely ignores other important languages, such as Urdu. Morphologically, Urdu is one of the most complex languages in the world. For this reason, a variety of unique characteristics, such as the language's unusual morphology and unrestricted word order, make the Urdu language processing a difficult challenge to solve. This research provides a new framework for the categorization of Urdu language sentiments. The main contributions of the research are to show how important this multidimensional research problem is as well as its technical parts, such as the parsing algorithm, corpus, lexicon, etc. A new approach for Urdu text sentiment analysis including data gathering, pre-processing, feature extraction, feature vector formation, and finally, sentiment classification has been designed to deal with Urdu language sentiments. The result and discussion section provides a comprehensive comparison of the proposed work with the standard baseline method in terms of precision, recall, f-measure, and accuracy of three different types of datasets. In the overall comparison of the models, the proposed work shows an encouraging achievement in terms of accuracy and other metrics. Last but not least, this section also provides the featured trend and possible direction of the current work.

“…Further, languages such as French, English, Spanish, and other European languages must be addressed in terms of tool accessibility. Despite this, languages like Punjabi, Urdu and Hindi are seen as lacking in [5][6][7][8][9][10].…”

Section: Introduction (mentioning)

confidence: 99%

A machine learning approach for Urdu text sentiment analysis

Akhtar

Rehman

2023

Mehran Univ. res. j. eng. technol.

“…Khan et al [19] perform SA on Urdu language using a dataset comprising multiple domains including beverages, movies sports, politics, etc. Rule-based, ML, and DL approaches are used for the classification of the text.…”

Section: Deep Learning Approach (mentioning)

confidence: 99%

Deep Learning Based Cross Domain Sentiment Classification for Urdu Language

Altaf

Jamal

et al. 2022

IEEE Access

Sentiment analysis is a widely researched area due to its various applications in customer services, brand monitoring, and market research. Automatic sentiment classification is an important but challenging task. Contrary to the English language, sentiment analysis for low-resource languages like Urdu is an under-explored research area. Most of the work on sentiment analysis in the Urdu language is domain-dependent where models are mostly trained and tested on the same dataset on limited domains. However, sentiments in different domains are expressed differently, and manually annotating the datasets for all possible domains is unfeasible. Training a sentiment classifier using annotated data on one domain and testing it on another domain results in poor performance as the terms appearing in the source domain (training data) might not appear in the target (testing data) domain. In this paper, we present a baseline method for cross-domain sentiment analysis in the Urdu language using two different domains. Feature extraction is performed using n-grams and word embedding techniques. Sentiment classification is performed using machine learning and deep learning classifiers. The proposed method achieves an accuracy, precision, recall, and F1 scores of 0.77, 0.83, 0.68, and 0.75, respectively.

INDEX TERMS Cross-domain sentiment analysis; deep learning; Urdu language processing; feature engineering
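The F1 score quoted in the abstract above can be checked against the quoted precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch of that arithmetic only; the function name `f1_score` and the variable names are illustrative assumptions, not code from the cited paper, and it assumes the reported figures describe a single precision/recall pair.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Figures as quoted in the abstract (precision 0.83, recall 0.68).
reported_precision = 0.83
reported_recall = 0.68

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 0.75
```

The result rounds to 0.75, which agrees with the F1 value stated in the abstract.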

“…Users of SN connect, share their thoughts, feelings, and ideas, and participate in discussion groups. Text conversation, or more specifically, emotion classification (EC), is essential to comprehending people's activities since the internet's invisible nature has made it possible for a single user to engage in violent SN speech data [20].…”

Section: Introduction (mentioning)

confidence: 99%

Transformer-based Text Classification on Unified Bangla Multi-class Emotion Corpus

Sourav

Wang

Mahmud

et al. 2023

Preprint

Due to its importance in studying people’s thoughts on various Web 2.0 services, emotion classification is a critical undertaking. Most existing research is focused on the English language, with little work on low-resource languages, e.g., Bangla. In recent years, sentiment analysis, particularly emotion classification in English, has received increasing attention, but little study has been done in the context of Bangla (one of the world’s most widely spoken languages). In this research, we propose a complete set of approaches for identifying and extracting emotions from Bangla texts. We provide a Bangla emotion classification for six classes, i.e., anger, disgust, fear, joy, sadness, and surprise, from Bangla words using transformer-based models, which exhibit phenomenal results in recent days, especially for high-resource languages. The Unified Bangla Multi-class Emotion Corpus (UBMEC) is used to assess the performance of our models. UBMEC is created by combining two previously released manually labelled datasets of Bangla comments on six emotion classes with fresh manually labelled Bangla comments created by us. The corpus dataset and code we used in this work are publicly available.
