Sentiment Analysis for Roman Urdu

Rafique, Ayesha; Malik, Kamran; Nawaz, Zahid; Bukhari, Faisal; Jalbani, Akhtar Hussain

doi:10.22581/muet1982.1902.20

Cited by 22 publications

(16 citation statements)

References 8 publications

“…Starting from simple features then moved to their eight features set. It was concluded that these set of eight features have been used in text sentiment analysis for English language (13).…”

Section: Feature Extraction (mentioning)

confidence: 99%

“…min_df: It is the minimum numbers of documents a word must be present in to be kept. norm: It is set to l2; to ensure all our feature vectors have a Euclidean norm of 1. ngram_range: It is set to (1,2) to indicate that we want to consider both unigrams and bigrams. stop_words: It is set to "preprocessing variable (Which holds all the necessary stopwords for Urdu and Roman Urdu Language)" to remove all common pronouns ("a", "the", etc) to reduce the number of noisy features.…”

Section: Feature Extraction (mentioning)

confidence: 99%
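
The four settings quoted above (min_df, norm, ngram_range, stop_words) can be illustrated with a small pure-Python sketch. This is not the citing authors' code: the parameter names resemble those of scikit-learn's TfidfVectorizer, but the function below is a hypothetical stand-in, the smoothed IDF formula is one common variant chosen for illustration, and the Roman Urdu sentences and stop-word list are made up.

```python
import math
from collections import Counter


def ngrams(tokens, ngram_range=(1, 2)):
    """Return all word n-grams whose length lies in ngram_range (inclusive)."""
    lo, hi = ngram_range
    return [
        " ".join(tokens[i:i + n])
        for n in range(lo, hi + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, min_df=1, ngram_range=(1, 2), stop_words=frozenset()):
    """Hypothetical helper: TF-IDF vectors with L2 normalisation."""
    # Drop stop words first, then build unigrams and bigrams.
    term_lists = []
    for doc in docs:
        tokens = [t for t in doc.lower().split() if t not in stop_words]
        term_lists.append(ngrams(tokens, ngram_range))

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))

    # min_df: keep only terms present in at least min_df documents.
    vocab = sorted(t for t, c in df.items() if c >= min_df)
    n_docs = len(docs)
    # Smoothed IDF (an assumption; other variants exist).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    vectors = []
    for terms in term_lists:
        counts = Counter(t for t in terms if t in idf)
        weights = {t: c * idf[t] for t, c in counts.items()}
        # norm = l2: scale each vector to Euclidean length 1.
        length = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / length for t, w in weights.items()} if length else {})
    return vocab, vectors


# Made-up Roman Urdu examples and a made-up stop-word list.
docs = [
    "yeh phone bohat acha hai",
    "yeh phone bohat bura hai",
    "service achi nahi hai",
]
vocab, vectors = tfidf(docs, min_df=2, ngram_range=(1, 2), stop_words={"yeh", "hai"})
```

With min_df=2, only terms shared by at least two of the three sentences survive, so the third sentence ends up with an empty vector. That shows how an aggressive min_df can discard all the evidence in short texts.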

“…For example various news, articles, stories, blogs and reviews text content typically organized by topics and different products tagged by categories and users can be classified on the basis on how they talk about particular brand or product on online web based platforms. However, the majority of text classification blogs and tutorials on the internet can be found in the form of binary text classification whose common example include email classification such as email spam filtering (spam vs. ham), sentiment analysis (positive vs. negative) respectively. The Research has also identified the problem with two Roman Urdu words have same spelling but lexically they are different from each other such as common and mango spelled aam in Roman Urdu, but for quality training of model words need to maintain consistency in its use throughout the process (1,2). For the analysis of textual data category, the most common and useful approach which plays an important role in the field of NLP like opinion mining, sentiment analysis, tweets, reviews, spam detection, email filtering is the common example of text categorization (3).…”

Section: Introduction (mentioning)

confidence: 99%

Multi-text classification of Urdu/Roman using machine learning and natural language preprocessing techniques

Khuhro, Kumar, et al. 2020. IJST

“…Most of the researchers have worked on SA using languages other than Urdu. Few researchers have worked in Urdu SA [21]. Urdu corpus and lexicon are developed by researchers [22][23][24][25].…”

Section: Related Work (mentioning)

confidence: 99%

Recognition and Effective Handling of Negations in Enhancing the Accuracy of Urdu Sentiment Analyzer

Mukhtar, Khan, Chiragh, et al. 2020. Mehran Univ. Res. J. Eng. Technol.

Although researchers have worked on Urdu Sentiment Analysis, there is still considerable room for improving accuracy. This research therefore enhances the accuracy of Urdu Sentiment Analysis across multiple domains by handling negations with a lexicon-based approach, one of the most widely used approaches to Sentiment Analysis. Negations are the particular focus because of their strong effect on sentiment. Both local and long-distance negations are considered. To this end, a corpus of 6025 Urdu sentences from 151 blogs spanning 14 genres is used, in which the use of negations is carefully observed. Two major steps are taken. First, morphological negations are added to the negative word file of the Urdu Sentiment Lexicon developed for Sentiment Analysis of Urdu blogs. Second, a rule-based approach handles implicit and explicit negations, with rules designed to deal with both effectively. Implementing these rules increased the accuracy of the Sentiment Analyzer from 73.88% to 78.32%, with Precision, Recall, and F-measure of 0.745, 0.788, and 0.745 respectively, a statistically significant improvement.
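
The abstract does not list its rules, so the following is only a generic sketch of lexicon-based scoring with negation handling, not the authors' method. It assumes a simple window rule (an explicit negator within a few preceding tokens flips polarity) and treats morphological negations as ordinary entries in the negative word list. The word lists, the function name, and the English placeholder tokens are all hypothetical.

```python
# Hypothetical placeholder lexicons (English tokens used only for illustration).
POSITIVE = {"good", "happy", "excellent"}
# Morphological negations such as "unhappy" are stored directly as negative words.
NEGATIVE = {"bad", "poor", "unhappy"}
# Explicit negators that flip the polarity of a nearby sentiment word.
NEGATORS = {"not", "never", "no"}


def score_sentence(tokens, window=3):
    """Sum word polarities, flipping any word preceded by a negator within `window` tokens."""
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)  # +1, -1, or 0
        if polarity == 0:
            continue
        preceding = tokens[max(0, i - window):i]
        if any(t in NEGATORS for t in preceding):
            polarity = -polarity
        score += polarity
    return score


def classify(tokens, window=3):
    """Map the summed score to a coarse label."""
    score = score_sentence(tokens, window)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A larger window lets the rule reach negators that sit further from the sentiment word, which is one rough way to approximate the distinction between local and long-distance negation. The right window size, and whether negators precede or follow the word they modify, would depend on the language and corpus.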

“…A very limited sentiment analysis work exists for Roman Urdu which can be classified into lexicon based [18], machine learning, and deep learning based approaches [19], [20], [21], [22]. Lexicon based approaches have low applicability over unseen data, and machine learning based approaches predominantly use bag-of-words based feature representation approaches which face the problem of data sparsity.…”

Section: Introduction (mentioning)

confidence: 99%

A Precisely Xtreme-Multi Channel Hybrid Approach for Roman Urdu Sentiment Analysis

et al. 2020

In order to accelerate the performance of various Natural Language Processing tasks for Roman Urdu, this paper for the very first time provides 3 neural word embeddings prepared using most widely used approaches namely Word2vec, FastText, and Glove. The integrity of generated neural word embeddings is evaluated using intrinsic and extrinsic evaluation approaches. Considering the lack of publicly available benchmark datasets, it provides a first-ever Roman Urdu public dataset which consists of 3241 sentiments annotated against positive, negative, and neutral classes. To provide benchmark baseline performance over the presented dataset for Roman Urdu sentiment analysis, we adapt diverse machine learning (Support Vector Machine, Logistic Regression, Naive Bayes), deep learning (convolutional neural network, recurrent neural network), and hybrid deep learning approaches. Performance impact of generated neural word embeddings based representation is compared with other most widely used bag of words based feature representation approaches using diverse machine and deep learning classifiers. In order to improve the performance of Roman Urdu sentiment analysis, it proposes a novel precisely extreme multi-channel hybrid methodology which makes use of convolutional and recurrent neural networks along with pre-trained neural word embeddings. The proposed hybrid approach outperforms adapted machine learning approaches by the significant figure of 9% and deep learning approaches by the figure of 4% in terms of F1-score.

Index terms: Fast-Text, Glove, Pretrain word embeddings for Roman Urdu, Roman Urdu Sentiment Analysis, Word2Vec
