ASTD: Arabic Sentiment Tweets Dataset

Nabil, Mahmoud; Aly, Mohamed; Atiya, Amir F.

doi:10.18653/v1/d15-1299

Cited by 283 publications

(225 citation statements)

References 4 publications

Citation types: Supporting, Mentioning (197), Contrasting, Unclassified


“…Our data collection method is most similar to Abdul-Mageed et al (2016), who also use phrase seeds to acquire tweets for Ekman's 6 basic emotions, but we extend the work to 8 emotions, expand the list of seed expressions used, improve on the manual annotation study, and empirically validate the method on the practical emotion modeling task both on our data and on an external dataset. Our work also has affinity to works on Arabic text classification (Abdul-Mageed et al, 2011; Refaee and Rieser, 2014; Abdul-Mageed et al, 2014; Nabil et al, 2015; Salameh et al, 2015; Abdul-Mageed, 2017, 2018; Alshehri et al, 2018), but we focus on emotion.…”

Section: Related Work (mentioning)

confidence: 99%

Enabling Deep Learning of Emotion With First-Person Seed Expressions

Alhuzali, Abdul-Mageed, Ungar

2018

Proceedings of the Second Workshop on Computational Modeling Of People’s Opinions, Personality, and Emotions in Socia


The computational treatment of emotion in natural language text remains relatively limited, and Arabic is no exception. This is partly due to lack of labeled data. In this work, we describe and manually validate a method for the automatic acquisition of emotion labeled data and introduce a newly developed data set for Modern Standard and Dialectal Arabic emotion detection focused on Robert Plutchik's 8 basic emotion types. Using a hybrid supervision method that exploits first person emotion seeds, we show how we can acquire promising results with a deep gated recurrent neural network. Our best model reaches 70% F-score, significantly (i.e., 11%, p < 0.05) outperforming a competitive baseline. Applying our method and data on an external dataset of 4 emotions released around the same time we finalized our work, we acquire a 7% absolute gain in F-score over a linear SVM classifier trained on gold data, thus validating our approach.


“…The researchers on [33] and [16] used the manual annotation method with the help of tools that were used to facilitate the annotation process and to reduce time and workload for the annotators. The first research used the Amazon Mechanical Turk service using an API called Boto, to annotate the dataset manually [34].…”

Section: Annotation Process (mentioning)

confidence: 99%

“…Manually on a sentence level (21 papers): [30], [21], [23], [10], [11], [31], [28], [29], [2], [15], [14], [20], [8], [13], [35], [26], [33], [16], [27], [18], [12].…”

Section: Annotation Process Type Paper Count Paper (unclassified)

“…The number of annotators also differs between studies; of the 27 papers under consideration, nine depend on a total of three native Arabic speakers to annotate their dataset [30], [23], [17], [10], [33], [28], [29], [18], [25]. Authors in [27] annotated their dataset with the help of five people.…”

Section: Annotation Process Type Paper Count Paper (mentioning)

confidence: 99%

“…In the research carried out by [17], the verification process was not mentioned; however, data has to be annotated by at least three users of their annotation tool, and the user can delete any empty data, duplicated data, or any data written using English letters. In [33], they used a public tool for sets of three annotators, but if there was a conflict between them in annotating an observation, this observation is ignored. All papers that annotated their corpora based on the review rating, as in number of stars or points, like [11], [8] and [35], did not mention the method of verification.…”

Section: Five Annotators (mentioning)

confidence: 99%


Social Computing and Social Media. Applications and Analytics

Meiselwitz

2017

Lecture Notes in Computer Science


Abstract. Mining publicly available data for meaning and value is an important research direction within social media analysis. To automatically analyze collected textual data, a manual effort is needed for a successful machine learning algorithm to effectively classify text. This pertains to annotating the text, adding labels to each data entry. Arabic is one of the languages that are growing rapidly in the research of sentiment analysis, despite limited resources and scarce annotated corpora. In this paper, we review the annotation process carried out by those papers. A total of 27 papers were reviewed between the years of 2010 and 2016.


A hybrid neural network model based on transfer learning for Arabic sentiment analysis of customer satisfaction

Bakhit, Nderu, Ngunyi

2024

Engineering Reports


Sentiment analysis, a method used to classify textual content into positive, negative, or neutral sentiments, is commonly applied to data from social media platforms. Arabic, an official language of the United Nations, presents unique challenges for sentiment analysis due to its complex morphology and dialectal diversity. Compared to English, research on Arabic sentiment analysis is relatively scarce. Transfer learning, which applies the knowledge learned from one domain to another, can address the limitations of training time and computational resources. However, transfer learning for Arabic sentiment analysis is still underdeveloped. In this study, we develop a new hybrid model, RNN‐BiLSTM, which merges recurrent neural networks (RNN) and bidirectional long short‐term memory (BiLSTM) networks. We used Arabic bidirectional encoder representations from transformers (AraBERT), a state‐of‐the‐art Arabic language pre‐trained transformer‐based model, to generate word‐embedding vectors. The RNN‐BiLSTM model integrates the strengths of RNN and BiLSTM, including the ability to learn sequential dependencies and bidirectional context. We trained the RNN‐BiLSTM model on the source domain, specifically the Arabic reviews dataset (ARD). The RNN‐BiLSTM model outperforms the RNN and BiLSTM models with default parameters, achieving an accuracy of 95.75%. We further applied transfer learning to the RNN‐BiLSTM model by fine‐tuning its parameters using random search. We compared the performance of the fine‐tuned RNN‐BiLSTM model with the RNN and BiLSTM models on two target domain datasets: ASTD and Aracust. The results showed that the fine‐tuned RNN‐BiLSTM model is more effective for transfer learning, achieving an accuracy of 95.44% and 96.19% on the ASTD and Aracust datasets, respectively.

