MSTD: Moroccan Sentiment Twitter Dataset

Mihi, Soukaina; Ait, Brahim; El, Ismail; Arezki, Sara; Laachfoubi, Nabil

doi:10.14569/ijacsa.2020.0111045

Cited by 11 publications

(12 citation statements)

References 32 publications


“…• Text classification (Abozinadah & Jones Jr, 2016; Abufayad, 2018; Ajlouni, 2021; AlBatayha, 2021; Habash, 2021; Mgheed, 2021). • Sentiment analysis (Al-Hagery et al, 2020; Alharbi et al, 2020; Al-Horaibi & Khan, 2016; Almutairi & Al-Hagery, 2021; Alotaibi et al, 2019; Kaibi et al, 2019, 2020; Khabour et al, 2022; Mihi, Ali, et al, 2020; Mihi, Ait, et al, 2020; Mihi et al, 2022; Oussous et al, 2020). • Language model (Alzu'bi & Duwairi, 2021; Hamed et al, 2017).…”

Section: Statement Of Need (unclassified)

PyArabic: A Python package for Arabic text

Zerrouki

2023

JOSS


“…This dataset consists of 12k tweets, which are labeled as Negative, Objective, Positive, or Sarcastic. To facilitate the analysis, two data subsets were created, one with sentiment labels and the other with a binary label for sarcasm (refer to Tables 7 and 8 for the content description).…”

[Table rows interleaved in the excerpt, column headers not recoverable: [32], 223k tokens from Darija and MSA blog posts, No, 76k tokens; Voss et al [33], corpus of tweets of Moroccan dialect written in Roman script, No, Unknown; Laoudi et al [34], 1836 Hespress news website comments, No, 1.8k sequences; Maghfour et al [35], 10k Facebook comments labeled for sentiment analysis, No, 3.5k sequences; MSTD [36], 12k]

Section: Sentiment Analysis and Sarcasm Automatic Detection (mentioning)

confidence: 99%

DarijaBERT: a Step Forward in NLP for the Written Moroccan Dialect

Gaanoun, Naira, Allak et al. 2023

Preprint


The performance of existing transformer-based language models in providing state-of-the-art results on many downstream tasks is well established. However, these models tend to be limited to high-resource languages or are multilingual in nature. The availability of models dedicated to Arabic dialects is limited, and even those that exist primarily support dialects written in Arabic script. This study presents the first BERT models for Moroccan Arabic dialect, also known as Darija, called DarijaBERT, DarijaBERT-arabizi, and DarijaBERT-mix. These models are trained on the largest Arabic monodialectal corpus, supporting both Arabic and Latin character representations of the Moroccan dialect. The models' performance is evaluated and compared to existing multidialectal and multilingual models on four distinct downstream tasks, demonstrating state-of-the-art results. The data collection methodology and pre-training process are described, and the Moroccan Topic Classification Dataset (MTCD) is introduced as the first dataset for topic classification in the Moroccan Arabic dialect. The pre-trained models and MTCD dataset are available to the scientific community.


“…Moroccan sentiment Twitter dataset (MSTD) [39] is a Moroccan dataset retrieved from tweets covering four-way sentiment classification. We are interested in the binary dataset.…”

Section: Datasets (mentioning)

confidence: 99%

Dialectal Arabic sentiment analysis based on tree-based pipeline optimization tool

Mihi, Ali, Bazi et al. 2022

IJECE

Self Cite


The heavy involvement of Arabic internet users has spread data written in the Arabic language and created a vast research area in natural language processing (NLP). Sentiment analysis is a growing field of research that is of great importance given its high potential for decision-making and for predicting upcoming actions from the texts produced in social networks. Arabic used in microblogging websites, especially Twitter, is highly informal. It complies with neither standards nor spelling regulations, making it quite challenging for automatic machine-learning techniques. In this paper, we propose a new approach based on AutoML methods to improve the efficiency of the sentiment classification process for dialectal Arabic. This approach was validated through benchmark testing on three different datasets that represent three vernacular forms of Arabic. The obtained results show that the presented framework achieves significantly higher accuracy than similar works in the literature.
