Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.226
Domain Adaptation for Arabic Cross-Domain and Cross-Dialect Sentiment Analysis from Contextualized Word Embedding

Abstract: Fine-tuning deep pre-trained language models has shown state-of-the-art performance on a wide range of Natural Language Processing (NLP) applications. Nevertheless, their generalization performance drops under domain shift. In the case of the Arabic language, diglossia makes building and annotating corpora for each dialect and/or domain a more challenging task. Unsupervised Domain Adaptation tackles this issue by transferring the learned knowledge from labeled source-domain data to unlabeled target-domain data. In…
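The knowledge transfer described in the abstract is commonly implemented with domain-adversarial training (DANN; Ganin & Lempitsky, 2015): a shared encoder is trained so that a sentiment classifier succeeds on labeled source data while a domain discriminator, connected through a gradient-reversal layer, fails to tell source from target. The PyTorch sketch below is a minimal illustration of that recipe, not the paper's exact architecture; the encoder stand-in, dimensions, and random batches are hypothetical assumptions.

```python
# Minimal DANN-style sketch of unsupervised domain adaptation.
# All module names, shapes, and batches are illustrative assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DANN(nn.Module):
    def __init__(self, feat_dim=768, n_classes=2):
        super().__init__()
        # Stand-in for a pre-trained encoder output (e.g., a BERT [CLS] vector).
        self.encoder = nn.Linear(feat_dim, 256)
        self.sentiment_head = nn.Linear(256, n_classes)  # trained on labeled source
        self.domain_head = nn.Linear(256, 2)             # source vs. target

    def forward(self, x, lambd=1.0):
        h = torch.relu(self.encoder(x))
        y = self.sentiment_head(h)
        d = self.domain_head(GradReverse.apply(h, lambd))
        return y, d

model = DANN()
ce = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=2e-5)

# One illustrative training step with random stand-in batches.
src_x, src_y = torch.randn(8, 768), torch.randint(0, 2, (8,))
tgt_x = torch.randn(8, 768)  # unlabeled target-domain batch

y_src, d_src = model(src_x)
_, d_tgt = model(tgt_x)
dom_labels = torch.cat([torch.zeros(8), torch.ones(8)]).long()
loss = ce(y_src, src_y) + ce(torch.cat([d_src, d_tgt]), dom_labels)
opt.zero_grad()
loss.backward()
opt.step()
```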

Cited by 20 publications (10 citation statements). References 36 publications.
“…For domain-specific data, domain-adaptive fine-tuning of existing PLMs with MLM, or domain adaptation, has been shown to improve the performance of NLP applications (Rietzler et al., 2020; Barbieri et al., 2021; El Mekki et al., 2021a). Nevertheless, when the domain-specific data is sufficiently large, these transformers can be trained from scratch (Abdul-Mageed et al., 2021; Inoue et al., 2021).…”
Section: Related Work
confidence: 99%
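The domain-adaptive MLM fine-tuning cited above is typically a short pass of masked-language-model training on unlabeled in-domain text before task fine-tuning. The sketch below uses the Hugging Face transformers API; the checkpoint name and the two-sentence corpus are placeholder assumptions, not the cited works' setups.

```python
# Hedged sketch of domain-adaptive MLM fine-tuning with Hugging Face
# transformers; the model checkpoint and corpus below are placeholders.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "aubmindlab/bert-base-arabertv02"  # any Arabic PLM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Unlabeled in-domain text; replace with your own corpus.
texts = ["unlabeled in-domain sentence 1", "unlabeled in-domain sentence 2"]
encodings = tokenizer(texts, truncation=True, max_length=128)
dataset = [{"input_ids": ids} for ids in encodings["input_ids"]]

# The collator randomly masks 15% of tokens and builds MLM labels.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mlm-dapt", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()  # the adapted encoder is then fine-tuned on the labeled task
```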
“…In this work, published in 2021, the authors proposed an unsupervised domain adaptation approach for Arabic cross-dialect and cross-domain sentiment analysis based on the word embedding technique [10]. During the experimental phase, they adopted the fine-grained and coarse-grained taxonomies of Arabic dialects.…”
Section: Word Embedding
confidence: 99%
“…To take advantage of the unlabeled dataset provided in this shared task, we generate a weakly-annotated dataset and re-train the developed model on it. This method has been applied, with variations, in several works (Khalifa et al., 2021; El Mekki et al., 2021a; Huang et al., 2021). In our work, we apply the following pipeline:…”
Section: Self-training
confidence: 99%
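The excerpt cuts off before the pipeline itself, so no step is filled in here; the sketch below only illustrates the generic pseudo-labeling (weak annotation) idea the passage names: predict on the unlabeled pool, keep high-confidence predictions as weak labels, and re-train on the union. The threshold and helper names are illustrative assumptions.

```python
# Hedged sketch of the weak-annotation step: keep only predictions the
# current model makes with high confidence. Threshold is an assumption.
import torch

@torch.no_grad()
def pseudo_label(model, unlabeled_loader, threshold=0.9):
    """Return (inputs, weak labels) for confidently predicted examples."""
    model.eval()
    kept_x, kept_y = [], []
    for x in unlabeled_loader:            # each x is a batch of inputs
        probs = torch.softmax(model(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        mask = conf >= threshold          # drop low-confidence examples
        kept_x.append(x[mask])
        kept_y.append(pred[mask])
    return torch.cat(kept_x), torch.cat(kept_y)

# Typical loop: train on gold data, pseudo-label, re-train on the union.
# weak_x, weak_y = pseudo_label(model, unlabeled_loader)
# retrain(model, combine(gold_dataset, (weak_x, weak_y)))  # hypothetical helpers
```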