Mawdoo3 AI at MADAR Shared Task: Arabic Tweet Dialect Identification

Talafha, Bashar; Farhan, Wael; Altakrouri, Ahmed; Al-Natsheh, Hussein T.

doi:10.18653/v1/w19-4629

Cited by 11 publications

(8 citation statements)

References 10 publications


“…The outcome demonstrates the BilLex application's effectiveness in obtaining the cross-lingual equivalents of words and sentences in other languages (Shi et al 2019). As part of the Multi-Arabic Dialect Applications and Resources (MADAR) shared challenge, LSTM with fastText predicts the Arabic dialect from a collection of Arabic tweets with an accuracy of 50.59% (Talafha et al 2019). Urdu is a low-resource language that needs a framework for interpretable subject modeling.…”

Section: Topic Modelling (mentioning)

confidence: 94%

Impact of word embedding models on text analytics in deep learning environment: a review

2023


The selection of word embedding and deep learning models for better outcomes is vital. Word embeddings are an n-dimensional distributed representation of a text that attempts to capture the meanings of the words. Deep learning models utilize multiple computing layers to learn hierarchical representations of data. The word embedding technique represented by deep learning has received much attention. It is used in various natural language processing (NLP) applications, such as text classification, sentiment analysis, named entity recognition, topic modeling, etc. This paper reviews the representative methods of the most prominent word embedding and deep learning models. It presents an overview of recent research trends in NLP and a detailed understanding of how to use these models to achieve efficient results on text analytics tasks. The review summarizes, contrasts, and compares numerous word embedding and deep learning models and includes a list of prominent datasets, tools, APIs, and popular publications. A reference for selecting a suitable word embedding and deep learning approach is presented based on a comparative analysis of different techniques to perform text analytics tasks. This paper can serve as a quick reference for learning the basics, benefits, and challenges of various word representation approaches and deep learning models, with their application to text analytics and a future outlook on research. It can be concluded from the findings of this study that domain-specific word embedding and the long short term memory model can be employed to improve overall text analytics task performance.


“…However, the field suffers from fragmented and independent works on different corpora that vary in terms of granularity, size and domain, making it challenging to track the progress of the solutions. Early work focused on binary dialect classification by discriminating one dialect from MSA (Elfardy and Diab, 2013; Tillmann et al, 2014), as well as identifying Arabic dialects at both a region-level (Zaidan and Callison-Burch, 2011, 2014; Cotterell and Callison-Burch, 2014) and a country-level (Talafha et al, 2020; Abdelali et al, 2021; AlKhamissi et al, 2021).…”

Section: Previous Work (mentioning)

confidence: 99%

Arabic dialect identification: An in-depth error analysis on the MADAR parallel corpus

Olsen, Touileb, Velldal

2023

Proceedings of ArabicNLP 2023


This paper provides a systematic analysis and comparison of the performance of state-of-the-art models on the task of fine-grained Arabic dialect identification using the MADAR parallel corpus. We test approaches based on pretrained transformer language models in addition to Naive Bayes models with a rich set of various features. Through a comprehensive data and error analysis, we provide valuable insights into the strengths and weaknesses of both approaches. We discuss which dialects are more challenging to differentiate, and identify potential sources of errors. Our analysis reveals an important problem with identical sentences across dialect classes in the test set of the MADAR-26 corpus, which may confuse any classifier. We also show that none of the tested approaches captures the subtle distinctions between closely related dialects.


“…However, it does have its limitation such as the lack of support for the various Arabic dialects. To address this, one might benefit from existing multi-dialect parallel datasets [49][50][51][52][53][54] or build new ones (perhaps, by benefiting from unsupervised approaches for dialect translation [7]). Another issue that can be addressed before adopting ATAR in real-life scenarios is trying to increase the model's accuracy.…”

Section: Experiments and Evaluation (mentioning)

confidence: 99%

Atar: Attention-based LSTM for Arabizi transliteration

Talafha, Abuammar, Al-Ayyoub

2021

IJECE

Self Cite


A non-standard romanization of Arabic script, known as Arabizi, is widely used in Arabic online and SMS/chat communities. However, since state-of-the-art tools and applications for Arabic NLP expect Arabic to be written in Arabic script, handling content written in Arabizi requires special attention, either by building customized tools or by transliterating it into Arabic script. The latter approach is the more common one, and this work presents two significant contributions in this direction. The first is to collect and publicly release the first large-scale “Arabizi to Arabic script” parallel corpus focusing on the Jordanian dialect and consisting of more than 25k pairs carefully created and inspected by native speakers to ensure the highest quality. Second, we present Atar, an attention-based encoder-decoder model for Arabizi transliteration. Training and testing this model on our dataset yields impressive accuracy (79%) and BLEU score (88.49).
