Proceedings of the Seventh Workshop on Noisy User-Generated Text (W-NUT 2021), 2021
DOI: 10.18653/v1/2021.wnut-1.47
Can Character-based Language Models Improve Downstream Task Performances in Low-Resource and Noisy Language Scenarios?

Abstract: Recent impressive improvements in NLP, largely based on the success of contextual neural language models, have been mostly demonstrated on at most a couple dozen high-resource languages. Building language models and, more generally, NLP systems for non-standardized and low-resource languages remains a challenging task. In this work, we focus on North-African colloquial dialectal Arabic written using an extension of the Latin script, called NArabizi, found mostly on social media and messaging communication. In th…
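The abstract's premise — that character-based language models suit noisy, non-standardized text like NArabizi — can be illustrated with a small sketch (not from the paper): word-level tokenization treats each spelling variant as a distinct, likely out-of-vocabulary token, while character-level units are largely shared across variants. The example words are hypothetical NArabizi-style spellings chosen for illustration.

```python
# Illustrative sketch: character n-gram overlap between two spelling variants
# of the same word. A word-level vocabulary sees two unrelated tokens, while
# most character trigrams are shared, which is the intuition behind
# character-based models being more robust to noisy orthography.

def char_ngrams(word: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a word, padded with '#'."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two sets."""
    return len(a & b) / len(a | b)

# Two hypothetical NArabizi-style spelling variants ('h' vs the digit '7').
v1, v2 = "mlih", "mli7"

# Word-level view: the strings differ, so a word vocabulary sees no match.
word_level_match = (v1 == v2)

# Character-level view: a sizable fraction of trigrams is shared.
overlap = jaccard(char_ngrams(v1), char_ngrams(v2))

print(word_level_match)      # False
print(round(overlap, 2))     # 0.33 -- 2 shared trigrams out of 6 total
```

The same effect motivates character-aware architectures such as CharacterBERT, discussed in the citation statements below.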

Cited by 4 publications (5 citation statements)
References 20 publications
“…The DziriBERT model exhibits the best performance; however, CharacterBERT delivers competitive results while being trained on a mere 7.5% of the data used for training DziriBERT. This observation is consistent with the conclusions drawn by Riabi et al. (2021).…”
Section: New Results For UD
confidence: 94%
“…In Appendix A, we present the results of all our experiments using the CharacterBERT model trained by Riabi et al. (2021). We observe a heterogeneous improvement in performance, with predominantly better outcomes for our CharacterBERT.…”
Section: Impact Of The Pre-training Corpus
confidence: 96%
“…Attia et al. (2019) find that POS tags provide a strong signal for identifying code-switching. Just as code-switching is a major characteristic of AJA, it also characterizes other varieties of Algerian Arabic, and poses a challenge to Arabic NLP research (Riabi et al., 2021).…”
Section: Code-Switching
confidence: 99%
“…and wordplay-based tasks that require attention to character-level manipulations (Riabi et al., 2021; El Boukkouri, 2020; Clark et al., 2021).…”
Section: Introduction
confidence: 99%