2022
DOI: 10.48550/arxiv.2202.09625
Preprint

CALCS 2021 Shared Task: Machine Translation for Code-Switched Data

Abstract: To date, efforts in the code-switching literature have focused for the most part on language identification, POS, NER, and syntactic parsing. In this paper, we address machine translation for code-switched social media data. We create a community shared task. We provide two modalities for participation: supervised and unsupervised. For the supervised setting, participants are challenged to translate English into Hindi-English (Eng-Hinglish) in a single direction. For the unsupervised setting, we provide the fo…

Cited by 2 publications (3 citation statements)
References 20 publications (26 reference statements)
“…As for work on CS MT, there are many efforts (Sinha and Thakur, 2005; Dhar et al, 2018; Mahata et al, 2019; Menacer et al, 2019; Song et al, 2019; Tarunesh et al, 2021; Xu and Yvon, 2021; Chen et al, 2022; Hamed et al, 2022c). To the best of our knowledge, none of these efforts presented an extensive comparison covering different segmentation techniques.…”
Section: Related Work (mentioning)
confidence: 99%
“…We identify three main challenges for CS MT. First is data sparsity, a challenge common to many CS language pairs because of limited parallel corpora containing commissioned translations of CS text (Çetinoglu et al, 2016; Srivastava and Singh, 2020; Tarunesh et al, 2021; Hamed et al, 2022b; Chen et al, 2022). Second is Egyptian Arabic morphological richness, which further exacerbates the data sparsity situation (Habash et al, 2012a,b).…”
Section: Introduction (mentioning)
confidence: 99%
“…Significant research efforts have been dedicated to various code-switched tasks in the field of Natural Language Processing (NLP), such as Language Identification, Named Entity Recognition (NER), POS Tagging, Sentiment Analysis, Question Answering, and Natural Language Inference (NLI) (Khanuja et al, 2020; Jose et al, 2020; Chen et al, 2022; Rizwan et al, 2020). However, there has been limited exploration in the domain of propaganda detection, particularly for low-resource languages.…”
Section: Introduction (mentioning)
confidence: 99%