Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching 2021
DOI: 10.18653/v1/2021.calcs-1.7
CoMeT: Towards Code-Mixed Translation Using Parallel Monolingual Sentences

Abstract: Code-mixed languages are very popular in multilingual societies around the world, yet resources lag behind what is needed to build robust systems for such languages. A major contributing factor is the informal nature of these languages, which makes it difficult to collect code-mixed data. In this paper, we propose our system for Task 1 of CALCS 2021 to generate a machine translation system from English to Hinglish in a supervised setting. Translating in the given direction can help expand the set of resources for several …

Cited by 24 publications (13 citation statements); references 26 publications.
“…• LTRC-PreCog (Gautam et al., 2021). They propose to use mBART, a pre-trained multilingual sequence-to-sequence model, and fully utilize its pre-training by transliterating the roman Hindi words in the code-mixed sentences to Devanagari script.…”
Section: Methods
confidence: 99%
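The preprocessing idea described in this citation statement can be sketched as follows. This is a minimal illustration only: the word map below is a hypothetical stand-in, since the actual system transliterates romanized Hindi with a trained transliteration model rather than a lookup table.

```python
# Sketch of the transliteration-based preprocessing idea: convert the
# romanized Hindi tokens in a code-mixed (Hinglish) sentence into
# Devanagari, so a multilingual model like mBART sees Hindi in the
# script it was pre-trained on, while English tokens pass through.
# ROMAN_TO_DEVANAGARI is a toy, hand-written stand-in for a real
# transliteration model.

ROMAN_TO_DEVANAGARI = {
    "mujhe": "मुझे",
    "bahut": "बहुत",
    "pasand": "पसंद",
    "hai": "है",
}

def preprocess_hinglish(sentence: str) -> str:
    """Replace known romanized Hindi tokens with their Devanagari
    forms; leave English (and any unknown) tokens untouched."""
    tokens = sentence.split()
    converted = [ROMAN_TO_DEVANAGARI.get(tok.lower(), tok) for tok in tokens]
    return " ".join(converted)

print(preprocess_hinglish("mujhe machine translation bahut pasand hai"))
# मुझे machine translation बहुत पसंद है
```

The resulting mixed-script sentence would then be fed to the sequence-to-sequence model as its source input; English words are deliberately left in Latin script, since only the Hindi portion benefits from matching mBART's pre-training data.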
“…Code-switching in NLP has seen a rise of interest in recent years, including a dedicated workshop starting in 2014 (Diab et al., 2014) and still ongoing (Solorio et al., 2021). CS in machine translation also has a long history (Le Féal, 1990; Climent et al., 2003; Sinha and Thakur, 2005; Johnson et al., 2017; Elmadany et al., 2021; Xu and Yvon, 2021), but has seen a rise of interest with the advent of large multilingual models such as mBART (Liu et al., 2020) or mT5 (Xue et al., 2020; Gautam et al., 2021; Jawahar et al., 2021). Due to the lack of available CS data and the ease of single-word translation, most of these recent related MT works have synthetically created CS data for either training or testing by translating one or more of the words in a sentence (Song et al., 2019; Nakayama et al., 2019; Xu and Yvon, 2021).…”
Section: Related Work
confidence: 99%