Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
DOI: 10.18653/v1/2021.acl-long.265

Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment

Abstract: The cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences. In this paper, we introduce denoising word alignment as a new cross-lingual pre-training task. Specifically, the model first self-labels word alignments for parallel sentences. Then we randomly mask tokens in a bitext pair. Given a masked token, the model uses a pointer network to predict the aligned token in the other language. We alternately perform the above two steps in an expectation-maximization manner. …
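
The abstract describes the denoising word alignment step only at a high level. The following is a minimal PyTorch sketch, not the authors' released code: given hidden states of the masked tokens in one sentence and of all tokens in its translation, a single-head attention pointer produces a distribution over target positions for each masked token. The class name, tensor shapes, and loss formulation are my assumptions.

```python
# Minimal sketch of a pointer network for denoising word alignment.
# Assumption: contextual representations come from a shared cross-lingual encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PointerWordAligner(nn.Module):
    """Points from masked positions in one sentence to tokens in the other language."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # Separate projections for the "query" (masked token) and
        # "key" (candidate aligned token) sides, as in standard attention.
        self.query_proj = nn.Linear(hidden_size, hidden_size)
        self.key_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, masked_states, target_states, target_padding_mask):
        # masked_states:       (batch, n_masked, hidden) representations of masked tokens
        # target_states:       (batch, tgt_len, hidden)  representations of the translation
        # target_padding_mask: (batch, tgt_len)          True where the target token is padding
        q = self.query_proj(masked_states)
        k = self.key_proj(target_states)
        scores = torch.matmul(q, k.transpose(1, 2)) / q.size(-1) ** 0.5
        scores = scores.masked_fill(target_padding_mask.unsqueeze(1), float("-inf"))
        # Pointer distribution over target positions for each masked token.
        return F.log_softmax(scores, dim=-1)


def alignment_loss(log_probs, aligned_positions):
    # aligned_positions: (batch, n_masked) index of the self-labeled aligned token,
    # i.e., the alignment the model itself produced on the unmasked bitext.
    return F.nll_loss(log_probs.transpose(1, 2), aligned_positions)
```

In this sketch the self-labeling step (producing `aligned_positions`) and the denoising step (training the pointer) would be alternated, mirroring the expectation-maximization style procedure the abstract mentions.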

Cited by 25 publications (18 citation statements) · References 26 publications

“…Pre-training Data Following Chi et al. (2021), we use the combination of CCNet (Wenzek et al., 2019) and a Wikipedia dump as pre-training corpora. We sample sentences in 94 languages from the corpora, and employ a re-balanced distribution introduced by Conneau and Lample (2019), which increases the probability of low-resource languages.…”
Section: Methods (mentioning; confidence: 99%)
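
The re-balanced distribution referenced above is the exponentially smoothed multinomial from Conneau and Lample (2019): each language's sampling probability is proportional to its corpus share raised to an exponent α < 1, which up-weights low-resource languages. A minimal sketch follows; the exponent value used here is illustrative, not the one used in the citing work.

```python
# Sketch of re-balanced multilingual sampling: q_i = p_i**alpha / sum_j p_j**alpha.
import numpy as np


def rebalanced_distribution(sentence_counts, alpha=0.7):
    """sentence_counts: per-language corpus sizes; alpha < 1 boosts rare languages."""
    counts = np.asarray(sentence_counts, dtype=np.float64)
    p = counts / counts.sum()   # empirical language distribution
    q = p ** alpha              # exponential smoothing (alpha value is an assumption)
    return q / q.sum()          # re-normalized sampling distribution


# Example: the rarest language's sampling probability grows relative to its raw share.
probs = rebalanced_distribution([1_000_000, 100_000, 1_000])
```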
“…Recently, cross-lingual alignment objectives have been used to train multilingual contextual models from scratch (Hu et al., 2021; Chi et al., 2021), to align the outputs of monolingual models (Aldarmaki and Diab, 2019; Wang et al., 2019), or to apply a post-hoc alignment to a multilingual model after pre-training (Zhao et al., 2021; Cao et al., 2020; Wu and Dredze, 2020b; Kvapilíková et al., 2020; Ouyang et al., 2021; Alqahtani et al., 2021). These works typically use objectives that rely on translated or induced sentence pairs, such as translation language modelling (TLM; Lample and Conneau, 2019).…”
Section: Related Work (mentioning; confidence: 99%)
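
For context on the translation language modelling (TLM) objective mentioned above, here is a minimal illustration of how a TLM training example is typically built: a parallel sentence pair is concatenated and tokens are masked on both sides, so the model can attend across languages to recover them. This is my own sketch, not code from any cited paper; the special tokens, masking rate, and whitespace tokenization are assumptions.

```python
# Sketch of TLM-style input construction over a concatenated bitext pair.
import random

MASK, SEP, CLS = "[MASK]", "[SEP]", "[CLS]"


def build_tlm_example(src_tokens, tgt_tokens, mask_prob=0.15, seed=None):
    """Concatenate a parallel pair and randomly mask tokens in both languages."""
    rng = random.Random(seed)
    tokens = [CLS] + src_tokens + [SEP] + tgt_tokens + [SEP]
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in (CLS, SEP):
            continue
        if rng.random() < mask_prob:
            labels[i] = tok      # the model must recover this token
            tokens[i] = MASK
    return tokens, labels


# Example with an English-German pair (whitespace tokenization for illustration only).
inp, lab = build_tlm_example("the cat sleeps".split(), "die Katze schläft".split(), seed=0)
```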
“…Gritta and Iacobacci (2021) use translated task data to encourage a task-specific alignment of XLM-R. Some use word-aligned corpora (e.g., Wang et al., 2019), while others use parallel sentences plus unsupervised word alignment (Alqahtani et al., 2021; Chi et al., 2021). Ouyang et al. (2021) introduce backtranslation to the alignment process, but still use some parallel data.…”
Section: Related Work (mentioning; confidence: 99%)