2020
DOI: 10.48550/arxiv.2010.12547
Preprint
Multilingual BERT Post-Pretraining Alignment

Cited by 5 publications (7 citation statements)
References 17 publications
“…In section 2, we outlined the most conceptually similar methods that conducted large-scale model pretraining with task-agnostic parallel sentence alignment as part of the training routine (Hu et al., 2020a; Feng et al., 2020; Pan et al., 2020; Chi et al., 2020). Where ablation studies were provided, the average improvement attributed to contrastive alignment was ∼0.2-0.3 points (though the tasks were slightly different).…”
Section: Discussion
confidence: 99%
“…Contrastive alignment based on MoCo with two PXLM encoders was proposed by Pan et al. (2020). Using an L2-normalised [CLS] token with a nonlinear projection as the input representation, the model was aligned on 250K to 2M parallel sentences with added Translation Language Modelling (TLM) and a code-switching augmentation.…”
Section: Groups
confidence: 99%
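For readers unfamiliar with the setup quoted above, the following is a minimal sketch of MoCo-style cross-lingual alignment: a query encoder and a momentum-updated key encoder embed the two sides of a translation pair, the L2-normalised [CLS] vector is passed through a nonlinear projection, and a queue of earlier keys supplies the negatives. The class name, the HuggingFace-style encoder interface, and all hyperparameter values are assumptions for illustration, not details taken from Pan et al. (2020).

# Hypothetical PyTorch sketch of MoCo-style cross-lingual contrastive alignment.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoCoAligner(nn.Module):
    def __init__(self, encoder_q, encoder_k, hidden=768, proj_dim=128,
                 queue_size=4096, momentum=0.999, temperature=0.05):
        super().__init__()
        self.encoder_q, self.encoder_k = encoder_q, encoder_k
        # Nonlinear projection applied on top of the [CLS] vector.
        self.proj_q = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                    nn.Linear(hidden, proj_dim))
        self.proj_k = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                    nn.Linear(hidden, proj_dim))
        self.m, self.t = momentum, temperature
        # Queue of L2-normalised key vectors acting as negatives.
        self.register_buffer("queue",
                             F.normalize(torch.randn(queue_size, proj_dim), dim=1))

    @torch.no_grad()
    def _momentum_update(self):
        # Slowly drag the key encoder/projection towards the query side.
        for q_mod, k_mod in ((self.encoder_q, self.encoder_k),
                             (self.proj_q, self.proj_k)):
            for pq, pk in zip(q_mod.parameters(), k_mod.parameters()):
                pk.data.mul_(self.m).add_(pq.data, alpha=1.0 - self.m)

    def forward(self, src_batch, tgt_batch):
        # HuggingFace-style encoders assumed: output[0] is the last hidden
        # state, and [:, 0] selects the [CLS] position.
        q = F.normalize(self.proj_q(self.encoder_q(**src_batch)[0][:, 0]), dim=1)
        with torch.no_grad():
            self._momentum_update()
            k = F.normalize(self.proj_k(self.encoder_k(**tgt_batch)[0][:, 0]), dim=1)
        l_pos = (q * k).sum(dim=1, keepdim=True)      # translation pair = positive
        l_neg = q @ self.queue.t()                    # queued keys = negatives
        logits = torch.cat([l_pos, l_neg], dim=1) / self.t
        labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
        # (Enqueueing the new keys and dequeueing the oldest ones is omitted.)
        return F.cross_entropy(logits, labels)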
“…Our contrastive alignment is based on InfoNCE (Oord et al., 2018). Previous work has employed a contrastive loss for cross-lingual alignment (Pan et al., 2020); however, the datasets were out-of-domain and orders of magnitude larger. We show that strong results can be obtained using only in-domain (fine-tuning) data.…”
Section: Contrastive Alignment for XNLU
confidence: 99%
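A minimal sketch of such an InfoNCE-style alignment loss, assuming in-batch negatives: row i of the target batch is the translation of row i of the source batch, so the diagonal of the similarity matrix holds the positives and every other entry serves as a negative. The function name and the temperature value are illustrative, not taken from either cited paper.

# Hypothetical InfoNCE alignment loss over a batch of translation pairs.
import torch
import torch.nn.functional as F

def infonce_alignment_loss(src_emb, tgt_emb, temperature=0.1):
    """src_emb, tgt_emb: (batch, dim) sentence embeddings of a parallel batch."""
    src = F.normalize(src_emb, dim=1)
    tgt = F.normalize(tgt_emb, dim=1)
    logits = src @ tgt.t() / temperature                    # all pairwise similarities
    labels = torch.arange(src.size(0), device=src.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)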
“…Qi and Du (2020) include an adversarial language detector in training whose loss encourages the model to generate language-agnostic sentence representations for improved zero-shot transfer. Pan et al. (2020) and Chi et al. (2020) added a contrastive loss to pretraining that treats translated sentences as positive examples and unrelated sentences as negative samples. This training step helps the XLM produce similar embeddings in different languages.…”
Section: Introduction
confidence: 99%
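The contrastive part of this statement is covered by the sketches above. For the adversarial language detector, one common construction (not necessarily Qi and Du's exact setup) is a language classifier behind a gradient-reversal layer, so that the classifier learns to predict the language while the encoder is pushed towards language-agnostic sentence representations. All names below are hypothetical.

# Hypothetical gradient-reversal sketch of an adversarial language detector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Pass the gradient back with flipped sign, scaled by lam.
        return -ctx.lam * grad_output, None

class LanguageDetector(nn.Module):
    def __init__(self, hidden=768, n_langs=15, lam=1.0):
        super().__init__()
        self.lam = lam
        self.clf = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_langs))

    def forward(self, sentence_repr, lang_labels):
        # The detector minimises this loss; because of the reversed gradient,
        # the encoder that produced sentence_repr is trained to maximise it,
        # i.e. to hide the language identity.
        reversed_repr = GradReverse.apply(sentence_repr, self.lam)
        return F.cross_entropy(self.clf(reversed_repr), lang_labels)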
“…InfoXLM (Chi et al., 2021b) considers an input translation pair as cross-lingual views of the same meaning, and proposes the cross-lingual contrastive learning task that aims to maximize the InfoNCE (Oord et al., 2018) lower bound of the mutual information of the two views. Contrastive learning is also used in Hictl and post-pretrained multilingual BERT (Pan et al., 2020). Several pre-training tasks utilize the token-level alignments in parallel data to improve cross-lingual language models (Cao et al., 2020; Zhao et al., 2020; Hu et al., 2020a; Chi et al., 2021c).…”
Section: Related Work
confidence: 99%
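For reference, the relationship alluded to in this statement (written in generic notation, not InfoXLM's own) is that the InfoNCE loss over N candidate pairs lower-bounds the mutual information between the two cross-lingual views c_1 and c_2 of a sentence, with s(·,·) a learned similarity score such as a scaled dot product of the two sentence encodings:

\mathcal{L}_{\mathrm{InfoNCE}}
= -\,\mathbb{E}\!\left[\log \frac{\exp\!\big(s(c_1, c_2^{+})\big)}{\sum_{j=1}^{N} \exp\!\big(s(c_1, c_2^{(j)})\big)}\right],
\qquad
I(c_1; c_2) \;\ge\; \log N - \mathcal{L}_{\mathrm{InfoNCE}}.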