2020
DOI: 10.48550/arxiv.2010.12547
Preprint
Multilingual BERT Post-Pretraining Alignment

Cited by 5 publications (7 citation statements)
References 17 publications
“…In section 2, we outlined the most conceptually similar methods that conducted large-scale model pretraining with task-agnostic parallel sentence alignment as part of the training routine (Hu et al., 2020a; Feng et al., 2020; Pan et al., 2020; Chi et al., 2020). Where ablation studies were provided, the average improvement attributed to contrastive alignment was ∼0.2-0.3 points (though the tasks were slightly different).…”
Section: Discussion
confidence: 99%
“…Contrastive alignment based on MoCo with two PXLM encoders was proposed by Pan et al. (2020). Using an L2-normalised [CLS] token with a nonlinear projection as the input representation, the model was aligned on 250K to 2M parallel sentences with added Translation Language Modelling (TLM) and a code-switching augmentation.…”
Section: Groups
confidence: 99%
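For readers unfamiliar with the setup quoted above, the following is a minimal sketch of MoCo-style cross-lingual alignment: a query encoder and a momentum-updated key encoder embed the two sides of a translation pair, the L2-normalised [CLS] vector is passed through a nonlinear projection, and a queue of earlier keys supplies the negatives. The class name, the HuggingFace-style encoder interface, and all hyperparameter values are assumptions for illustration, not details taken from Pan et al. (2020).

# Hypothetical PyTorch sketch of MoCo-style cross-lingual contrastive alignment.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoCoAligner(nn.Module):
    def __init__(self, encoder_q, encoder_k, hidden=768, proj_dim=128,
                 queue_size=4096, momentum=0.999, temperature=0.05):
        super().__init__()
        self.encoder_q, self.encoder_k = encoder_q, encoder_k
        # Nonlinear projection applied on top of the [CLS] vector.
        self.proj_q = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                    nn.Linear(hidden, proj_dim))
        self.proj_k = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                    nn.Linear(hidden, proj_dim))
        self.m, self.t = momentum, temperature
        # Queue of L2-normalised key vectors acting as negatives.
        self.register_buffer("queue",
                             F.normalize(torch.randn(queue_size, proj_dim), dim=1))

    @torch.no_grad()
    def _momentum_update(self):
        # Slowly drag the key encoder/projection towards the query side.
        for q_mod, k_mod in ((self.encoder_q, self.encoder_k),
                             (self.proj_q, self.proj_k)):
            for pq, pk in zip(q_mod.parameters(), k_mod.parameters()):
                pk.data.mul_(self.m).add_(pq.data, alpha=1.0 - self.m)

    def forward(self, src_batch, tgt_batch):
        # HuggingFace-style encoders assumed: output[0] is the last hidden
        # state, and [:, 0] selects the [CLS] position.
        q = F.normalize(self.proj_q(self.encoder_q(**src_batch)[0][:, 0]), dim=1)
        with torch.no_grad():
            self._momentum_update()
            k = F.normalize(self.proj_k(self.encoder_k(**tgt_batch)[0][:, 0]), dim=1)
        l_pos = (q * k).sum(dim=1, keepdim=True)      # translation pair = positive
        l_neg = q @ self.queue.t()                    # queued keys = negatives
        logits = torch.cat([l_pos, l_neg], dim=1) / self.t
        labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
        # (Enqueueing the new keys and dequeueing the oldest ones is omitted.)
        return F.cross_entropy(logits, labels)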
“…Our contrastive alignment is based on InfoNCE (Oord et al., 2018). Previous work has employed a contrastive loss for cross-lingual alignment (Pan et al., 2020); however, the datasets were out-of-domain and orders of magnitude larger. We show that strong results can be obtained using only in-domain (fine-tuning) data.…”
Section: Contrastive Alignment for XNLU
confidence: 99%
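A minimal sketch of such an InfoNCE-style alignment loss, assuming in-batch negatives: row i of the target batch is the translation of row i of the source batch, so the diagonal of the similarity matrix holds the positives and every other entry serves as a negative. The function name and the temperature value are illustrative, not taken from either cited paper.

# Hypothetical InfoNCE alignment loss over a batch of translation pairs.
import torch
import torch.nn.functional as F

def infonce_alignment_loss(src_emb, tgt_emb, temperature=0.1):
    """src_emb, tgt_emb: (batch, dim) sentence embeddings of a parallel batch."""
    src = F.normalize(src_emb, dim=1)
    tgt = F.normalize(tgt_emb, dim=1)
    logits = src @ tgt.t() / temperature                    # all pairwise similarities
    labels = torch.arange(src.size(0), device=src.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)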
“…Qi and Du (2020) include an adversarial language detector in training whose loss encourages the model to generate language-agnostic sentence representations for improved zero-shot transfer. Pan et al. (2020) and Chi et al. (2020) added a contrastive loss to pretraining that treats translated sentences as positive examples and unrelated sentences as negative samples. This training step helps the XLM produce similar embeddings in different languages.…”
Section: Introduction
confidence: 99%
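The contrastive part of this statement is covered by the sketches above. For the adversarial language detector, one common construction (not necessarily Qi and Du's exact setup) is a language classifier behind a gradient-reversal layer, so that the classifier learns to predict the language while the encoder is pushed towards language-agnostic sentence representations. All names below are hypothetical.

# Hypothetical gradient-reversal sketch of an adversarial language detector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Pass the gradient back with flipped sign, scaled by lam.
        return -ctx.lam * grad_output, None

class LanguageDetector(nn.Module):
    def __init__(self, hidden=768, n_langs=15, lam=1.0):
        super().__init__()
        self.lam = lam
        self.clf = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_langs))

    def forward(self, sentence_repr, lang_labels):
        # The detector minimises this loss; because of the reversed gradient,
        # the encoder that produced sentence_repr is trained to maximise it,
        # i.e. to hide the language identity.
        reversed_repr = GradReverse.apply(sentence_repr, self.lam)
        return F.cross_entropy(self.clf(reversed_repr), lang_labels)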
“…InfoXLM (Chi et al., 2021b) considers an input translation pair as cross-lingual views of the same meaning, and proposes the cross-lingual contrastive learning task that aims to maximize the InfoNCE (Oord et al., 2018) lower bound of the mutual information of the two views. Contrastive learning is also used in Hictl and post-pretrained multilingual BERT (Pan et al., 2020). Several pre-training tasks utilize the token-level alignments in parallel data to improve cross-lingual language models (Cao et al., 2020; Zhao et al., 2020; Hu et al., 2020a; Chi et al., 2021c).…”
Section: Related Work
confidence: 99%
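For reference, the relationship alluded to in this statement (written in generic notation, not InfoXLM's own) is that the InfoNCE loss over N candidate pairs lower-bounds the mutual information between the two cross-lingual views c_1 and c_2 of a sentence, with s(·,·) a learned similarity score such as a scaled dot product of the two sentence encodings:

\mathcal{L}_{\mathrm{InfoNCE}}
= -\,\mathbb{E}\!\left[\log \frac{\exp\!\big(s(c_1, c_2^{+})\big)}{\sum_{j=1}^{N} \exp\!\big(s(c_1, c_2^{(j)})\big)}\right],
\qquad
I(c_1; c_2) \;\ge\; \log N - \mathcal{L}_{\mathrm{InfoNCE}}.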