Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
DOI: 10.18653/v1/2021.acl-long.265

Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment

Abstract: The cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences. In this paper, we introduce denoising word alignment as a new cross-lingual pre-training task. Specifically, the model first self-labels word alignments for parallel sentences. Then we randomly mask tokens in a bitext pair. Given a masked token, the model uses a pointer network to predict the aligned token in the other language. We alternately perform the above two steps in an expectation-maximization manner. …
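
The abstract describes the denoising word alignment step only at a high level. The following is a minimal PyTorch sketch, not the authors' released code: given hidden states of the masked tokens in one sentence and of all tokens in its translation, a single-head attention pointer produces a distribution over target positions for each masked token. The class name, tensor shapes, and loss formulation are my assumptions.

```python
# Minimal sketch of a pointer network for denoising word alignment.
# Assumption: contextual representations come from a shared cross-lingual encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PointerWordAligner(nn.Module):
    """Points from masked positions in one sentence to tokens in the other language."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # Separate projections for the "query" (masked token) and
        # "key" (candidate aligned token) sides, as in standard attention.
        self.query_proj = nn.Linear(hidden_size, hidden_size)
        self.key_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, masked_states, target_states, target_padding_mask):
        # masked_states:       (batch, n_masked, hidden) representations of masked tokens
        # target_states:       (batch, tgt_len, hidden)  representations of the translation
        # target_padding_mask: (batch, tgt_len)          True where the target token is padding
        q = self.query_proj(masked_states)
        k = self.key_proj(target_states)
        scores = torch.matmul(q, k.transpose(1, 2)) / q.size(-1) ** 0.5
        scores = scores.masked_fill(target_padding_mask.unsqueeze(1), float("-inf"))
        # Pointer distribution over target positions for each masked token.
        return F.log_softmax(scores, dim=-1)


def alignment_loss(log_probs, aligned_positions):
    # aligned_positions: (batch, n_masked) index of the self-labeled aligned token,
    # i.e., the alignment the model itself produced on the unmasked bitext.
    return F.nll_loss(log_probs.transpose(1, 2), aligned_positions)
```

In this sketch the self-labeling step (producing `aligned_positions`) and the denoising step (training the pointer) would be alternated, mirroring the expectation-maximization style procedure the abstract mentions.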

Cited by 25 publications (18 citation statements) · References 26 publications

“…Pre-training Data Following Chi et al. (2021), we use the combination of CCNet (Wenzek et al., 2019) and a Wikipedia dump as pre-training corpora. We sample sentences in 94 languages from the corpora, and employ a re-balanced distribution introduced by Conneau and Lample (2019), which increases the probability of low-resource languages.…”
Section: Methods (mentioning; confidence: 99%)
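
The re-balanced distribution referenced above is the exponentially smoothed multinomial from Conneau and Lample (2019): each language's sampling probability is proportional to its corpus share raised to an exponent α < 1, which up-weights low-resource languages. A minimal sketch follows; the exponent value used here is illustrative, not the one used in the citing work.

```python
# Sketch of re-balanced multilingual sampling: q_i = p_i**alpha / sum_j p_j**alpha.
import numpy as np


def rebalanced_distribution(sentence_counts, alpha=0.7):
    """sentence_counts: per-language corpus sizes; alpha < 1 boosts rare languages."""
    counts = np.asarray(sentence_counts, dtype=np.float64)
    p = counts / counts.sum()   # empirical language distribution
    q = p ** alpha              # exponential smoothing (alpha value is an assumption)
    return q / q.sum()          # re-normalized sampling distribution


# Example: the rarest language's sampling probability grows relative to its raw share.
probs = rebalanced_distribution([1_000_000, 100_000, 1_000])
```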
“…Recently, cross-lingual alignment objectives have been used to train multilingual contextual models from scratch (Hu et al., 2021; Chi et al., 2021), to align the outputs of monolingual models (Aldarmaki and Diab, 2019; Wang et al., 2019), or to apply a post-hoc alignment to a multilingual model after pre-training (Zhao et al., 2021; Cao et al., 2020; Wu and Dredze, 2020b; Kvapilíková et al., 2020; Ouyang et al., 2021; Alqahtani et al., 2021). These works typically use objectives that rely on translated or induced sentence pairs, such as translation language modelling (TLM; Lample and Conneau, 2019).…”
Section: Related Work (mentioning; confidence: 99%)
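
For context on the translation language modelling (TLM) objective mentioned above, here is a minimal illustration of how a TLM training example is typically built: a parallel sentence pair is concatenated and tokens are masked on both sides, so the model can attend across languages to recover them. This is my own sketch, not code from any cited paper; the special tokens, masking rate, and whitespace tokenization are assumptions.

```python
# Sketch of TLM-style input construction over a concatenated bitext pair.
import random

MASK, SEP, CLS = "[MASK]", "[SEP]", "[CLS]"


def build_tlm_example(src_tokens, tgt_tokens, mask_prob=0.15, seed=None):
    """Concatenate a parallel pair and randomly mask tokens in both languages."""
    rng = random.Random(seed)
    tokens = [CLS] + src_tokens + [SEP] + tgt_tokens + [SEP]
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in (CLS, SEP):
            continue
        if rng.random() < mask_prob:
            labels[i] = tok      # the model must recover this token
            tokens[i] = MASK
    return tokens, labels


# Example with an English-German pair (whitespace tokenization for illustration only).
inp, lab = build_tlm_example("the cat sleeps".split(), "die Katze schläft".split(), seed=0)
```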
“…Gritta and Iacobacci (2021) use translated task data to encourage a task-specific alignment of XLM-R. Some use word-aligned corpora (e.g., Wang et al., 2019), while others use parallel sentences plus unsupervised word alignment (Alqahtani et al., 2021; Chi et al., 2021). Ouyang et al. (2021) introduce backtranslation to the alignment process, but still use some parallel data.…”
Section: Related Work (mentioning; confidence: 99%)