Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.658

A Call for More Rigor in Unsupervised Cross-lingual Learning

Abstract: We review motivations, definition, approaches, and methodology for unsupervised cross-lingual learning and call for a more rigorous position in each of them. An existing rationale for such research is based on the lack of parallel data for many of the world's languages. However, we argue that a scenario without any parallel data and abundant monolingual data is unrealistic in practice. We also discuss different training signals that have been used in previous work, which depart from the pure unsupervised setting…

Cited by 33 publications (28 citation statements)
References 61 publications
“…In cross-lingual learning the feature space drastically changes, as alphabets, vocabularies and word order can be different. It can be seen as an extreme adaptation scenario, for which parallel data may exist and can be used to build multilingual representations (Artetxe et al., 2020). Second, instead of adapting to a particular target, there is some work on domain generalization aimed at building a single system which is robust across several known target domains.…”
Section: Domain Adaptation and Transfer Learning: Notation
confidence: 99%
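
The statement above treats cross-lingual learning as an extreme adaptation setting in which whatever bilingual signal is available is used to tie the two feature spaces together. One common way to build such multilingual representations (a standard baseline, not necessarily the method of the cited works) is to align independently trained monolingual word embeddings with an orthogonal mapping learned from a small seed dictionary. The sketch below shows that Procrustes step; the toy vectors and seed pairs are made-up assumptions for illustration.

```python
# Minimal sketch: align two monolingual embedding spaces with orthogonal
# Procrustes, using a small seed dictionary as the (weak) bilingual signal.
# The toy dimensions and random "embeddings" are illustrative assumptions only.
import numpy as np

def procrustes_mapping(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Return the orthogonal matrix W minimizing ||src @ W - tgt||_F.

    src, tgt: (n_pairs, dim) arrays; row i of each holds the embeddings of
    the i-th seed translation pair (source word, target word).
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

rng = np.random.default_rng(0)
dim, n_pairs = 4, 3
src_seed = rng.normal(size=(n_pairs, dim))   # source-language vectors of the seed words
tgt_seed = rng.normal(size=(n_pairs, dim))   # target-language vectors of their translations

W = procrustes_mapping(src_seed, tgt_seed)
mapped = src_seed @ W                        # source vectors projected into the target space
print("alignment residual:", np.linalg.norm(mapped - tgt_seed))
```

With no seed dictionary at all, the same mapping has to be induced from monolingual signal alone (e.g. adversarially or from distributional similarity), which is exactly the fully unsupervised setting whose practicality the surveyed paper questions.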
“…While not the first of its kind, BERT [1] has arguably become the most successful model for generating contextualized representations, leading to a new research field termed BERTology, with hundreds of publications [20]. However, this success is largely centered around English and a few other high-resource languages [21], limiting the use of this technology in most of the world's languages [14].…”
Section: Related Work
confidence: 99%
“…This is based on the assumption that there are no or few labeled examples in the target language L_T, while unlabeled data is abundant. However, it has been suggested that this assumption is neither realistic nor practical [14,15].…”
Section: Transfer Learning As Post-training
confidence: 99%
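
The assumption discussed above corresponds to the standard zero-shot cross-lingual transfer recipe: fine-tune a multilingual encoder on labeled source-language data only, then apply it unchanged to the target language L_T. Below is a minimal sketch of that workflow; the model name, the tiny in-memory examples, and the single training step are assumptions chosen for illustration, not the setup of the cited works.

```python
# Minimal sketch of zero-shot cross-lingual transfer: fine-tune on English
# labels, evaluate directly on another language. Model choice, data, and
# hyper-parameters are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-multilingual-cased"   # assumed multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Labeled examples exist only in the source language (English).
en_texts = ["great movie", "terrible plot"]
en_labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
batch = tokenizer(en_texts, padding=True, return_tensors="pt")
loss = model(**batch, labels=en_labels).loss   # one illustrative fine-tuning step
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Zero-shot evaluation on the target language: no target-language labels were used.
model.eval()
es_texts = ["una película estupenda", "un argumento terrible"]
with torch.no_grad():
    logits = model(**tokenizer(es_texts, padding=True, return_tensors="pt")).logits
print("predicted labels:", logits.argmax(dim=-1).tolist())
```

The abundant unlabeled target-language text that the quoted assumption refers to would typically enter through the encoder's pretraining, not through this fine-tuning step.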
“…Low-resource languages sometimes leave no choice but to use unsupervised models. In [3], the authors argue that strictly unsupervised training without any parallel data is rather impractical. Nevertheless, they acknowledge the theoretical and scientific value of further research in this direction.…”
Section: Related Work
confidence: 99%