Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
DOI: 10.18653/v1/2020.emnlp-main.40
Don’t Use English Dev: On the Zero-Shot Cross-Lingual Evaluation of Contextual Embeddings

Abstract: Multilingual contextual embeddings have demonstrated state-of-the-art performance in zero-shot cross-lingual transfer learning, where multilingual BERT is fine-tuned on one source language and evaluated on a different target language. However, published results for mBERT zero-shot accuracy vary as much as 17 points on the MLDoc classification task across four papers. We show that the standard practice of using English dev accuracy for model selection in the zero-shot setting makes it difficult to obtain reprod…

Cited by 20 publications (25 citation statements)
References 17 publications
“…With exactly the same training data, using different random seeds yields a 1–2 point accuracy difference in FS-XLT (Figure 1, top). A similar phenomenon has been observed when fine-tuning monolingual encoders (Dodge et al., 2020) and multilingual encoders with ZS-XLT (Keung et al., 2020a; Wu and Dredze, 2020b; Xia et al., 2020); we show that this observation also holds for FS-XLT. The key takeaway is that varying the buckets is a more severe problem.…”
Section: Target-adapting Results (supporting, confidence: 90%)
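The seed-variance point above can be made concrete with a small sketch. The `train_and_eval` function below is a hypothetical stand-in for fine-tuning a multilingual encoder with a given random seed (it only simulates the reported 1–2 point spread; it is not the cited papers' code):

```python
import random
import statistics

def train_and_eval(seed: int) -> float:
    """Hypothetical stand-in: fine-tune with `seed`, return target accuracy.
    Simulates a roughly 1-2 point accuracy spread across seeds."""
    rng = random.Random(seed)
    return 85.0 + rng.uniform(-1.0, 1.0)

seeds = [1, 2, 3, 4, 5]
accs = [train_and_eval(s) for s in seeds]

# Reporting mean and spread across seeds makes the instability visible,
# instead of quoting a single (possibly lucky) run.
mean = statistics.mean(accs)
spread = max(accs) - min(accs)
print(f"mean={mean:.2f} spread={spread:.2f}")
```

Reporting the spread across seeds, rather than a single number, is one way to make cross-paper comparisons less sensitive to this instability.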
“…For MLDoc, our results are comparable to Dong and de Melo (2019), Wu and Dredze (2019), and Eisenschlos et al. (2019). It is worth noting that reproducing the exact results is challenging, as suggested by Keung et al. (2020a). For MARC, our zero-shot results are worse than those of Keung et al. (2020b), who use the dev set of each target language for model selection, while we use EN dev, following the common true ZS-XLT setup.…”
Section: Source-training Results (mentioning, confidence: 64%)
“…To choose the final model, we use the scores on the English development data. We are aware that this was recently shown to be sub-optimal in some settings (Keung et al., 2020); however, there is no clear way to circumvent this in a pure zero-shot cross-lingual setup (i.e., without assuming any target-language, target-task annotated data).…”
Section: Methods (mentioning, confidence: 99%)
“…Following Keung et al. (2020), in all experiments the other hyper-parameters are tuned on each target language's dev set. We train all models for 10 epochs and choose the best model checkpoint using the target dev set.…”
Section: Implementation Details (mentioning, confidence: 99%)
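The two model-selection rules discussed in these excerpts (English dev vs. target-language dev) can be contrasted with a minimal sketch. The checkpoint scores below are illustrative numbers, not results from the paper:

```python
# Hypothetical per-epoch dev accuracies for one fine-tuning run.
checkpoints = [
    {"epoch": 1, "en_dev": 88.0, "target_dev": 70.5},
    {"epoch": 2, "en_dev": 90.5, "target_dev": 72.0},
    {"epoch": 3, "en_dev": 91.0, "target_dev": 71.0},  # best on EN dev
    {"epoch": 4, "en_dev": 90.0, "target_dev": 74.5},  # best on target dev
]

# True zero-shot setup: no target-language annotations are assumed,
# so the checkpoint is chosen by English dev accuracy alone.
by_en = max(checkpoints, key=lambda c: c["en_dev"])

# Oracle-style setup (used in some cited work): the target-language
# dev set picks the checkpoint, which is no longer strictly zero-shot.
by_target = max(checkpoints, key=lambda c: c["target_dev"])

print(by_en["epoch"], by_target["epoch"])
```

When the two rules select different checkpoints, as in this toy run (epoch 3 vs. epoch 4), the reported target-language accuracy differs accordingly, which is the reproducibility issue the paper highlights.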