Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2019)
DOI: 10.18653/v1/n19-1253
On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing

Abstract: Different languages might have different word orders. In this paper, we investigate cross-lingual transfer and posit that an order-agnostic model will perform better when transferring to distant foreign languages. To test our hypothesis, we train dependency parsers on an English corpus and evaluate their transfer performance on 30 other languages. Specifically, we compare encoders and decoders based on Recurrent Neural Networks (RNNs) and modified self-attentive architectures. The former relies on sequential inf…
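The contrast the abstract draws can be illustrated numerically. The sketch below is a minimal NumPy illustration, not the paper's actual architecture: a self-attention layer without positional encodings is order-agnostic, so permuting the input tokens merely permutes the output rows, whereas a plain RNN's final state changes when word order changes.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Position-free self-attention: no positional encodings are added,
    # so the layer is equivariant under permutations of the input tokens.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return A @ V

def rnn_final_state(X, Wx, Wh):
    # A plain Elman-style RNN: the final state depends on token order.
    h = np.zeros(Wh.shape[0])
    for x in X:
        h = np.tanh(Wx @ x + Wh @ h)
    return h

rng = np.random.default_rng(0)
d = 4
X = rng.normal(size=(5, d))                     # 5 "tokens" of dimension d
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Wx, Wh = rng.normal(size=(d, d)), rng.normal(size=(d, d))

perm = np.array([2, 0, 4, 1, 3])                # reorder the "sentence"

attn_out = self_attention(X, Wq, Wk, Wv)
attn_perm = self_attention(X[perm], Wq, Wk, Wv)
# Permuting the input only permutes the output rows: order-agnostic.
print(np.allclose(attn_out[perm], attn_perm))   # True

h = rnn_final_state(X, Wx, Wh)
h_perm = rnn_final_state(X[perm], Wx, Wh)
# The RNN's final state differs under the reordering: order-sensitive.
print(np.allclose(h, h_perm))                   # False
```

This mirrors the hypothesis tested in the paper: an encoder whose output is invariant (up to permutation) to word order should degrade less when the target language's word order diverges from English.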

Cited by 99 publications (90 citation statements)
References 43 publications
“…At evaluation time, we follow the same approach as at training time, except for parsing. We threshold the sentence length to 140 words, including punctuation, following Ahmad et al. (2019). In practice, the maximum subword sequence length is the number of subwords of the first 140 words or 512, whichever is smaller.…”
Section: Methods
confidence: 99%
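The truncation rule in this excerpt can be written out directly. The helper below is a hypothetical sketch (the function name and inputs are my own, not from the cited work); it assumes a subword tokenizer has already reported how many pieces each word produces.

```python
def max_subword_length(subword_counts, word_limit=140, hard_cap=512):
    """Evaluation-time truncation rule described in the citation statement.

    subword_counts[i] is the number of subword pieces produced for word i
    (punctuation tokens count as words). Keep only the first `word_limit`
    words, then cap the resulting subword sequence at `hard_cap`,
    whichever is smaller.
    """
    n_subwords = sum(subword_counts[:word_limit])
    return min(n_subwords, hard_cap)

# 200 words of 3 subwords each: the first 140 words give 420 subwords,
# which is below the 512 cap.
print(max_subword_length([3] * 200))   # 420

# 200 words of 4 subwords each: 140 * 4 = 560 exceeds the cap, so 512.
print(max_subword_length([4] * 200))   # 512
```

The two-stage cap means short sentences are never padded out, while morphologically rich words (many subwords per word) can still hit the 512-piece limit before the 140-word limit.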
“…Before the widespread use of cross-lingual word embeddings, task-specific models assumed coarse-grained representations such as part-of-speech tags, in support of a delexicalized parser (Zeman and Resnik, 2008). More recently, cross-lingual word embeddings have been used in conjunction with task-specific neural architectures for tasks like named entity recognition (Xie et al., 2018), part-of-speech tagging (Kim et al., 2017) and dependency parsing (Ahmad et al., 2019).…”
Section: Introduction
confidence: 99%
“…Neither direction directly addresses structure. Ahmad et al. (2019) showed that structural sensitivity is important for modern parsers; insensitive parsers suffer. Data transfer is an alternative solution to alleviate typological divergences, such as annotation projection (Tiedemann, 2014) and source treebank reordering (Rasooli and Collins, 2019).…”
Section: Related Work
confidence: 99%
“…Many experiments (Ahmad et al., 2019) suggest that to achieve reasonable performance in the zero-shot setup, the source and the target languages need to share a similar grammatical structure or lie in the same language family. In addition, since mBERT is not trained with an explicit language signal, mBERT's multilingual representations are less effective for languages with little lexical overlap (Patra et al., 2019).…”
Section: Introduction
confidence: 99%