Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1493

How Multilingual is Multilingual BERT?

Abstract: In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2019) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number of probing experiments, showing that transfer is possible even to languages in different scripts, that transfer work…
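As a concrete illustration of the zero-shot transfer setup described in the abstract, the sketch below fine-tunes the public M-BERT checkpoint on a toy English classification task and then evaluates it on a Spanish sentence for which no labels were seen. It assumes the Hugging Face transformers and torch packages; the in-line sentences and labels are hypothetical placeholders, not the paper's actual NER or POS data.

```python
# Minimal sketch of zero-shot cross-lingual transfer with M-BERT.
# Assumes: pip install torch transformers. Toy data below is hypothetical.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-multilingual-cased"  # the public M-BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Fine-tune on labelled data in the source language (English) only.
en_texts = ["the movie was great", "the movie was terrible"]  # hypothetical examples
en_labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
batch = tokenizer(en_texts, padding=True, return_tensors="pt")
loss = model(**batch, labels=en_labels).loss
loss.backward()
optimizer.step()

# Evaluate zero-shot on the target language (Spanish), no Spanish labels seen.
model.eval()
es_batch = tokenizer(["la película fue excelente"], return_tensors="pt")
with torch.no_grad():
    pred = model(**es_batch).logits.argmax(dim=-1)
print(pred)  # predicted class for the Spanish sentence
```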

Cited by 906 publications (871 citation statements)
References 13 publications (10 reference statements)
“…Similar to multilingual BERT, Mulcaire et al (2019) trains a single ELMo on distantly related languages and shows mixed results as to the benefit of pretraining. Parallel to our work, Pires et al (2019) shows mBERT has good zero-shot cross-lingual transfer performance on NER and POS tagging. They show how subword overlap and word ordering affect mBERT transfer performance.…”
Section: Introduction (supporting)
confidence: 72%
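The subword-overlap effect mentioned in the statement above can be probed with a few lines of code. The sketch below, assuming the Hugging Face transformers package, tokenizes a hypothetical English and German sentence pair with M-BERT's WordPiece tokenizer and reports a Jaccard-style overlap of the resulting subword sets; Pires et al. compute a related statistic over full task corpora rather than single sentences.

```python
# Rough sketch of a subword-overlap statistic between two languages under
# M-BERT's tokenizer. Assumes transformers is installed; sentences are hypothetical.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

def subword_set(sentences):
    """Return the set of WordPiece tokens M-BERT produces for a corpus."""
    pieces = set()
    for s in sentences:
        pieces.update(tokenizer.tokenize(s))
    return pieces

en = subword_set(["The president met the delegation in Berlin."])
de = subword_set(["Der Präsident traf die Delegation in Berlin."])

# Jaccard-style overlap: shared subwords / union of subwords.
overlap = len(en & de) / len(en | de)
print(f"subword overlap: {overlap:.2f}")
```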
“…While Pires et al (2019) hypothesize word order is the main culprit for the poor zero-shot performance for Japanese when transferring a POS tagger from English, our experiments with Korean and Japanese show a different picture.…”
Section: Language Outliers (mentioning)
confidence: 62%
“…Another interesting observation on transformer-based LMs is that multilingual models which were pre-trained from multiple monolingual corpora were able to generalize information across different languages [30]. Wu and Dredze [31] confirmed that a multilingual BERT model performed well uniformly across languages in document classification, named entity recognition, and part-of-speech tagging, when fine-tuned with a small amount of target language supervision for the downstream task.…”
Section: Related Work (mentioning)
confidence: 95%