Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.105

Probing Multilingual BERT for Genetic and Typological Signals

Abstract: We probe the layers in multilingual BERT (mBERT) for phylogenetic and geographic language signals across 100 languages and compute language distances based on the mBERT representations. We 1) employ the language distances to infer and evaluate language trees, finding that they are close to the reference family tree in terms of quartet tree distance, 2) perform distance matrix regression analysis, finding that the language distances can be best explained by phylogenetic and worst by structural factors, and 3) pr…
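
As a rough illustration of the pipeline the abstract describes, the sketch below extracts per-language representations from mBERT and computes pairwise language distances. It is not the authors' exact setup: the word lists, the probed layer, and the cosine distance metric are illustrative assumptions.

```python
# Minimal sketch (not the authors' exact pipeline): represent each language by
# the mean mBERT hidden state over a small word list, then take cosine
# distances between these language vectors. Word lists, the probed layer and
# the distance metric are illustrative assumptions.
import torch
from transformers import BertModel, BertTokenizer
from scipy.spatial.distance import cosine

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased",
                                  output_hidden_states=True)
model.eval()

# Hypothetical word lists; the paper uses a multilingual word list covering ~100 languages.
word_lists = {
    "eng": ["water", "fire", "mother"],
    "deu": ["Wasser", "Feuer", "Mutter"],
    "fra": ["eau", "feu", "mère"],
}

LAYER = 8  # one of mBERT's 12 layers; the paper probes every layer


def language_vector(words):
    """Mean hidden state at the chosen layer over all subword tokens of the word list."""
    vecs = []
    for word in words:
        enc = tokenizer(word, return_tensors="pt")
        with torch.no_grad():
            out = model(**enc)
        subwords = out.hidden_states[LAYER][0, 1:-1]  # drop [CLS] and [SEP]
        vecs.append(subwords.mean(dim=0))
    return torch.stack(vecs).mean(dim=0).numpy()


lang_vecs = {lang: language_vector(words) for lang, words in word_lists.items()}
langs = sorted(lang_vecs)
distances = {(a, b): cosine(lang_vecs[a], lang_vecs[b])
             for i, a in enumerate(langs) for b in langs[i + 1:]}
print(distances)
```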

Cited by 9 publications (10 citation statements) | References 49 publications

Citation statements (ordered by relevance):
“…Though many languages share such features as a result of typological relations (which mBERT is known to exploit; see, e.g., Pires et al., 2019; Choenni and Shutova, 2020; Rama et al., 2020), there are also language-specific features to which, we hypothesise, mBERT needs to dedicate a greater share of its representational capacity, compared to the NLI task.…”
Section: Introduction
Confidence: 89%
“…Yu et al. (2021) train language embeddings from denoising autoencoders for 29 languages, which is still a small number. Rama et al. (2020) analyze language distance based on representations from mBERT and multilingual FastText embeddings (Bojanowski et al., 2017). They do so specifically by taking the averaged pairwise distances between vectors of words from a multilingual word list.…”
Section: Representational Similarity
Confidence: 99%
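
The averaged-distance computation described in the statement above can be sketched as follows. It assumes one vector per concept per language (standing in for mBERT or FastText embeddings) and averages cosine distances over matched concepts; the word list, the embedding dimensionality, and the UPGMA tree construction are illustrative assumptions, not the cited authors' code.

```python
# Minimal sketch of "averaged pairwise distances between vectors of words from
# a multilingual word list", assuming one vector per concept per language and
# averaging cosine distances over matched concepts. Vectors here are random
# placeholders standing in for mBERT or FastText embeddings.
import numpy as np
from scipy.spatial.distance import cosine
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
concepts = ["water", "fire", "mother", "stone"]
languages = ["eng", "deu", "fra", "spa"]
vectors = {lang: {c: rng.normal(size=300) for c in concepts} for lang in languages}


def language_distance(a, b):
    """Average per-concept cosine distance between two languages."""
    return float(np.mean([cosine(vectors[a][c], vectors[b][c]) for c in concepts]))


# Condensed distance matrix in the pair order scipy expects (i < j).
condensed = [language_distance(a, b)
             for i, a in enumerate(languages) for b in languages[i + 1:]]

# A UPGMA-style tree from the distance matrix; whether this matches the
# paper's tree-inference method is an assumption. The paper evaluates inferred
# trees against a reference family tree with quartet tree distance.
tree = linkage(condensed, method="average")
print(tree)
```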