Asking without Telling: Exploring Latent Ontologies in Contextual Representations

Michael, Julian; Botha, Jan A.; Tenney, Ian

doi:10.18653/v1/2020.emnlp-main.552

Cited by 31 publications

(29 citation statements)

References 56 publications

(67 reference statements)

“…Since syntactic subspaces are at most a small part of the total BERT space, these are not necessarily mutually contradictory with our results. In concurrent work, Michael et al. (2020) also extend probing methodology, extracting latent ontologies from contextual representations without direct supervision.…”

Section: Understanding Representations (mentioning)

confidence: 99%

Finding Universal Grammatical Relations in Multilingual BERT

Chi, Hewitt, Manning

2020

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recent work has found evidence that Multilingual BERT (mBERT), a transformer-based multilingual masked language model, is capable of zero-shot cross-lingual transfer, suggesting that some aspects of its representations are shared cross-lingually. To better understand this overlap, we extend recent work on finding syntactic trees in neural networks' internal representations to the multilingual setting. We show that subspaces of mBERT representations recover syntactic tree distances in languages other than English, and that these subspaces are approximately shared across languages. Motivated by these results, we present an unsupervised analysis method that provides evidence mBERT learns representations of syntactic dependency labels, in the form of clusters which largely agree with the Universal Dependencies taxonomy. This evidence suggests that even without explicit supervision, multilingual masked language models learn certain linguistic universals.

“…Hence, it might not be optimal for contextual embeddings, especially in the light that the latter tends to have a clustered structure. For instance, recent work suggests that word types (e.g., verbs, nouns, punctuations), entities (e.g., personhood, nationalities, and dates), and even word senses (Michael et al., 2020; Loureiro et al., 2021; Reif et al., 2019) create local distinct clustered areas in the contextual embedding space. Moreover, our local assessment shows that it is not necessarily the case that all clusters share the same dominant directions.…”

Section: Cluster-based Isotropy Enhancement (mentioning)

confidence: 99%

A Cluster-based Approach for Improving Isotropy in Contextual Embedding Space

Rajaee, Pilehvar

2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer

The representation degeneration problem in Contextual Word Representations (CWRs) hurts the expressiveness of the embedding space by forming an anisotropic cone where even unrelated words have excessively positive correlations. Existing techniques for tackling this issue require a learning process to re-train models with additional objectives and mostly employ a global assessment to study isotropy. Our quantitative analysis over isotropy shows that a local assessment could be more accurate due to the clustered structure of CWRs. Based on this observation, we propose a local cluster-based method to address the degeneration issue in contextual embedding spaces. We show that in clusters including punctuations and stop words, local dominant directions encode structural information, removing which can improve CWRs performance on semantic tasks. Moreover, we find that tense information in verb representations dominates sense semantics. We show that removing dominant directions of verb representations can transform the space to better suit semantic applications. Our experiments demonstrate that the proposed cluster-based method can mitigate the degeneration problem on multiple tasks.

“…Meanwhile, methods were proposed that take into account not only the probing performance but also the ease of extracting linguistic information (Voita & Titov, 2020) or the complexity of the probing model (Pimentel et al., 2020a). At the same time, Wu et al. (2020) and Michael et al. (2020) suggested avoiding learnability issues by non-parametric probing and weak supervision respectively. The remainder of the criticism is directed at the limitations of probing such as insufficient reliability for low-resourced languages (Eger et al., 2020), lack of evidence that probes indeed extract linguistic structures but do not learn from the linear context only (Kunz & Kuhlmann, 2020), lack of correlation with fine-tuning scores (Tamkin et al., 2020) and with pretraining scores (Ravichander et al., 2020; Elazar et al., 2021).…”

Section: Related Work (mentioning)

confidence: 99%

The Rediscovery Hypothesis: Language Models Need to Meet Linguistics

Nikoulina, Tezekbayev, Kozhakhmet, et al.

2021

Journal of Artificial Intelligence Research (JAIR)

There is an ongoing debate in the NLP community whether modern language models contain linguistic knowledge, recovered through so-called probes. In this paper, we study whether linguistic knowledge is a necessary condition for the good performance of modern language models, which we call the rediscovery hypothesis. In the first place, we show that language models that are significantly compressed but perform well on their pretraining objectives retain good scores when probed for linguistic structures. This result supports the rediscovery hypothesis and leads to the second contribution of our paper: an information-theoretic framework that relates language modeling objectives with linguistic information. This framework also provides a metric to measure the impact of linguistic information on the word prediction task. We reinforce our analytical results with various experiments, both on synthetic and on real NLP tasks in English.
