Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018
DOI: 10.18653/v1/P18-1198
What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties

Abstract: Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing. "Downstream" tasks, often based on sentence classification, are commonly used to evaluate the quality of sentence representations. The complexity of the tasks, however, makes it difficult to infer what kind of information is present in the representations. We introduce here 10 probing tasks designed to capture simple linguistic features of sentences, and we use them to study embeddings generated by three different encoders trained in eight distinct ways, uncovering intriguing properties of both encoders and training methods.
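
As a rough illustration of the probing setup the abstract describes, the sketch below trains a linear classifier on frozen sentence embeddings to predict a simple surface property such as a sentence-length bin. The embeddings, labels, and dimensions are placeholders for illustration, not data or code from the paper.

```python
# Minimal probing sketch (illustrative only; not the authors' released code).
# A "probe" is a simple classifier trained on frozen sentence embeddings to
# predict a surface-level linguistic property. Here the embeddings and the
# labels (e.g., 6 sentence-length bins) are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 512))   # stand-in for encoder sentence embeddings
y = rng.integers(0, 6, size=1000)  # stand-in labels: 6 length bins

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)  # linear probe; encoder stays frozen
probe.fit(X_tr, y_tr)
acc = accuracy_score(y_te, probe.predict(X_te))
print(f"probing accuracy: {acc:.3f}")      # near chance (1/6) on random data
```

With real encoder outputs, probing accuracy well above the majority-class baseline suggests the embedding encodes the property in question; on the random placeholders above, accuracy stays near chance.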

Cited by 598 publications (683 citation statements)
References 26 publications
“…Through probing methods, it has been shown that a broad range of supervised learning tasks can be turned into tools for understanding the properties of contextual word representations (Conneau et al., 2018). Alain and Bengio (2016) suggested we may think of probes as "thermometers used to measure the temperature simultaneously at many different locations".…”
Section: Results (mentioning, confidence: 99%)
“…Another branch of work uses probing tasks in which the objective is to predict the value of a particular linguistic feature given an input sentence. Probing tasks have been used to investigate whether sentence embeddings encode syntactic and surface features such as tense and voice (Shi et al., 2016), sentence length and word content (Adi et al., 2016), or syntactic depth and morphological number (Conneau et al., 2018). Giulianelli et al. (2018) use diagnostic classifiers to track the propagation of information in RNN-based language models.…”
Section: Introduction (mentioning, confidence: 99%)
“…More recently, it was shown that both i-vectors and x-vectors contain information about speaking style and emotion [12]. In natural language processing (NLP), probing tasks for embeddings have gained attention [13] due to sentence encoders such as BERT [14], which are pretrained on language modeling but achieve state-of-the-art performance across several other tasks.…”
Section: Introduction (mentioning, confidence: 99%)