Evaluation Benchmarks and Learning Criteria for Discourse-Aware Sentence Representations

Chen, Mingda; Chu, Zewei; Gimpel, Kevin

doi:10.18653/v1/d19-1060

Cited by 30 publications

(42 citation statements)

References 67 publications

“…Diagnostic probes were originally intended to explain information encoded in intermediate representations (Adi et al, 2017; Alain and Bengio, 2017). Recently, various probing tasks have queried the representations of, e.g., contextualized word embeddings (Tenney et al, 2019a,b) and sentence embeddings (Linzen et al, 2016; Chen et al, 2019; Alt et al, 2020; Kassner and Schütze, 2020; Chi et al, 2020).…”

Section: Related Work (mentioning)

confidence: 99%

“…The performance on tasks they are trained to predict are used to evaluate the richness of the linguistic representation in encoding the probed tasks. Such tasks include probing syntax (Hewitt and Manning, 2019; Lin et al, 2019; Tenney et al, 2019a), semantics (Yaghoobzadeh et al, 2019), discourse features (Chen et al, 2019; Liu et al, 2019; Tenney et al, 2019b), and commonsense knowledge (Petroni et al, 2019; Poerner et al, 2019).…”

Section: Introduction (mentioning)

confidence: 99%

An information theoretic view on selecting linguistic probes

Zhu; Rudzicz. 2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

There is increasing interest in assessing the linguistic knowledge encoded in neural representations. A popular approach is to attach a diagnostic classifier, or "probe", to perform supervised classification from internal representations. However, how to select a good probe is in debate. Hewitt and Liang (2019) showed that a high performance on diagnostic classification itself is insufficient, because it can be attributed to either "the representation being rich in knowledge", or "the probe learning the task", which Pimentel et al. (2020) challenged. We show this dichotomy is valid information-theoretically. In addition, we find that the methods to construct and select good probes proposed by the two papers, control task (Hewitt and Liang, 2019) and control function, are equivalent: the errors of their approaches are identical (modulo irrelevant terms). Empirically, these two selection criteria lead to results that highly agree with each other.
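As a rough illustration of the probing setup this abstract describes, the Python sketch below trains a small linear probe on synthetic "representations" twice: once on a property that is linearly decodable from the vectors, and once on a control task whose labels are assigned at random per word type. Everything in it is a hypothetical stand-in rather than code from any of the cited papers: the synthetic data, the function names (`make_data`, `train_probe`, `accuracy`, `run_experiment`), and all hyperparameters are assumptions chosen for the example. Selectivity is computed as real-task accuracy minus control-task accuracy, in the sense of the control tasks of Hewitt and Liang (2019).

```python
import math
import random


def make_data(rng, n_types=200, dim=8, tokens_per_type=4, noise=0.1):
    """Synthetic stand-in for encoder outputs: each word type has a base
    vector, and its tokens are noisy copies of that vector."""
    data = []
    for _ in range(n_types):
        base = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Keep coordinate 0 away from zero so the real task has a clear margin.
        if abs(base[0]) < 0.5:
            base[0] = math.copysign(0.5, base[0])
        real = 1 if base[0] > 0 else 0   # property encoded in the vector
        control = rng.randint(0, 1)      # random label tied only to the type
        for _ in range(tokens_per_type):
            x = [v + rng.gauss(0.0, noise) for v in base]
            data.append((x, real, control))
    rng.shuffle(data)
    return data


def train_probe(examples, dim, epochs=30, lr=0.1):
    """Fit a logistic-regression probe with plain SGD; returns (weights, bias)."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                     # gradient of the log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def accuracy(model, examples):
    """Fraction of examples whose thresholded probe output matches the label."""
    w, b = model
    correct = 0
    for x, y in examples:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(examples)


def run_experiment(seed=0, dim=8):
    """Return (real_acc, control_acc, selectivity) for one synthetic run."""
    rng = random.Random(seed)
    data = make_data(rng, dim=dim)
    split = int(0.8 * len(data))
    train, test = data[:split], data[split:]
    results = []
    for label_index in (1, 2):  # 1 = real task labels, 2 = control task labels
        tr = [(ex[0], ex[label_index]) for ex in train]
        te = [(ex[0], ex[label_index]) for ex in test]
        results.append(accuracy(train_probe(tr, dim), te))
    real_acc, control_acc = results
    return real_acc, control_acc, real_acc - control_acc


if __name__ == "__main__":
    real_acc, control_acc, selectivity = run_experiment(seed=0)
    print(f"real={real_acc:.3f} control={control_acc:.3f} "
          f"selectivity={selectivity:.3f}")
```

In this toy setting the low-capacity linear probe would be expected to decode the real property well while staying near chance on the control labels, which is the gap that selectivity is meant to expose. A more expressive probe could memorize the random labels and shrink that gap, which is one reading of the "probe learning the task" concern quoted above.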

“…The first approach aims to improve performance at test time by designing useful signals for pretraining, for instance using hyperlinks (Logeswaran et al, 2019; Chen et al, 2019a) or document structure in Wikipedia (Chen et al, 2019b), knowledge bases (Logan et al, 2019), and discourse markers (Nie et al, 2019). Here, we focus on using category hierarchies in Wikipedia.…”

Section: Related Work (mentioning)

confidence: 99%

Mining Knowledge for Natural Language Inference from Wikipedia Categories

Chen; Chu; Stratos; et al. 2020

Findings of the Association for Computational Linguistics: EMNLP 2020

Self Cite

Accurate lexical entailment (LE) and natural language inference (NLI) often require large quantities of costly annotations. To alleviate the need for labeled data, we introduce WIKINLI: a resource for improving model performance on NLI and LE tasks. It contains 428,899 pairs of phrases constructed from naturally annotated category hierarchies in Wikipedia. We show that we can improve strong baselines such as BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019) by pretraining them on WIKINLI and transferring the models on downstream tasks. We conduct systematic comparisons with phrases extracted from other knowledge bases such as WordNet and Wikidata to find that pretraining on WIKINLI gives the best performance. In addition, we construct WIKINLI in other languages, and show that pretraining on them improves performance on NLI tasks of corresponding languages. Footnotes: * Equal contribution; listed in alphabetical order. 1: Code and data are available at https://github.com/ZeweiChu/WikiNLI.

“…Probing involves training lightweight classifiers over features produced by a pretrained model, and assessing the model's knowledge by the probe's performance. Probing has been used for low-level properties such as word order and sentence length (Adi et al, 2017; Conneau et al, 2018), as well as phenomena at the level of syntax (Hewitt and Manning, 2019), semantics (Tenney et al, 2019b; Liu et al, 2019b), and discourse structure (Chen et al, 2019). Error analysis on probes has been used to argue that BERT may simulate sequential decision making across layers (Tenney et al, 2019a), or that it encodes its own, soft notion of syntactic distance (Reif et al, 2019).…”

Section: Introduction (mentioning)

confidence: 99%

Asking without Telling: Exploring Latent Ontologies in Contextual Representations

Michael; Botha; Tenney. 2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The success of pretrained contextual encoders, such as ELMo and BERT, has brought a great deal of interest in what these models learn: do they, without explicit supervision, learn to encode meaningful notions of linguistic structure? If so, how is this structure encoded? To investigate this, we introduce latent subclass learning (LSL): a modification to classifier-based probing that induces a latent categorization (or ontology) of the probe's inputs. Without access to fine-grained gold labels, LSL extracts emergent structure from input representations in an interpretable and quantifiable form. In experiments, we find strong evidence of familiar categories, such as a notion of personhood in ELMo, as well as novel ontological distinctions, such as a preference for fine-grained semantic roles on core arguments. Our results provide unique new evidence of emergent structure in pretrained encoders, including departures from existing annotations which are inaccessible to earlier methods.
