Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen 2019
DOI: 10.18653/v1/d19-1060

Evaluation Benchmarks and Learning Criteria for Discourse-Aware Sentence Representations

Abstract: Prior work on pretrained sentence embeddings and benchmarks focuses on the capabilities of representations for stand-alone sentences. We propose DiscoEval, a test suite of tasks to evaluate whether sentence representations include information about the role of a sentence in its discourse context. We also propose a variety of training objectives that make use of natural annotations from Wikipedia to build sentence encoders capable of modeling discourse information. We benchmark sentence encoders trained with ou…

Cited by 30 publications (42 citation statements)
References 67 publications
“…Diagnostic probes were originally intended to explain information encoded in intermediate representations (Adi et al, 2017; Alain and Bengio, 2017). Recently, various probing tasks have queried the representations of, e.g., contextualized word embeddings (Tenney et al, 2019a,b) and sentence embeddings (Linzen et al, 2016; Chen et al, 2019; Alt et al, 2020; Kassner and Schütze, 2020; Chi et al, 2020).…”
Section: Related Work (mentioning)
confidence: 99%
“…The performance on tasks they are trained to predict are used to evaluate the richness of the linguistic representation in encoding the probed tasks. Such tasks include probing syntax (Hewitt and Manning, 2019; Lin et al, 2019; Tenney et al, 2019a), semantics (Yaghoobzadeh et al, 2019), discourse features (Chen et al, 2019; Liu et al, 2019; Tenney et al, 2019b), and commonsense knowledge (Petroni et al, 2019; Poerner et al, 2019).…”
Section: Introduction (mentioning)
confidence: 99%
“…The first approach aims to improve performance at test time by designing useful signals for pretraining, for instance using hyperlinks (Logeswaran et al, 2019; Chen et al, 2019a) or document structure in Wikipedia (Chen et al, 2019b), knowledge bases (Logan et al, 2019), and discourse markers (Nie et al, 2019). Here, we focus on using category hierarchies in Wikipedia.…”
Section: Related Work (mentioning)
confidence: 99%
“…Probing involves training lightweight classifiers over features produced by a pretrained model, and assessing the model's knowledge by the probe's performance. Probing has been used for low-level properties such as word order and sentence length (Adi et al, 2017; Conneau et al, 2018), as well as phenomena at the level of syntax (Hewitt and Manning, 2019), semantics (Tenney et al, 2019b; Liu et al, 2019b), and discourse structure (Chen et al, 2019). Error analysis on probes has been used to argue that BERT may simulate sequential decision making across layers (Tenney et al, 2019a), or that it encodes its own, soft notion of syntactic distance (Reif et al, 2019).…”
Section: Introduction (mentioning)
confidence: 99%
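
The probing setup described in the statement above (a lightweight classifier trained over frozen features, with the probe's accuracy read as evidence of what the features encode) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure of any cited paper: `fake_frozen_encoder` is a hypothetical stand-in that generates synthetic 4-dimensional "sentence features", and the probe is a plain logistic regression fitted with batch gradient descent.

```python
import math
import random


def sigmoid(z):
    """Logistic function, with the input clamped to avoid overflow in exp."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression probe on fixed features (the encoder is never updated)."""
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def probe_accuracy(weights, bias, features, labels):
    """Fraction of held-out examples the probe classifies correctly."""
    correct = 0
    for x, y in zip(features, labels):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        correct += int((z > 0) == (y == 1))
    return correct / len(labels)


def fake_frozen_encoder(n, rng):
    """Hypothetical stand-in for a pretrained encoder (assumption, synthetic data).

    Dimension 0 carries the probed binary property; the other dimensions are noise.
    """
    features, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(2.0 if y else -2.0, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(3)]
        features.append(x)
        labels.append(y)
    return features, labels


rng = random.Random(0)
train_x, train_y = fake_frozen_encoder(200, rng)
test_x, test_y = fake_frozen_encoder(100, rng)
weights, bias = train_probe(train_x, train_y)
accuracy = probe_accuracy(weights, bias, test_x, test_y)
print(f"probe accuracy on held-out features: {accuracy:.2f}")
```

In a real probing study the synthetic generator would be replaced by features extracted from a pretrained model, and the labels by annotations for the property of interest; keeping the probe this small is what lets its accuracy be attributed to the features rather than to the classifier.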