Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)
DOI: 10.18653/v1/s19-1026

Probing What Different NLP Tasks Teach Machines about Function Word Comprehension

Abstract: We introduce a set of nine challenge tasks that test for the understanding of function words. These tasks are created by structurally mutating sentences from existing datasets to target the comprehension of specific types of function words (e.g., prepositions, wh-words). Using these probing tasks, we explore the effects of various pretraining objectives for sentence encoders (e.g., language modeling, CCG supertagging, and natural language inference (NLI)) on the learned representations. Our results show that pr…
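
As a rough, hypothetical illustration of the probing setup the abstract describes, the sketch below fits a linear classifier on frozen sentence representations to separate original sentences from mutated ones. The encode function is a stand-in for whichever pretrained encoder (language modeling, CCG, NLI, etc.) is being probed; it returns random vectors here only so the sketch runs end to end, and it does not reproduce the authors' actual pipeline.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def encode(sentences):
        # Stand-in for a frozen pretrained sentence encoder (e.g. one
        # pretrained on language modeling, CCG supertagging, or NLI).
        # Random fixed vectors keep the sketch self-contained and runnable.
        rng = np.random.default_rng(0)
        return rng.normal(size=(len(sentences), 128))

    originals = ["I wonder who called.", "She put the cup on the table."]
    mutated = ["I wonder when called.", "She put the cup at the table."]

    X = encode(originals + mutated)
    y = np.array([1] * len(originals) + [0] * len(mutated))  # 1 = original

    # Only the linear probe is trained; the encoder stays frozen, so probe
    # accuracy reflects what the representations already capture about
    # function words.
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    print(probe.score(X, y))

Training and scoring on the same four toy sentences is purely illustrative; the paper's setting evaluates probes on held-out original/mutated pairs.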

Cited by 80 publications (67 citation statements: 2 supporting, 65 mentioning, 0 contrasting). References 34 publications.

Citation statements (ordered by relevance):
“…We use well-established datasets for our probing tasks, including the edge-probing suite from Tenney et al. (2019b), function-word-oriented tasks from Kim et al. (2019), and sentence-level probing datasets (SentEval; Conneau et al., 2018).…”
Section: Probing Tasks (mentioning; confidence: 99%)
“…We use the following five datasets: AJ-CoLA tests a model's understanding of general grammaticality using the Corpus of Linguistic Acceptability (CoLA) (Warstadt et al., 2019b), which is drawn from 22 theoretical linguistics publications. The other tasks concern the behaviors of specific classes of function words, using the dataset by Kim et al. (2019): AJ-WH tests whether a model can detect that a wh-word in a sentence has been swapped with another wh-word, which requires identifying the antecedent associated with the wh-word. AJ-Def tests whether a model can detect that the definite/indefinite articles in a given sentence have been swapped.…”
Section: Probing Tasks (mentioning; confidence: 99%)
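
The AJ-WH and AJ-Def mutations described above are easy to picture with a small sketch. The word inventories and the swap-first-match strategy below are assumptions made for illustration, not the generation procedure of Kim et al. (2019):

    import random

    # Hypothetical inventories; the actual tasks are built by mutating
    # parsed corpus sentences, not by scanning flat word lists like these.
    WH_WORDS = {"who", "what", "when", "where", "why", "which", "how"}
    ARTICLES = {"a", "an", "the"}

    def swap_function_word(sentence, inventory, rng=random):
        # Replace the first word found in `inventory` with a different
        # member, yielding a minimally mutated (likely unacceptable)
        # sentence of the kind AJ-WH and AJ-Def ask models to flag.
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok.lower() in inventory:
                tokens[i] = rng.choice(sorted(inventory - {tok.lower()}))
                return " ".join(tokens), True
        return sentence, False  # no function word of this class present

    mutated, changed = swap_function_word(
        "I know where she bought the book", WH_WORDS)
    # e.g. -> "I know why she bought the book", changed == True

A probing classifier then labels such sentences as original or mutated; near-chance accuracy suggests the encoder's representations carry little information about that class of function words.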
“…If, on the other hand, interpretability is defined as the possibility to provide a post-hoc, compact natural-language explanation of why a certain output was produced in response to a certain input, then humans and complex artificial models can, in principle, be equally interpretable. Saliency maps (Simonyan, Vedaldi, & Zisserman, 2013), behavioral testing (Ribeiro, Wu, Guestrin, & Singh, 2020), probing methods (Bolukbasi, Chang, Zou, Saligrama, & Kalai, 2016; Bordia & Bowman, 2019; Gardner et al., 2020; Kim et al., 2019; Linzen & Baroni, 2020), and adversarial attacks (I. I.…”
Section: Interpretability (mentioning; confidence: 99%)
“…A portion of past work on analyzing pre-trained encoders is based mainly on clean data. As mentioned in Tenney et al. (2019a), these studies can be roughly divided into two categories: (1) designing controlled tasks to probe whether a specific linguistic phenomenon is captured by models (Conneau et al., 2018; Peters et al., 2019; Tenney et al., 2019b; Kim et al., 2019), or (2) decomposing the model structure and exploring what linguistic property is encoded (Tenney et al., 2019a; Jawahar et al., 2019; Clark et al., 2019). However, these studies do not analyze how grammatical errors affect model behaviors.…”
Section: Related Work (mentioning; confidence: 99%)