2018
DOI: 10.1162/tacl_a_00240

Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns

Abstract: Coreference resolution is an important task for natural language understanding, and the resolution of ambiguous pronouns a longstanding challenge. Nonetheless, existing corpora do not capture ambiguous pronouns in sufficient volume or diversity to accurately indicate the practical utility of models. Furthermore, we find gender bias in existing corpora and systems favoring masculine entities. To address this, we present and release GAP, a gender-balanced labeled corpus of 8,908 ambiguous pronoun-name pairs samp…

Citations: cited by 210 publications (277 citation statements)
References: 31 publications
“…We evaluate our BERT-based models on two benchmarks: the paragraph-level GAP dataset (Webster et al, 2018), and the document-level English OntoNotes 5.0 dataset (Pradhan et al, 2012). OntoNotes examples are considerably longer and typically require multiple segments to read the entire document.…”
Section: Methods (mentioning)
confidence: 99%
“…We fine-tune BERT to coreference resolution, achieving strong improvements on the GAP (Webster et al, 2018) and OntoNotes (Pradhan et al, 2012) benchmarks. We present two ways of extending the c2f-coref model in .…”
Section: Introduction (mentioning)
confidence: 99%
“…Another GBET for coreference resolution named GAP contains sentences mined from Wikipedia and thus can perform an evaluation with sentences taken from real contexts as opposed to artificially generated ones (Webster et al, 2018). GAP does not include stereotypical nouns; instead, pronouns refer to names only.…”
Section: Task (mentioning)
confidence: 99%
“…The GAP Coreference Dataset (Webster et al, 2018) has 4454 records and officially split into three parts: development set (2000 records), test set (2000 records), and validation set (454 records). Conforming to the stage 1 of Gendered Pronoun Resolution task, the official test set and validation set are combined as the training dataset in the experiments, while the official development set is used as the test set correspondingly.…”
Section: Dataset (mentioning)
confidence: 99%
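
The split regrouping described in the last statement above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `regroup_splits`, the dictionary keys, and the placeholder records are assumptions made for this sketch, not part of the GAP release or of the citing paper's code.

```python
from typing import Dict, List

# Official split sizes as quoted in the citation statement.
OFFICIAL_SPLIT_SIZES = {"development": 2000, "test": 2000, "validation": 454}


def regroup_splits(splits: Dict[str, List[dict]]) -> Dict[str, List[dict]]:
    """Regroup the official splits as the statement describes (hypothetical helper).

    Official test + validation become the training data;
    the official development set becomes the test set.
    """
    return {
        "train": splits["test"] + splits["validation"],
        "test": list(splits["development"]),
    }


# Placeholder records stand in for real rows; their structure is an assumption.
placeholder = {
    name: [{"id": f"{name}-{i}"} for i in range(size)]
    for name, size in OFFICIAL_SPLIT_SIZES.items()
}

regrouped = regroup_splits(placeholder)
print(len(regrouped["train"]), len(regrouped["test"]))  # 2454 2000
```

Under this regrouping the training portion holds 2000 + 454 = 2454 records and the test portion 2000, which together account for the 4454 records the statement reports.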