Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1074

DocRED: A Large-Scale Document-Level Relation Extraction Dataset

Abstract: Multiple entities in a document generally exhibit complex inter-sentence relations, and cannot be well handled by existing relation extraction (RE) methods that typically focus on extracting intra-sentence relations for single entity pairs. In order to accelerate the research on document-level RE, we introduce DocRED, a new dataset constructed from Wikipedia and Wikidata with three features: (1) DocRED annotates both named entities and relations, and is the largest human-annotated dataset for document-level RE …

Cited by 294 publications (390 citation statements)
References 32 publications (30 reference statements)

“…For document-level RE, the input is a document with annotated entities, as well as multiple occurrences of each entity, i.e., entity mentions, the goal is to identify all the related entity pairs in the document. Following [15], we transform RE into a classification problem. We use upper case letters to represent entities (E_1, ..., E_m) and lower case letters to represent mentions (e_1, ..., e_m).…”
Section: Task Description
Citation type: mentioning
confidence: 99%
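
The task formulation quoted above can be sketched in a few lines: every pair of annotated entities becomes one classification instance. This is only an illustration under stated assumptions, not the citing paper's method. The toy entity and mention names and the `candidate_pairs` helper are hypothetical, and treating pairs as ordered (head, tail) is an assumption the quoted text does not confirm.

```python
from itertools import permutations

def candidate_pairs(entity_ids):
    """Return every ordered (head, tail) pair of distinct entities.

    Assumption: pairs are ordered, so (E1, E2) and (E2, E1) are
    separate classification instances.
    """
    return list(permutations(entity_ids, 2))

# Hypothetical toy document: each entity maps to its mentions.
entities = {
    "E1": ["e1", "e2"],
    "E2": ["e3"],
    "E3": ["e4", "e5"],
}

pairs = candidate_pairs(sorted(entities))
for head, tail in pairs:
    # A classifier would assign each pair a relation label here.
    print(head, tail, entities[head], entities[tail])
```

With m entities this produces m * (m - 1) candidate pairs, which is why the number of instances per document grows quickly with the number of annotated entities.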
“…To evaluate the effectiveness of our model, we use the DocRED dataset [15], which is the largest human-annotated document-level RE dataset constructed from Wikidata and Wikipedia. DocRED contains over 5,053 documents, 40,276 sentences, 132,375 entities and 96 frequent relation types.…”
Section: Dataset
Citation type: mentioning
confidence: 99%