Proceedings of the 13th Linguistic Annotation Workshop 2019
DOI: 10.18653/v1/w19-4015
DEFT: A corpus for definition extraction in free- and semi-structured text

Abstract: Definition extraction has been a popular topic in NLP research for well more than a decade, but has historically been limited to well-defined, structured, and narrow conditions. In reality, natural language is messy, and messy data requires both complex solutions and data that reflects that reality. In this paper, we present a robust English corpus and annotation schema that allows us to explore the less straightforward examples of term-definition structures in free and semi-structured text.
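Corpora like this are typically distributed as token-per-line files with BIO-style tags and consumed by grouping tokens into sentences. The sketch below illustrates that pattern; the two-column token/tag layout and the `Term`/`Definition` label names are assumptions for illustration, not necessarily DEFT's exact schema.

```python
# Minimal sketch: group a CoNLL-style, blank-line-separated token/tag
# stream into sentences. The two-column (token <TAB> BIO-tag) layout is
# an assumed format for illustration, not the DEFT corpus's exact schema.

def read_bio_sentences(lines):
    """Yield sentences as lists of (token, tag) pairs."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:              # blank line terminates a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        token, tag = line.split("\t")
        sentence.append((token, tag))
    if sentence:                  # flush a trailing sentence with no blank line
        yield sentence

# Hypothetical example with a term and its definition span:
sample = [
    "A\tO",
    "gene\tB-Term",
    "is\tO",
    "a\tB-Definition",
    "unit\tI-Definition",
    "of\tI-Definition",
    "heredity\tI-Definition",
    "",
]
sents = list(read_bio_sentences(sample))
print(len(sents), sents[0][1])  # prints: 1 ('gene', 'B-Term')
```

Keeping the reader a generator means whole-document files (as opposed to pre-segmented sentences) can be streamed without loading everything into memory.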


Cited by 34 publications (43 citation statements)
References 13 publications
“…We run experiments on 8 different benchmark datasets: semeval10, tacred (Zhang et al., 2017), kbp37 (Zhang and Wang, 2015), wiki80, deft2020 (Spala et al., 2019), i2b2 (Uzuner et al., 2011), ddi (Herrero-Zazo et al., 2013), and chemprot (Krallinger et al., 2017). These tasks come from various domains and differ in dataset size, sentence length, entity mention length, etc., demonstrating that our method is robust across a range of RC tasks.…”
Section: Datasets
Confidence: 99%
“…We divided non-definitional text into two types: plausible (24.8%) and implausible (11.8%), the latter signaling an error. Plausible text refers to explanations or secondary information (similar to DEFT's (Spala et al., 2019) secondary definitions, but without sentence crossings).…”
Section: Term (%) Definition (%)
Confidence: 99%
“…Therefore, we created a new collection in which we annotate every sentence within a document, allowing assessment of recall as well as precision. Two annotators annotated two full papers using an annotation scheme similar to that used in DEFT (Spala et al., 2019), except for omitting cross-sentence links.…”
Section: Full Document Definition Annotation
Confidence: 99%
“…• DEFT: This is a recently released dataset for definition extraction (DE) (Spala et al., 2019). DEFT consists of two categories of definitions: a) Contracts: comprising 2,433 sentences from the 2017 SEC contract filings, with 537 definitional and 1,906 non-definitional sentences.…”
Section: Experiments, Dataset, and Hyperparameters
Confidence: 99%