Proceedings of the 13th Linguistic Annotation Workshop 2019
DOI: 10.18653/v1/w19-4015
DEFT: A corpus for definition extraction in free- and semi-structured text

Abstract: Definition extraction has been a popular topic in NLP research for well more than a decade, but has historically been limited to well-defined, structured, and narrow conditions. In reality, natural language is messy, and messy data requires both complex solutions and data that reflects that reality. In this paper, we present a robust English corpus and annotation schema that allows us to explore the less straightforward examples of term-definition structures in free and semi-structured text.
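Corpora like this are typically distributed as token-per-line files with BIO-style tags and consumed by grouping tokens into sentences. The sketch below illustrates that pattern; the two-column token/tag layout and the `Term`/`Definition` label names are assumptions for illustration, not necessarily DEFT's exact schema.

```python
# Minimal sketch: group a CoNLL-style, blank-line-separated token/tag
# stream into sentences. The two-column (token <TAB> BIO-tag) layout is
# an assumed format for illustration, not the DEFT corpus's exact schema.

def read_bio_sentences(lines):
    """Yield sentences as lists of (token, tag) pairs."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:              # blank line terminates a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        token, tag = line.split("\t")
        sentence.append((token, tag))
    if sentence:                  # flush a trailing sentence with no blank line
        yield sentence

# Hypothetical example with a term and its definition span:
sample = [
    "A\tO",
    "gene\tB-Term",
    "is\tO",
    "a\tB-Definition",
    "unit\tI-Definition",
    "of\tI-Definition",
    "heredity\tI-Definition",
    "",
]
sents = list(read_bio_sentences(sample))
print(len(sents), sents[0][1])  # prints: 1 ('gene', 'B-Term')
```

Keeping the reader a generator means whole-document files (as opposed to pre-segmented sentences) can be streamed without loading everything into memory.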


Cited by 34 publications (43 citation statements)
References 13 publications
“…We run experiments on 8 different benchmark datasets: semeval10, tacred (Zhang et al., 2017), kbp37 (Zhang and Wang, 2015), wiki80, deft2020 (Spala et al., 2019), i2b2 (Uzuner et al., 2011), ddi (Herrero-Zazo et al., 2013), and chemprot (Krallinger et al., 2017). These tasks come from various domains and differ in dataset size, sentence length, entity mention length, etc., demonstrating that our method is robust across a range of RC tasks.…”
Section: Datasets
Confidence: 99%
“…We divided non-definitional text into two types: plausible (24.8%) and implausible (11.8%), the latter signaling an error. Plausible text refers to explanations or secondary information (similar to DEFT's (Spala et al., 2019) secondary definitions, but without sentence crossings).…”
Section: Term (%) Definition (%)
Confidence: 99%
“…Therefore, we created a new collection in which we annotate every sentence within a document, allowing assessment of recall as well as precision. Two annotators annotated two full papers using an annotation scheme similar to that used in DEFT (Spala et al., 2019), except for omitting cross-sentence links.…”
Section: Full Document Definition Annotation
Confidence: 99%
“…• DEFT: This is a recently released dataset for definition extraction (DE) (Spala et al., 2019). DEFT consists of two categories of definitions: a) Contracts: comprising 2,433 sentences from the 2017 SEC contract filings, with 537 definitional and 1,906 non-definitional sentences.…”
Section: Experiments, Dataset, and Hyperparameters
Confidence: 99%