Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1360
Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction

Abstract: We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SCIERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SCIIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous …
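The shared-span idea in the abstract can be illustrated with a minimal sketch: enumerate candidate spans once, build one representation per span, and let each task (entity typing, coreference, relations) score the same representations with its own head. This is a simplified assumption-laden illustration, not the paper's actual architecture (SCIIE also uses attention-based head words, span-width features, and pruning, all omitted here; the vector sizes and scorer shapes below are invented for the example):

```python
import numpy as np

def enumerate_spans(n_tokens, max_width=3):
    """All (start, end) spans up to max_width tokens (inclusive ends)."""
    return [(i, j) for i in range(n_tokens)
            for j in range(i, min(i + max_width, n_tokens))]

def span_representations(token_vecs, spans):
    """One shared representation per span: concatenated endpoint vectors."""
    return np.stack([np.concatenate([token_vecs[i], token_vecs[j]])
                     for i, j in spans])

rng = np.random.default_rng(0)
d = 8
tokens = rng.normal(size=(5, d))          # 5 tokens, d-dim embeddings
spans = enumerate_spans(len(tokens))
reps = span_representations(tokens, spans)

# Each task reuses the SAME span representations with its own scorer head,
# which is what lets the tasks share information and reduce cascading errors:
W_entity = rng.normal(size=(2 * d, 4))    # 4 hypothetical entity types
w_coref = rng.normal(size=(2 * d,))       # pairwise coref via additive scores
entity_scores = reps @ W_entity           # (num_spans, 4)
coref_scores = (reps @ w_coref)[:, None] + (reps @ w_coref)[None, :]
```

In a trained model the heads would be learned jointly, so a gradient from, say, the coreference loss also improves the span representations used for entity and relation scoring.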

Cited by 478 publications (484 citation statements)
References 32 publications
“…Five annotated abstracts per domain serving as training data are sufficient to build a performant model. Our active learning results for the SciERC [28] and ScienceIE17 [2] datasets were similar. The promising results suggest that we do not need a large annotated dataset for scientific information extraction.…”
Section: Discussion (supporting)
confidence: 55%
“…It can be observed that the Agr, Med, Bio, and Ast classifiers are the best at extracting PROCESS, METHOD, MATERIAL, and DATA, respectively. For SciERC [28] and ScienceIE17 [2], similar results demonstrate that MNLP can significantly reduce the amount of labelled data.…”
Section: Traditionally Trained Classifiers (mentioning)
confidence: 64%
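The MNLP (Maximum Normalized Log-Probability) criterion referenced in the statement above is an active-learning acquisition function: rank unlabelled sentences by the model's length-normalized log-probability and send the least confident ones for annotation. A minimal sketch, assuming per-token log-probabilities are already available from some tagging model (the pool contents below are invented for illustration):

```python
def mnlp(token_log_probs):
    """Length-normalized log-probability of a sentence. Lower means the
    model is less confident, so lower scores are annotated first."""
    return sum(token_log_probs) / len(token_log_probs)

def select_for_annotation(pool, k):
    """pool: list of (sentence_id, per-token log-probs).
    Returns the k sentence ids the model is least confident about."""
    ranked = sorted(pool, key=lambda item: mnlp(item[1]))
    return [sid for sid, _ in ranked[:k]]

pool = [("s1", [-0.1, -0.2, -0.1]),   # confident prediction
        ("s2", [-2.5, -1.5]),         # uncertain prediction
        ("s3", [-0.8, -0.9, -0.7])]
picked = select_for_annotation(pool, 1)   # → ["s2"]
```

The length normalization is the key design choice: without it, longer sentences almost always look less probable and would dominate the selection.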
“…We refer to this split as ACE05-E in what follows. The SciERC corpus (Luan et al., 2018) provides entity, coreference, and relation annotations for 500 AI paper abstracts. The GENIA corpus (Kim et al., 2003) provides entity tags and coreferences for 1999 abstracts from the biomedical research literature, with a substantial portion of entities (24%) overlapping some other entity.…”
Section: Methods (mentioning)
confidence: 99%
“…KGs have been used to improve NLP performance in a wide variety of genres, including summarization, information extraction from EHRs, and answering medical questions (17,28,29,33,42,62,63). KG-derived embeddings, used alone or in combination with text-derived features (48), improved performance on a variety of NLP tasks, including named-entity recognition (64), coreference resolution (65), and relation extraction (66).…”
Section: Natural Language Processing Applications (mentioning)
confidence: 99%