Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.424
ChemNER: Fine-Grained Chemistry Named Entity Recognition with Ontology-Guided Distant Supervision

Abstract: Scientific literature analysis needs fine-grained named entity recognition (NER) to provide a wide range of information for scientific discovery. For example, chemistry research needs to study dozens to hundreds of distinct, fine-grained entity types, making consistent and accurate annotation difficult even for crowds of domain experts. On the other hand, domain-specific ontologies and knowledge bases (KBs) can be easily accessed, constructed, or integrated, which makes distant supervision realistic for fine-g…

Cited by 11 publications (4 citation statements) · References 32 publications
“…Several strategies were developed and investigated to exploit external lexical and semantic resources to improve machine learning models. These strategies include thematic masking [1], named entity recognition by distant supervision [2], and ontology-based normalization [3]. The biological roles of MOs depend mainly on their structure.…”
Section: Value of the Data
confidence: 99%
“…Distant supervision (Mintz et al., 2009) uses structured knowledge to annotate raw text with pseudo labels. Performing distantly supervised fine-tuning with in-domain structured knowledge after the MLM pre-training is effective in domain-specific NER (Wang et al., 2021; Trieu et al., 2022). However, distantly supervised learning in a specific domain depends on how well the structured knowledge covers the label set of the downstream task.…”
Section: NER With Unstructured Knowledge
confidence: 99%
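The statement above describes distant supervision for NER: entity surface forms from a knowledge base are matched against raw text to produce pseudo labels, with no manual annotation. A minimal sketch of this idea via greedy longest-match dictionary labeling is shown below; the KB entries, entity types, and example sentence are illustrative assumptions, not taken from the cited ontologies.

```python
# Hedged sketch of distantly supervised NER labeling: a toy KB maps
# lowercased token tuples to (assumed, illustrative) entity types, and
# greedy longest-match assigns BIO pseudo labels to a token sequence.

def distant_labels(tokens, kb):
    """Assign BIO pseudo labels by greedy longest-match against `kb`,
    a dict mapping tuples of lowercased tokens to entity type strings."""
    labels = ["O"] * len(tokens)
    max_len = max((len(key) for key in kb), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest possible span first, shrinking to length 1.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            if span in kb:
                labels[i] = "B-" + kb[span]
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + kb[span]
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels

# Illustrative KB and sentence (hypothetical entries, not from a real ontology).
kb = {("sodium", "chloride"): "INORGANIC_COMPOUND", ("ethanol",): "ALCOHOL"}
tokens = ["Sodium", "chloride", "dissolves", "in", "ethanol", "."]
print(distant_labels(tokens, kb))
# → ['B-INORGANIC_COMPOUND', 'I-INORGANIC_COMPOUND', 'O', 'O', 'B-ALCOHOL', 'O']
```

The coverage caveat in the quoted passage is visible even in this toy: any entity type or surface form absent from the KB is silently labeled "O", so pseudo-label quality is bounded by how well the structured knowledge covers the downstream label set.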
“…(3) Experiments: Extensive experiments on two public datasets (Tabassum et al., 2020; Bridges et al., 2013) covering four domains (i.e., StackOverflow, GitHub, National Vulnerability Database, and Metasploit) demonstrate the effectiveness of SETYPE given 10 to 15 fine-grained types related to code, software, and security. Although we focus on software and security domain examples, our entity typing framework can be applied to other specialized domains including science (Wang et al., 2021) and engineering (O'Gorman et al., 2021).…”
Section: Introduction
confidence: 99%