Abstract: Distant supervision, a paradigm of relation extraction in which training data is created by aligning facts in a database with a large unannotated corpus, is an attractive approach for training relation extractors. Various models have been proposed in recent literature to align the facts in the database to their mentions in the corpus. In this paper, we discuss and critically analyse a popular alignment strategy called the "at least one" heuristic. We provide a simple, yet effective relaxation to this strategy. We formu…
“…While supervised entity recognition systems [14,34] focus on a few common entity types, weakly-supervised methods [18,36] and distantly-supervised methods [41,54,26] use a large text corpus and a small set of seeds (or a knowledge base) to induce patterns or to train models, and thus can be applied to different domains without additional human annotation labor. For relation extraction, weak supervision [6,13] and distant supervision [35,53,49,21,43,31] approaches have similarly been proposed to address the domain-restriction issue in traditional supervised systems [2,33,17]. However, such a "pipeline" paradigm ignores the dependencies between the subtasks and may suffer from error propagation between them.…”
Extracting entities and relations for types of interest from text is important for understanding massive text corpora. Traditionally, entity and relation extraction systems have relied on human-annotated corpora for training and adopted an incremental pipeline. Such systems require additional human expertise to be ported to a new domain, and are vulnerable to errors cascading down the pipeline. In this paper, we investigate joint extraction of typed entities and relations with labeled data heuristically obtained from knowledge bases (i.e., distant supervision). As our algorithm for type labeling via distant supervision is context-agnostic, noisy training data poses unique challenges for the task. We propose a novel domain-independent framework, called COTYPE, that runs a data-driven text segmentation algorithm to extract entity mentions, and jointly embeds entity mentions, relation mentions, text features and type labels into two low-dimensional spaces (for entity and relation mentions respectively), where, in each space, objects whose types are close will also have similar representations. COTYPE then uses these learned embeddings to estimate the types of test (unlinkable) mentions. We formulate a joint optimization problem to learn embeddings from text corpora and knowledge bases, adopting a novel partial-label loss function for noisy labeled data and introducing an object "translation" function to capture the cross-constraints of entities and relations on each other. Experiments on three public datasets demonstrate the effectiveness of COTYPE across different domains (e.g., news, biomedical), with an average of 25% improvement in F1 score compared to the next best method.
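The partial-label loss mentioned in the abstract can be illustrated with a small sketch. This is a generic hinge-style formulation under our own assumptions (a score vector over all types, a margin of 1, and the intuition that the best-scoring candidate label should beat the best non-candidate), not COTYPE's exact objective; the function name and toy values are hypothetical.

```python
import numpy as np

def partial_label_loss(scores, candidate_labels):
    """Hinge-style partial-label loss (illustrative sketch only).

    scores: array of model scores for every type.
    candidate_labels: indices of the noisy candidate types obtained via
    distant supervision. The loss encourages the best candidate type to
    outscore the best non-candidate type by a margin of 1.
    """
    mask = np.zeros(len(scores), dtype=bool)
    mask[candidate_labels] = True
    best_candidate = scores[mask].max()
    best_other = scores[~mask].max()
    return max(0.0, 1.0 - (best_candidate - best_other))

scores = np.array([2.0, 0.5, -1.0, 0.2])
loss_ok = partial_label_loss(scores, [0, 2])   # candidate type 0 wins by > 1, so loss is 0
loss_bad = partial_label_loss(scores, [2, 3])  # no candidate outscores type 0, so loss is positive
```

Because only the *best* candidate must win, a mention can carry several noisy candidate types without all of them being forced to score highly.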
“…In addition, Fan et al. [24] presented a novel framework integrating active learning and weakly supervised learning. Nagesh et al. [25] solved the label-assignment problem with integer linear programming (ILP) and improved on the baselines. There are also deep-learning-based methods that use convolutional neural networks for feature modeling and MIL for distant supervision [26].…”
Section: Distant Supervision for Relation Extraction
Abstract: Relation extraction has benefited from distant supervision in recent years with the development of natural language processing techniques and the explosion of available data. However, distant supervision is still greatly limited by the quality of its training data, a consequence of its core motivation: greatly reducing the heavy cost of data annotation. In this paper, we construct an architecture called MIML-sort (Multi-instance Multi-label Learning with Sorting Strategies), built on the well-known MIML framework. Based on MIML-sort, we propose three ranking-based methods for sample selection, with which relation extractors are learned from a subset of the training data. Experiments are set up on the KBP (Knowledge Base Population) corpus, one of the benchmark datasets for distant supervision, which is large and noisy. Compared with previous work, the proposed methods produce considerably better results. Furthermore, the three methods together achieve the best F1 on the official testing set, improving F1 from 27.3% to 29.98%.
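The ranking-based sample selection described above can be sketched generically: score every noisy training instance with some confidence measure, then keep only the top-ranked fraction before (re)training. This is a minimal sketch of the general idea, not MIML-sort's specific sorting strategies; `score_fn` and `keep_ratio` are hypothetical names.

```python
def select_by_rank(instances, score_fn, keep_ratio=0.8):
    """Keep only the top-scoring fraction of noisy training instances.

    score_fn is any confidence measure (e.g. an extractor's probability on
    a training instance); the lowest-scoring, presumably noisiest,
    instances are dropped before retraining.
    """
    ranked = sorted(instances, key=score_fn, reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return ranked[:k]

# Toy example: instances are (sentence_id, confidence) pairs.
data = [(1, 0.9), (2, 0.2), (3, 0.7), (4, 0.4), (5, 0.95)]
kept = select_by_rank(data, score_fn=lambda x: x[1], keep_ratio=0.6)
```

Sorting once and truncating keeps the selection step cheap even when the distantly supervised corpus is large.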
“…Most of the aforementioned work used SIL, MIL, or MIML to train classifiers, which set strong baselines in this field. In addition, recent research also includes embedding-based models that recast relation extraction as a translation model of the form h + r ≈ t [22][23][24], nonnegative matrix factorization (NMF) models [8,9] with the characteristic of training and testing jointly, approaches integrating active learning and weakly supervised learning [25], integer linear programming (ILP) [26], and so on.…”
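The translation assumption h + r ≈ t behind such embedding models (TransE-style) can be shown in a few lines: a triple is plausible when the head embedding translated by the relation embedding lands near the tail embedding. The function name and the toy 3-dimensional vectors below are illustrative assumptions, not values from any of the cited papers.

```python
import numpy as np

def transe_score(h, r, t):
    """Score an (h, r, t) triple under the translation assumption h + r ≈ t.

    Lower scores (smaller L2 distance) mean more plausible triples.
    """
    return np.linalg.norm(h + r - t)

# Toy embeddings (illustrative values only).
h = np.array([1.0, 0.0, 0.5])      # head entity
r = np.array([0.0, 1.0, 0.0])      # relation
t_good = np.array([1.0, 1.0, 0.5]) # satisfies h + r = t exactly
t_bad = np.array([0.0, 0.0, 0.0])  # unrelated tail
```

Training then pushes scores of observed triples below those of corrupted (negative) triples, typically with a margin-based loss.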
Section: Distant Supervision for Relation Extraction
“…We test on the KBP dataset, one of the benchmark datasets in this literature, constructed by Surdeanu et al. [4]. The resources are mainly from the TAC KBP 2010 and 2011 slot filling shared tasks [25,26], which contain 183,062 and 3,334 entity pairs for training and testing, respectively. The free text comes from the collection provided by the shared task, which contains approximately 1.5 million documents from a variety of sources, including newswire, blogs, and telephone conversation transcripts.…”
Distant supervision (DS) automatically annotates free text with relation mentions from existing knowledge bases (KBs), providing a way to alleviate the problem of insufficient training data for relation extraction in natural language processing (NLP). However, the heuristic annotation process does not guarantee the correctness of the generated labels, prompting a hot research issue: how to make efficient use of the noisy training data. In this paper, we model two types of biases to reduce noise: (1) bias-dist, to model the relative distance between points (instances) and classes (relation centers); (2) bias-reward, to model the possibility of each heuristically generated label being incorrect. Based on these biases, we propose three noise-tolerant models: MIML-dist, MIML-dist-classify, and MIML-reward, building on top of a state-of-the-art distantly supervised learning algorithm. Experimental evaluations comparing with three landmark methods on the KBP dataset validate the effectiveness of the proposed methods.
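The bias-dist idea, an instance far from its labeled relation's center is more likely to be mislabeled, can be sketched as a simple distance-based down-weighting. This is a rough illustration under our own assumptions (a 1/(1+d) weighting and fixed relation centers), not the paper's exact model; all names and values are hypothetical.

```python
import numpy as np

def distance_bias_weights(instances, labels, centers):
    """Down-weight instances that lie far from their labeled relation center.

    The heuristic label of an instance far from its relation's center in
    feature space is more likely to be wrong, so that instance should
    contribute less to training.
    """
    weights = []
    for x, y in zip(instances, labels):
        d = np.linalg.norm(x - centers[y])
        weights.append(1.0 / (1.0 + d))  # weight near 1 when close to the center
    return np.array(weights)

centers = {"born_in": np.array([0.0, 0.0]), "works_for": np.array([1.0, 1.0])}
instances = [np.array([0.1, 0.0]), np.array([5.0, 5.0])]
labels = ["born_in", "born_in"]
w = distance_bias_weights(instances, labels, centers)
```

The far-away second instance receives a much smaller weight, so a likely-wrong distant-supervision label pulls less on the learned model.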