Proceedings of the 55th Annual Meeting of the Association For Computational Linguistics (Volume 1: Long Papers) 2017
DOI: 10.18653/v1/p17-1040

Learning with Noise: Enhance Distantly Supervised Relation Extraction with Dynamic Transition Matrix

Abstract: Distant supervision significantly reduces human efforts in building training data for many classification tasks. While promising, this technique often introduces noise to the generated training data, which can severely affect the model performance. In this paper, we take a deep look at the application of distant supervision in relation extraction. We show that the dynamic transition matrix can effectively characterize the noise in the training data built by distant supervision. The transition matrix can be eff…
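The core idea in the abstract, a transition matrix generated per instance that maps the model's predicted "true" relation distribution onto the noisy distribution implied by the distant-supervision labels, can be sketched compactly. Below is a minimal PyTorch illustration assuming a generic sentence encoder produces a fixed-size representation; the class and parameter names (DynamicTransitionHead, hidden_dim) are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch (assumed names): a per-sentence ("dynamic") transition matrix
# for modelling distant-supervision label noise.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicTransitionHead(nn.Module):
    def __init__(self, hidden_dim: int, num_relations: int):
        super().__init__()
        # Predicts the distribution over "true" relations.
        self.classifier = nn.Linear(hidden_dim, num_relations)
        # Generates one R x R transition matrix per sentence.
        self.noise = nn.Linear(hidden_dim, num_relations * num_relations)
        self.num_relations = num_relations

    def forward(self, sent_repr: torch.Tensor):
        # sent_repr: (batch, hidden_dim), e.g. a PCNN or attention encoding.
        p_true = F.softmax(self.classifier(sent_repr), dim=-1)        # (batch, R)
        logits_T = self.noise(sent_repr).view(-1, self.num_relations,
                                              self.num_relations)
        T = F.softmax(logits_T, dim=-1)  # row i: P(observed label j | true relation i)
        # Distribution over the noisy observed labels: o_j = sum_i p_i * T_ij.
        p_observed = torch.bmm(p_true.unsqueeze(1), T).squeeze(1)     # (batch, R)
        return p_true, p_observed

# Training would fit p_observed against the distant labels while gradually
# shifting weight toward a loss on p_true, in the spirit of the curriculum
# described in the paper; the exact schedule is left out of this sketch.
```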

Cited by 84 publications (72 citation statements) · References 14 publications

Citation statements (ordered by relevance):
“…Early distantly supervised approaches (Mintz et al., 2009) use multi-instance learning (Riedel et al., 2010) and multi-instance multi-label learning (Surdeanu et al., 2012; Hoffmann et al., 2011) to model the assumption that at least one sentence per relation instance correctly expresses the relation. With the increasing popularity of neural networks, PCNN (Zeng et al., 2014) became the most widely used architecture, with extensions for multi-instance learning (Zeng et al., 2015), selective attention (Lin et al., 2016; Han et al., 2018), adversarial training (Wu et al., 2017; Qin et al., 2018), noise models (Luo et al., 2017), and soft labeling (Liu et al., 2017; …). Recent work showed graph convolutions (Vashishth et al., 2018) and capsule networks (Zhang et al., 2018a), previously applied to the supervised setting, to be also applicable in a distantly supervised setting.…”
Section: Distantly Supervised Relation Extraction (mentioning)
confidence: 99%
“…Lin et al. (2016) propose selective attention to select high-quality sentence features within a bag as the bag feature and train the model on this bag feature. Luo et al. (2017) propose a transition-matrix-based method to dynamically characterize the noise. Feng et al. (2018) use reinforcement learning to select a more reliable subset of the DS dataset and use it to train the classifier.…”
Section: Related Work (mentioning)
confidence: 99%
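The selective-attention approach summarised in the statement above (Lin et al., 2016) weights the sentences in a bag by how well they match the bag's relation and uses the weighted sum as the bag feature. The sketch below is a simplified illustration (plain dot-product scoring rather than the original bilinear form); the names and shapes are assumptions, not code from the cited work.

```python
# Simplified sketch of bag-level selective attention over sentence encodings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveAttention(nn.Module):
    def __init__(self, hidden_dim: int, num_relations: int):
        super().__init__()
        self.relation_queries = nn.Embedding(num_relations, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_relations)

    def forward(self, bag: torch.Tensor, relation_id: torch.Tensor):
        # bag: (num_sentences, hidden_dim) encodings for one entity pair.
        # relation_id: 0-dim long tensor with the bag's distant label.
        query = self.relation_queries(relation_id)          # (hidden_dim,)
        scores = bag @ query                                 # (num_sentences,)
        alpha = F.softmax(scores, dim=0)                     # attention over the bag
        bag_repr = (alpha.unsqueeze(-1) * bag).sum(dim=0)    # weighted bag feature
        return self.classifier(bag_repr)                     # relation logits
```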
“…An approach that also takes the corresponding features x into account can model more complex relations. Veit et al. (2017) and Luo et al. (2017) use multiple layers of a neural network to model these relationships. However, in low-resource settings with only small amounts of clean, supervised data, these more complex models can be difficult to learn.…”
Section: Feature-Dependent Noise Model (mentioning)
confidence: 99%
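To make the contrast drawn in the statement above concrete, a global (feature-independent) noise model learns a single confusion matrix shared by all examples, whereas the feature-dependent variants of Veit et al. (2017) and Luo et al. (2017) generate the matrix from the input features x (as in the dynamic-transition sketch after the abstract). The snippet below is a hedged sketch of the global case; names and initialisation are illustrative assumptions.

```python
# Sketch of a global (feature-independent) confusion-matrix noise layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalConfusionMatrix(nn.Module):
    """A single learned R x R matrix shared across all inputs."""
    def __init__(self, num_classes: int):
        super().__init__()
        # Initialised near the identity so the noise layer starts as a near no-op.
        self.logits_T = nn.Parameter(5.0 * torch.eye(num_classes))

    def forward(self, p_true: torch.Tensor) -> torch.Tensor:
        T = F.softmax(self.logits_T, dim=-1)   # rows: P(noisy label | true label)
        return p_true @ T                       # (batch, R) noisy-label distribution
```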
“…The Base is trained only on clean data, while Base+Noise is trained on both the clean and the noisy data without noise handling. Global-CM uses a global … [the rest of the sentence is interleaved with values spilled from a results table; column labels are not recoverable:
Global-CM (Luo et al. 2017): 32.6 ± 0.9, 53.7 ± 1.8, 57.6 ± 0.8, 52.3 ± 0.8, 36.7 ± 2.9
Global-ID-CM (H. and K. 2018): 27.1 ± 0.7, 51.0 ± 1.1, 50.9 ± 0.7, 51.4 ± 0.6, 29.9 ± 2.6
Global-CM (H. and K. 2018): 34.1 ± 1.4, 52.0 ± 1.6, 52.8 ± 0.6, 52.3 ± 0.6, 33.3 ± 2.0]…”
Section: Models (mentioning)
confidence: 99%