Early distantly supervised approaches (Mintz et al., 2009) use multi-instance learning (Riedel et al., 2010) and multi-instance multi-label learning (Surdeanu et al., 2012; Hoffmann et al., 2011) to model the assumption that at least one sentence per relation instance correctly expresses the relation. With the increasing popularity of neural networks, PCNN (Zeng et al., 2014) became the most widely used architecture, with extensions for multi-instance learning (Zeng et al., 2015), selective attention (Lin et al., 2016; Han et al., 2018), adversarial training (Wu et al., 2017; Qin et al., 2018), noise models (Luo et al., 2017), and soft labeling (Liu et al., 2017). Recent work showed graph convolutions (Vashishth et al., 2018) and capsule networks (Zhang et al., 2018a), previously applied to the supervised setting, to be also applicable in a distantly supervised setting.