Findings of the Association for Computational Linguistics: EMNLP 2021
DOI: 10.18653/v1/2021.findings-emnlp.44

Semi-supervised Relation Extraction via Incremental Meta Self-Training

Abstract: To alleviate human efforts from obtaining large-scale annotations, Semi-Supervised Relation Extraction methods aim to leverage unlabeled data in addition to learning from limited samples. Existing self-training methods suffer from the gradual drift problem, where noisy pseudo labels on unlabeled data are in…
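The abstract above refers to vanilla self-training and its failure mode. As a point of reference, here is a minimal sketch of a generic confidence-thresholded self-training loop (not the paper's incremental meta self-training); the synthetic data and all variable names are illustrative assumptions. The comment marks where gradual drift enters: confident but wrong pseudo labels are merged into the labeled pool and bias every later round.

```python
# Minimal sketch of vanilla self-training on synthetic data (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           n_classes=4, random_state=0)
labeled_idx = rng.choice(len(X), size=100, replace=False)      # small labeled set
unlabeled_idx = np.setdiff1d(np.arange(len(X)), labeled_idx)

X_l, y_l = X[labeled_idx], y[labeled_idx]
X_u = X[unlabeled_idx]

model = LogisticRegression(max_iter=1000)
for round_id in range(5):
    model.fit(X_l, y_l)
    probs = model.predict_proba(X_u)
    confident = probs.max(axis=1) > 0.9        # keep only high-confidence predictions
    pseudo = probs[confident].argmax(axis=1)
    # Gradual drift enters here: confidently wrong pseudo labels join the
    # labeled pool and bias every later round; meta self-training methods add
    # a feedback signal to filter or reweight them instead.
    X_l = np.concatenate([X_l, X_u[confident]])
    y_l = np.concatenate([y_l, pseudo])
    X_u = X_u[~confident]
    if len(X_u) == 0:
        break
```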

Cited by 40 publications (37 citation statements) | References 22 publications
“…However, these works rely heavily on a frequently re-initialized linear classification layer, which interferes with representation learning. Zhan et al. (2020) propose Online Deep Clustering, which performs clustering and network updates simultaneously rather than alternately to address this concern; however, the noisy pseudo labels still affect feature clustering when the network is updated (Hu et al., 2021a; Li et al., 2022b; Lin et al., 2022).…”
Section: Related Work (mentioning, confidence: 99%)
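As a rough illustration of the "simultaneously rather than alternately" idea quoted above, the toy sketch below keeps a per-sample feature/label memory bank and refreshes pseudo labels and centroids online while the network trains, instead of re-clustering the whole dataset between epochs. It is a simplified reading of that description, not Zhan et al.'s ODC implementation; the tiny model, momentum value, and refresh period are assumptions.

```python
# Toy sketch of online clustering interleaved with network updates (illustrative).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, D, K = 1024, 32, 8                        # samples, feature dim, clusters
data = torch.randn(N, 16)                    # toy unlabeled inputs

encoder = torch.nn.Linear(16, D)
classifier = torch.nn.Linear(D, K)
opt = torch.optim.SGD(list(encoder.parameters()) + list(classifier.parameters()), lr=0.1)

# Memory bank: one normalized feature vector and one pseudo label per sample.
feat_mem = F.normalize(torch.randn(N, D), dim=1)
labels = torch.randint(0, K, (N,))
centroids = F.normalize(torch.randn(K, D), dim=1)

for step in range(200):
    idx = torch.randint(0, N, (64,))
    feats = F.normalize(encoder(data[idx]), dim=1)
    # Network update uses the current pseudo labels (no full re-clustering pass).
    loss = F.cross_entropy(classifier(feats), labels[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()

    with torch.no_grad():
        # Online clustering step: momentum-update the memory bank, then
        # reassign this batch's pseudo labels to the nearest centroid.
        feat_mem[idx] = F.normalize(0.5 * feat_mem[idx] + 0.5 * feats, dim=1)
        labels[idx] = (feat_mem[idx] @ centroids.t()).argmax(dim=1)
        if step % 20 == 0:                   # periodically refresh centroids
            for k in range(K):
                members = feat_mem[labels == k]
                if len(members) > 0:
                    centroids[k] = F.normalize(members.mean(dim=0), dim=0)
```

The quoted concern still applies in this sketch: whatever noise is in `labels` directly shapes the features being clustered.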
“…Conventional RE methods include supervised (Zelenko et al., 2002; Liu et al., 2013; Zeng et al., 2014; Miwa and Bansal, 2016), semi-supervised (Chen et al., 2006; Sun et al., 2011; Hu et al., 2020), and distantly supervised methods (Mintz et al., 2009; Yao et al., 2011; Zeng et al., 2015; Han et al., 2018a). These methods rely on a predefined relation set and have limitations in real scenarios where novel relations are emerging.…”
Section: Related Work (mentioning, confidence: 99%)
“…Self-training has been studied for many years (Yarowsky, 1995; Riloff and Wiebe, 2003; Rosenberg et al., 2005) and widely adopted in many NLP tasks, including speech recognition (Kahn et al., 2020; Park et al., 2020), parsing (McClosky et al., 2006; McClosky and Charniak, 2008), and pre-training (Du et al., 2021). Self-training suffers from inaccurate pseudo labels (Arazo et al., 2019, 2020; Hu et al., 2021a), especially when the teacher model is trained on insufficient and unbalanced datasets. To address this problem, Pham et al. (2020), Wang et al. (2021b), and Hu et al. (2021a) propose to utilize the performance of the student model on held-out labeled data as a meta-learning objective to update the teacher model or improve the pseudo-label generation process.…”
Section: Related Work (mentioning, confidence: 99%)
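The meta-learning feedback loop described in this snippet can be sketched as follows: the student updates on teacher-generated pseudo labels, and the change in the student's loss on held-out labeled data is fed back to the teacher as a reward. This is a first-order, REINFORCE-style simplification in the spirit of that idea, not the exact procedure of Pham et al. (2020), Wang et al. (2021b), or Hu et al. (2021a); the linear models and dimensions are toy placeholders.

```python
# Toy sketch: the teacher is rewarded when its pseudo labels improve the
# student on held-out labeled data (first-order approximation, illustrative only).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D, C = 16, 4
teacher, student = torch.nn.Linear(D, C), torch.nn.Linear(D, C)
opt_t = torch.optim.SGD(teacher.parameters(), lr=0.1)
opt_s = torch.optim.SGD(student.parameters(), lr=0.1)

x_unlabeled = torch.randn(256, D)
x_held, y_held = torch.randn(64, D), torch.randint(0, C, (64,))  # held-out labeled data

for step in range(100):
    xb = x_unlabeled[torch.randint(0, 256, (32,))]

    # Teacher samples pseudo labels for the unlabeled batch.
    logits_t = teacher(xb)
    pseudo = torch.distributions.Categorical(logits=logits_t).sample()

    # Student takes one gradient step on the pseudo-labeled batch.
    loss_before = F.cross_entropy(student(x_held), y_held).item()
    loss_s = F.cross_entropy(student(xb), pseudo)
    opt_s.zero_grad()
    loss_s.backward()
    opt_s.step()
    loss_after = F.cross_entropy(student(x_held), y_held).item()

    # Meta feedback: the held-out improvement acts as a reward for the teacher.
    reward = loss_before - loss_after
    log_prob = torch.distributions.Categorical(logits=logits_t).log_prob(pseudo).mean()
    loss_t = -reward * log_prob
    opt_t.zero_grad()
    loss_t.backward()
    opt_t.step()
```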
“…Self-training suffers from inaccurate pseudo labels (Arazo et al., 2019, 2020; Hu et al., 2021a), especially when the teacher model is trained on insufficient and unbalanced datasets. To address this problem, Pham et al. (2020), Wang et al. (2021b), and Hu et al. (2021a) propose to utilize the performance of the student model on held-out labeled data as a meta-learning objective to update the teacher model or improve the pseudo-label generation process. Hu et al. (2021b) leverage the cosine distance between gradients computed on labeled data and pseudo-labeled data as feedback to guide the self-training process.…”
Section: Related Work (mentioning, confidence: 99%)
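The gradient-based feedback in the last sentence of this snippet can be illustrated with a small sketch: compute one gradient on a clean labeled batch and one on a pseudo-labeled batch, then use their cosine similarity to weight the pseudo-labeled loss. This is one plausible reading of the description, not Hu et al.'s (2021b) implementation; the model, batch construction, and clipping rule are assumptions.

```python
# Toy sketch: weight the pseudo-labeled loss by the cosine similarity between
# gradients from labeled and pseudo-labeled data (illustrative only).
import torch
import torch.nn.functional as F

def flat_grad(loss, params):
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

torch.manual_seed(0)
model = torch.nn.Linear(16, 4)
params = list(model.parameters())
opt = torch.optim.SGD(params, lr=0.1)

x_l, y_l = torch.randn(32, 16), torch.randint(0, 4, (32,))   # labeled batch
x_u = torch.randn(32, 16)                                    # unlabeled batch
y_pseudo = model(x_u).argmax(dim=1).detach()                 # (noisy) pseudo labels

loss_l = F.cross_entropy(model(x_l), y_l)
loss_u = F.cross_entropy(model(x_u), y_pseudo)

g_l = flat_grad(loss_l, params)
g_u = flat_grad(loss_u, params)
cos = F.cosine_similarity(g_l, g_u, dim=0)       # agreement between the two gradients

# Aligned gradients contribute fully; conflicting ones are suppressed.
weight = torch.clamp(cos, min=0.0).detach()
total = loss_l + weight * loss_u
opt.zero_grad()
total.backward()
opt.step()
```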