Abstract: We challenge a common assumption in active learning: that a list-based interface populated by informative samples provides for efficient and effective data annotation. We show how a 2D scatterplot populated with diverse and representative samples can yield improved models given the same time budget. We consider this for bootstrapping-based information extraction, in particular named entity classification, where human and machine jointly label data. To enable effective data annotation in a scatterplot, we have …
“…In (Lison et al., 2020), the weak training data is created by broadly collecting available labeling rules from multiple sources, which demonstrates the importance of being able to automatically find new heuristics missed by human efforts. To find new heuristic rules on the basis of a relatively limited number of manually designed rules, previous studies have tried bootstrapping, relying on co-occurrence, context, and pattern features (Thelen and Riloff, 2002; Riloff et al., 2003; Yangarber, 2003; Shen et al., 2017; Tao et al., 2015; Berger et al., 2018; Yan et al., 2019).…”
Instead of using expensive manual annotations, researchers have proposed to train named entity recognition (NER) systems using heuristic labeling rules. However, devising labeling rules is challenging because it often requires a considerable amount of manual effort and domain expertise. To alleviate this problem, we propose GLARA, a graph-based labeling rule augmentation framework, to learn new labeling rules from unlabeled data. We first create a graph with nodes representing candidate rules extracted from unlabeled data. Then, we design a new graph neural network to augment labeling rules by exploring the semantic relations between rules. We finally apply the augmented rules on unlabeled data to generate weak labels and train a NER model using the weakly labeled data. We evaluate our method on three NER datasets and find that we can achieve an average improvement of +20% F1 score over the best baseline when given a small set of seed rules.
Figure: GLARA pipeline. Seeding rules (e.g. *noma → Disease, *athy → Disease, *homa → Disease, *kemias → Disease, *ndrome → Disease) drive the ranking and selection of new rules from candidate suffix rules, which the labeling rule applier then uses to assign weak labels (e.g. *tion → Other, *lity → Other).
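The suffix-rule labeling step described above can be sketched in a few lines. The rules mirror the figure's examples, but the matching logic, function names, and test tokens are illustrative assumptions, not GLARA's actual implementation:

```python
# Hypothetical sketch of suffix-rule weak labeling in the spirit of GLARA.
# Rules echo the paper's figure; everything else is invented for illustration.

SEED_RULES = {
    "noma": "Disease",    # e.g. "carcinoma"
    "athy": "Disease",    # e.g. "neuropathy"
    "kemias": "Disease",  # e.g. "leukemias"
    "tion": "Other",
    "lity": "Other",
}

def apply_rules(token, rules):
    """Return the label of the longest matching suffix rule, or None."""
    best = None
    for suffix, label in rules.items():
        if token.lower().endswith(suffix):
            if best is None or len(suffix) > len(best[0]):
                best = (suffix, label)
    return best[1] if best else None

def weak_label(tokens, rules):
    """Weakly label a token sequence; unmatched tokens get 'O'."""
    return [apply_rules(t, rules) or "O" for t in tokens]

print(weak_label(["carcinoma", "mutation", "neuropathy"], SEED_RULES))
# -> ['Disease', 'Other', 'Disease']
```

In the full framework these weak labels would then train the downstream NER model; here they simply illustrate why cheap suffix rules can bootstrap supervision without manual annotation.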
“…However, those heuristic constraints are usually not flexible due to their requirement for expert efforts. In contrast, recent studies focus on learning distance metrics to determine boundaries using weak supervision (Berger et al., 2018; Zupon et al., 2019; Yan et al., 2020a). For example, Yan et al. (2020a) propose an end-to-end bootstrapping network learned by multi-view learning, and extend it with self-supervised and supervised pre-training (Yan et al., 2020b).…”
Section: Related Work
confidence: 99%
“…Unfortunately, these heuristic metrics heavily depend on the selected seeds, making the boundary biased and unreliable (Curran et al., 2007; McIntosh and Curran, 2009). Although some studies extend them with extra constraints (Carlson et al., 2010) or manual participation (Berger et al., 2018), the requirement of expert knowledge makes them ad hoc and inflexible. Some studies try to learn the distance metrics (Zupon et al., 2019; Yan et al., 2020a), but they still suffer from weak supervision.…”
Bootstrapping has become the mainstream method for entity set expansion. Conventional bootstrapping methods mostly define the expansion boundary using seed-based distance metrics, which heavily depend on the quality of the selected seeds and are hard to adjust due to the extremely sparse supervision. In this paper, we propose Bootstrap-GAN, a new learning method for bootstrapping that jointly models the bootstrapping process and the boundary learning process in a GAN framework. Specifically, the expansion boundaries of different bootstrapping iterations are learned via different discriminator networks; the bootstrapping network is the generator that produces new positive entities, and the discriminator networks identify the expansion boundaries by trying to distinguish the generated entities from known positive entities. By iteratively performing this adversarial learning, the generator and the discriminators reinforce each other and are progressively refined over the whole bootstrapping process. Experiments show that Bootstrap-GAN achieves new state-of-the-art entity set expansion performance.
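The adversarial loop described in this abstract can be illustrated with a deliberately tiny stand-in: 1-D "embeddings" instead of neural representations, a centroid-based generator, and a midpoint boundary instead of a trained discriminator network. All values below are invented for illustration; the real Bootstrap-GAN trains neural generator and per-iteration discriminator networks:

```python
# Toy sketch of adversarial bootstrapping in the spirit of Bootstrap-GAN.
# Positive entities cluster near 1.0, negatives near 0.0 (invented values).

seeds = [0.95, 0.90, 0.92]                          # known positive entities
pool = [0.88, 0.85, 0.15, 0.80, 0.10, 0.93, 0.20]   # unlabeled candidates

def generate(known, pool, k=2):
    """Generator: propose the k candidates nearest the positive centroid."""
    mu = sum(known) / len(known)
    return sorted(pool, key=lambda x: abs(x - mu))[:k]

def boundary(known, pool):
    """Discriminator stand-in: a boundary separating known positives from
    the remaining pool (a midpoint here, a trained network in the paper)."""
    mu_pos = sum(known) / len(known)
    mu_pool = sum(pool) / len(pool)
    return (mu_pos + mu_pool) / 2

known = list(seeds)
for _ in range(3):                 # a fresh boundary per iteration
    if not pool:
        break
    proposed = generate(known, pool)
    b = boundary(known, pool)
    accepted = [x for x in proposed if x > b]   # survive the boundary test
    known.extend(accepted)
    pool = [x for x in pool if x not in accepted]

print(sorted(known))
# -> [0.8, 0.85, 0.88, 0.9, 0.92, 0.93, 0.95]
```

The loop mirrors the paper's structure (generated entities tested against a per-iteration boundary, with accepted ones feeding the next round) while replacing the learned components with toy arithmetic; note that the far-away candidates 0.10, 0.15, and 0.20 are proposed in the last round but rejected by the boundary.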
“…The pipelined methods (Riloff and Jones, 1999; Collins and Singer, 1999) mainly leverage direct co-occurrence information, which easily leads to the semantic drift problem (Curran et al., 2007). To resolve this problem, many pipelined methods have been proposed, e.g., mutually exclusive bootstrapping (Curran et al., 2007; Curran, 2008, 2009; Gupta et al., 2018), bootstrapping using negative seeds (Yangarber et al., 2002; Shi et al., 2014), lexical and statistical features (Liao and Grishman, 2010; Gupta and Manning, 2014), word embeddings (Batista et al., 2015; Gupta and Manning, 2015; Zupon et al., 2019), active learning (Berger et al., 2018), lookahead search (Yan et al., 2019), etc. Recently, Yan et al. (2020) propose an end-to-end bootstrapping model and show its advantages in information leveraging and flexibility.…”
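The classic pipelined loop over pattern/entity co-occurrence, and the semantic drift it invites, can be shown with a toy corpus. The patterns, entities, and acceptance threshold below are invented for illustration:

```python
# Minimal pipelined bootstrapping over pattern/entity co-occurrence,
# in the spirit of Riloff & Jones (1999). Toy corpus of (pattern, entity)
# pairs; the task is to expand a seed set of city names.
from collections import Counter

corpus = [
    ("cities such as", "Paris"),
    ("cities such as", "Berlin"),
    ("capital of France ,", "Paris"),
    ("flew to", "Berlin"),
    ("flew to", "Tokyo"),
    ("cities such as", "Tokyo"),
    ("flew to", "Mars"),
]

seeds = {"Paris"}
patterns, entities = set(), set(seeds)

for _ in range(2):
    # Score patterns by how many known entities they extract,
    # then accept any pattern that matched at least one.
    pat_scores = Counter(p for p, e in corpus if e in entities)
    patterns |= {p for p, c in pat_scores.items() if c >= 1}
    # Extract every entity matched by an accepted pattern.
    entities |= {e for p, e in corpus if p in patterns}

print(sorted(entities))
# -> ['Berlin', 'Mars', 'Paris', 'Tokyo']
```

Note how the low-precision pattern "flew to" drags in "Mars" by the second iteration: exactly the drift that mutual exclusion, negative seeds, and the other remedies surveyed above try to contain.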
Bootstrapping for entity set expansion (ESE), which expands new entities using only a few seed entities as supervision, has been studied for a long time. Recent end-to-end bootstrapping approaches have shown their advantages in information capturing and bootstrapping process modeling. However, due to the sparse supervision problem, previous end-to-end methods often only leverage information from near neighborhoods (local semantics) rather than information propagated through the co-occurrence structure of the whole corpus (global semantics). To address this issue, this paper proposes the Global Bootstrapping Network (GBN) with "pre-training and fine-tuning" strategies for effective learning. Specifically, it contains a global-sighted encoder to capture and encode both local and global semantics into entity embeddings, and an attention-guided decoder to sequentially expand new entities based on these embeddings. The experimental results show that the GBN learned by the "pre-training and fine-tuning" strategies achieves state-of-the-art performance on two bootstrapping datasets.
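The local-vs-global contrast motivating GBN can be illustrated with a toy co-occurrence graph: one propagation step sees only near neighborhoods, while several steps spread seed evidence through the whole graph. The graph, damping factor, and scoring scheme below are invented for illustration and are not GBN's encoder:

```python
# Toy contrast of local vs global semantics over a co-occurrence graph.
# "Paris" is the seed; "Munich" never co-occurs with it directly and is
# only reachable through "Berlin".

graph = {  # entity -> co-occurring entities (invented)
    "Paris": ["Berlin", "Lyon"],
    "Berlin": ["Paris", "Munich"],
    "Lyon": ["Paris"],
    "Munich": ["Berlin"],
}

def propagate(seeds, graph, steps):
    """Spread seed scores over the graph for `steps` hops, adding half of
    each neighbor's degree-normalized score at every step."""
    score = {n: (1.0 if n in seeds else 0.0) for n in graph}
    for _ in range(steps):
        new = dict(score)
        for n, nbrs in graph.items():
            new[n] += 0.5 * sum(score[m] / len(graph[m]) for m in nbrs)
        score = new
    return score

local = propagate({"Paris"}, graph, steps=1)    # near neighborhood only
global_ = propagate({"Paris"}, graph, steps=3)  # whole-graph propagation

print(local["Munich"], global_["Munich"])
# -> 0.0 0.1875
```

After one hop "Munich" gets no evidence at all, while three hops of propagation reach it through "Berlin": a minimal picture of why encoding global co-occurrence structure can recover candidates that local semantics misses.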
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.