GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition

Zhao, Xinyan; Ding, Haibo; Feng, Zhe

doi:10.18653/v1/2021.eacl-main.318

Cited by 9 publications

(11 citation statements)

References 41 publications

“…Zhao et al. [161] propose a weakly supervised method where they manually prepare some seed rules and automatically extract all possible rules from unlabeled text for each of the six rule types, and connect them in a graph using cosine similarity. Note that each rule is represented by the average contextual embedding of its matched candidate entities.…”

Section: Rule-based Methods (mentioning)

confidence: 99%
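The graph construction described in the statement above can be sketched as follows. This is a minimal illustration, not the GLaRA implementation: the rule names, the toy 2-dimensional embeddings, and the choice to link each rule to its `k` most similar neighbours are all assumptions made for the example. The statement itself only says that rules are represented by the average contextual embedding of their matched candidate entities and connected using cosine similarity.

```python
import numpy as np


def rule_embedding(entity_embeddings):
    """Represent a rule by the mean embedding of the candidate entities it matches."""
    return np.mean(np.asarray(entity_embeddings, dtype=float), axis=0)


def build_rule_graph(rule_vectors, k=1):
    """Link each rule to its k most cosine-similar rules (k-NN is an assumption here)."""
    names = list(rule_vectors)
    X = np.stack([rule_vectors[n] for n in names])
    # Normalise rows so that a dot product equals cosine similarity.
    X = X / np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)
    sim = X @ X.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-links
    graph = {}
    for i, name in enumerate(names):
        top = np.argsort(-sim[i])[:k]
        graph[name] = [(names[j], float(sim[i, j])) for j in top]
    return graph


# Hypothetical rules with toy embeddings of the entities each one matched.
matches = {
    "suffix:-itis": [[1.0, 0.1], [0.9, 0.2]],
    "suffix:-oma": [[0.8, 0.3], [1.0, 0.2]],
    "prefix:mr.": [[0.0, 1.0], [0.1, 0.9]],
}
rule_vectors = {rule: rule_embedding(embs) for rule, embs in matches.items()}
graph = build_rule_graph(rule_vectors, k=1)
for rule, neighbours in graph.items():
    print(rule, "->", neighbours)
```

In this toy setup the two suffix rules end up as each other's nearest neighbour, which is the kind of link along which a seed rule's label could be propagated to a candidate rule.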

“…However, this method is restricted by the entity labeling granularity where we can find some nested entities. The method of Zhao et al. [161] avoids the ambiguity as it automatically propagates some seed rules based on lexical or contextual clues which are strong indicators of entity recognition. In addition, the authors have fine-tuned a pre-trained contextual embedding model BERT in the biomedical domain.…”

Section: Ni (mentioning)

confidence: 99%

Information extraction from electronic medical documents: state of the art and future research directions

2022

In the medical field, a doctor must build comprehensive knowledge by reading and writing narrative documents, and is responsible for every decision taken for patients. Unfortunately, reading all the necessary information about drugs, diseases, and patients is very tiring because the volume of documents grows every day. Consequently, many medical errors can occur, and some can even be fatal. Information extraction is an important field that can address this problem. It comprises several tasks for extracting the desired information from unstructured text written in natural language. The principal tasks are named entity recognition and relation extraction, since they structure the text by extracting the relevant information. However, processing narrative text requires natural language processing techniques to extract useful information and features. In this paper, we introduce and discuss the techniques and solutions used in these tasks. Furthermore, we outline the challenges in information extraction from medical documents. To our knowledge, this is the most comprehensive survey in the literature, with an experimental analysis and suggestions for some unexplored directions.

“…The methods generally require an initial set of labeled data, or seed LFs developed by users. Snuba [33] learns weak classifiers as heuristic models from a small labeled dataset; TALLOR [16] and GLaRA [39] use an initial set of seed LFs to generate new ones by compounding multiple simpler LFs and by exploiting the semantic relationship of the seed LFs respectively; [31] applies program synthesis to generate task-level LFs from a set of labeled data and domain-level LFs.…”

Section: Related Work (mentioning)

confidence: 99%

Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming

Hsieh, Zhang, Ratner

2022

Preprint

Weak Supervision (WS) techniques allow users to efficiently create large training datasets by programmatically labeling data with heuristic sources of supervision. While the success of WS relies heavily on the provided labeling heuristics, the process of how these heuristics are created in practice has remained under-explored. In this work, we formalize the development process of labeling heuristics as an interactive procedure, built around the existing workflow where users draw ideas from a selected set of development data for designing the heuristic sources. With the formalism, we study two core problems of how to strategically select the development data to guide users in efficiently creating informative heuristics, and how to exploit the information within the development process to contextualize and better learn from the resultant heuristics (Figure 1). Building upon two novel methodologies that effectively tackle the respective problems considered, we present Nemo, an end-to-end interactive system that improves the overall productivity of WS learning pipeline by an average 20% (and up to 47% in one task) compared to the prevailing WS approach.

“…[7] and [25] interactively generate labeling functions based on user feedback. TALLOR [46] and GLaRA [106] automatically augment an initial set of labeling functions with new ones. Different from existing works that optimize the task performance, the procedural labeling function generators in WRENCH facilitate the study of the impact of different weak supervision sources.…”

Section: Related Work (mentioning)

confidence: 99%

“…(2) Active generation and repurposing of supervision sources. To further reduce human annotation efforts, very recently, researchers turn to active generation [91,46,106,7,25] and repurposing [27] of supervision sources. In the future, we plan to incorporate these new tasks and methods into WRENCH to extend its scope.…”

Section: A3 Hosting and Maintenance Plan (mentioning)

confidence: 99%

WRENCH: A Comprehensive Benchmark for Weak Supervision

Zhang, Yu, et al.

2021

Preprint

Recent Weak Supervision (WS) approaches have had widespread success in easing the bottleneck of labeling training data for machine learning by synthesizing labels from multiple potentially noisy supervision sources. However, proper measurement and analysis of these approaches remain a challenge. First, datasets used in existing works are often private and/or custom, limiting standardization. Second, WS datasets with the same name and base data often vary in terms of the labels and weak supervision sources used, a significant "hidden" source of evaluation variance. Finally, WS studies often diverge in terms of the evaluation protocol and ablations used. To address these problems, we introduce a benchmark platform, WRENCH, for thorough and standardized evaluation of WS approaches. It consists of 22 varied real-world datasets for classification and sequence tagging; a range of real, synthetic, and procedurally-generated weak supervision sources; and a modular, extensible framework for WS evaluation, including implementations for popular WS methods. We use WRENCH to conduct extensive comparisons over more than 100 method variants to demonstrate its efficacy as a benchmark platform. The code is available at https://github.com/JieyuZ2/wrench.
