2021
DOI: 10.48550/arxiv.2109.11377
WRENCH: A Comprehensive Benchmark for Weak Supervision

Jieyu Zhang,
Yue Yu,
Yinghao Li
et al.

Abstract: Recent Weak Supervision (WS) approaches have had widespread success in easing the bottleneck of labeling training data for machine learning by synthesizing labels from multiple potentially noisy supervision sources. However, proper measurement and analysis of these approaches remain a challenge. First, datasets used in existing works are often private and/or custom, limiting standardization. Second, WS datasets with the same name and base data often vary in terms of the labels and weak supervision sources used…

Cited by 10 publications (15 citation statements)
References 56 publications

“…We evaluate our framework on nine benchmark NLP classification datasets that are popular in the few-shot learning and weak supervision literature (Ratner et al., 2017; Awasthi et al., 2020; Zhang et al., 2021a; Cohan et al., 2019). These tasks are as follows: AGNews: using news headlines to predict article topic; CDR: using scientific paper excerpts to predict whether drugs induce diseases; ChemProt: using paper excerpts to predict the functional relationship between chemicals and proteins; IMDB: movie review sentiment; SciCite: classifying citation intent in Computer Science papers; SemEval: relation classification from web text; SMS: text message spam detection; TREC: conversational question intent classification; YouTube: internet comment spam detection.…”
Section: Methods
confidence: 99%
“…We ran all experiments on Microsoft Azure cloud compute using NVIDIA V100 GPUs (32 GB VRAM). All algorithms were implemented using the PyTorch and WRENCH frameworks (Paszke et al., 2017; Zhang et al., 2021a). We report binary F1 score for binary classification tasks and macro-weighted F1 for multiclass classification tasks.…”
Section: Methods
confidence: 99%
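The excerpt above reports binary F1 for binary tasks and a macro-style F1 for multiclass tasks. As a rough sketch (not the cited authors' code, and assuming "macro-weighted" means an unweighted average of per-class F1 scores), the macro variant can be computed as:

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def macro_f1(y_true, y_pred, labels) -> float:
    """Average the one-vs-rest F1 score over all classes."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if p == c and t == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p == c and t != c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if p != c and t == c)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)
```

For a binary task, the binary F1 is simply the single-class F1 for the positive label; libraries such as scikit-learn expose both via the `average` argument of `f1_score`.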
“…Also, while Snorkel methods are data-free and use only the weak signals to estimate the labels of the data, our method is data-dependent and uses features of the data to make the generated labels consistent with the data. Concurrent to our work, a new weak supervision benchmark has been developed (Zhang et al., 2021).…”
Section: Related Work
confidence: 99%
“…For image classification tasks, we follow Mazzetto et al. (2021b;a) to train a branch of image classifiers as supervision sources of seen classes. For text classification tasks, we built keyword-based labeling functions as supervision sources of seen classes following Zhang et al. (2021); each labeling function returns its associated label when a certain keyword exists in the text, and otherwise abstains. Notably, all the involved supervision sources are "weak" because they cannot predict the desired unseen classes.…”
Section: Setup
confidence: 99%
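The keyword-based labeling functions described above can be sketched as follows. This is an illustrative assumption, not code from the cited papers: the `ABSTAIN = -1` convention mirrors common weak-supervision frameworks, and the spam keyword is a hypothetical example.

```python
ABSTAIN = -1  # conventional "no vote" value in weak-supervision frameworks

def make_keyword_lf(keyword: str, label: int):
    """Build a labeling function that votes `label` when `keyword` occurs, else abstains."""
    def lf(text: str) -> int:
        return label if keyword in text.lower() else ABSTAIN
    return lf

# Hypothetical keyword for a spam-detection task (SPAM = 1, HAM = 0)
spam_lf = make_keyword_lf("free", 1)
```

A label model then aggregates the (possibly conflicting, possibly abstaining) votes of many such functions into a single training label per example.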
“…We use a pre-trained sentence transformer (Reimers & Gurevych, 2019) to obtain document embeddings for classification. We follow Zhang et al (2021) to generate 5 keyword-based labeling functions for each seen label as ILFs.…”
Section: F Experimental Details
confidence: 99%