2022
DOI: 10.48550/arxiv.2202.05433
Preprint

A Survey on Programmatic Weak Supervision

Abstract: Labeling training data has become one of the major roadblocks to using machine learning. Among various weak supervision paradigms, programmatic weak supervision (PWS) has achieved remarkable success in easing the manual labeling bottleneck by programmatically synthesizing training labels from multiple potentially noisy supervision sources. This paper presents a comprehensive survey of recent advances in PWS. In particular, we give a brief introduction of the PWS learning paradigm, and review representative app…

Cited by 16 publications (24 citation statements)
References 2 publications
“…Recent progress in DP, or more broadly weak supervision, has largely been made in developing advanced label models [8,11,20,24,25] that denoise and aggregate the weak supervision sources for various applications [4,10,14,27,28,30,34]. We defer readers to [35] for a more comprehensive survey on weak supervision methods, and focus this section on related work that anchors more on the development process of weak supervision sources and the different interactive learning schemes related to DP. Labeling Function Development.…”
Section: Related Work (mentioning)
confidence: 99%
“…• Under-formalized LF Development Workflow: The lack of formalism on the LF development process has obscured systematic study to optimize the workflow, making it less organized and more challenging for practitioners to design LFs for DP applications [6,12,33,35]. • Inefficient Development Data Selection: Current LF development workflow selects development data with the most straightforward approach, uniform random sampling, which unfortunately can be time-consuming as it oftentimes requires users to inspect a considerable amount of data samples to create an informative set of LFs.…”
Section: Introduction (mentioning)
confidence: 99%
“…Weak supervision refers to a broad family of techniques that attempts to learn from data that is noisily or less precisely labeled than usual. Our focus is on programmatic weak supervision, in which the sources of supervision are heuristic labelers, often called labeling functions that vote on the true labels of unlabeled examples [65]. Labeling functions can be hand-written programs, models trained for related tasks, or even human annotators if available.…”
Section: Weakly Supervised Machine Learning (mentioning)
confidence: 99%
“…Limited labeled training data is a major bottleneck in many areas of supervised machine learning. In recent years, the area of programmatic weak supervision [65] has emerged to address this bottleneck. There are a range of techniques, but generally they use multiple noisy heuristic labelers called labeling functions, such as hand-written code and other models, to create training data for new tasks.…”
Section: Introduction (mentioning)
confidence: 99%
“…As machine learning models become increasingly powerful but also data hungry, new "data-centric" AI development workflows and systems have emerged, wherein the labeling and development of this training data is positioned and supported as the central development activity. One recent and increasingly popular type of data-centric AI development uses Programmatic Weak Supervision (PWS), wherein users focus on developing a diversity of noisy, programmatic supervision sources [32,33,47,45] to programmatically annotate training data in an efficient way. Specifically, these weak supervision sources, e.g., heuristics, knowledge bases, and pre-trained models, are often abstracted as labeling functions (LFs) [33], which is a user-defined program that provides potentially noisy labels for some subset of the data.…”
Section: Introduction (mentioning)
confidence: 99%
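
The citation statements above describe labeling functions as user-defined programs that vote on the labels of unlabeled examples and may label only a subset of the data. As a rough illustration of that idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the spam/ham task, the three heuristics, the `ABSTAIN` convention, and the use of plain majority voting. Majority voting is only a simple stand-in for the label models that the cited works say "denoise and aggregate" the sources; it is not the method of the survey or of any citing paper.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label encoding for an illustrative spam-detection task.
ABSTAIN = -1  # the labeling function has no opinion on this example
HAM = 0
SPAM = 1

LabelingFunction = Callable[[str], int]


def lf_contains_link(text: str) -> int:
    """Heuristic: messages containing a URL are voted SPAM."""
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text: str) -> int:
    """Heuristic: messages mentioning a prize are voted SPAM."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_greeting(text: str) -> int:
    """Heuristic: very short messages containing 'hi' are voted HAM."""
    words = text.lower().split()
    return HAM if len(words) <= 4 and "hi" in words else ABSTAIN


def majority_vote(votes: List[int]) -> int:
    """Aggregate noisy votes; abstain when nobody votes or on a tie."""
    counted = Counter(v for v in votes if v != ABSTAIN)
    if not counted:
        return ABSTAIN
    ranked = counted.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # top two labels are tied
    return ranked[0][0]


def weak_label(text: str, lfs: List[LabelingFunction]) -> int:
    """Apply every labeling function to one example and aggregate the votes."""
    return majority_vote([lf(text) for lf in lfs])


LFS: List[LabelingFunction] = [lf_contains_link, lf_mentions_prize, lf_short_greeting]

if __name__ == "__main__":
    for message in [
        "Claim your prize at http://example.com",
        "hi there",
        "See you at the meeting tomorrow",
    ]:
        print(message, "->", weak_label(message, LFS))
```

Each function covers only part of the data and abstains elsewhere, which is why several complementary sources are needed. Examples on which every function abstains stay unlabeled in this sketch.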