Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.84

Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach

Abstract: Fine-tuned pre-trained language models (LMs) have achieved enormous success in many natural language processing (NLP) tasks, but they still require excessive labeled data in the fine-tuning stage. We study the problem of fine-tuning pre-trained LMs using only weak supervision, without any labeled data. This problem is challenging because the high capacity of LMs makes them prone to overfitting the noisy labels generated by weak supervision. To address this problem, we develop a contrastive self-training framework…
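
The abstract describes confidence-based self-training on weakly labeled data with a contrastive regularizer that counters label noise. Below is a minimal sketch of that idea, assuming a PyTorch setup; it is not the authors' released COSINE code, and the `Encoder` placeholder, the confidence `threshold`, the temperature `tau`, and the weight `lam` are illustrative assumptions.

```python
# Minimal sketch of contrastive-regularized self-training on weak labels.
# NOT the authors' COSINE implementation; names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Stand-in for a pre-trained LM encoder plus a classification head."""
    def __init__(self, hidden=768, num_classes=2):
        super().__init__()
        self.backbone = nn.Linear(hidden, hidden)   # placeholder for the LM body
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):
        z = torch.tanh(self.backbone(x))            # contextual representation
        return z, self.head(z)                      # (embedding, logits)

def contrastive_regularizer(z, pseudo, tau=0.3):
    """Pull together samples that share a pseudo-label, push apart the rest."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / tau                           # pairwise cosine similarities
    same = (pseudo.unsqueeze(0) == pseudo.unsqueeze(1)).float()
    same.fill_diagonal_(0)                          # ignore self-pairs
    log_prob = F.log_softmax(sim, dim=-1)
    denom = same.sum(-1).clamp(min=1)
    return -(log_prob * same).sum(-1).div(denom).mean()

def self_training_step(model, optimizer, x, threshold=0.9, lam=1.0):
    """One update: keep only high-confidence pseudo-labels, add the contrastive term.
    Assumes `model` was already fine-tuned on the weakly labeled data."""
    z, logits = model(x)
    probs = F.softmax(logits, dim=-1)
    conf, pseudo = probs.max(dim=-1)
    mask = conf > threshold                         # confidence-based sample selection
    if mask.sum() < 2:                              # not enough confident samples this batch
        return None
    ce = F.cross_entropy(logits[mask], pseudo[mask])
    reg = contrastive_regularizer(z[mask], pseudo[mask])
    loss = ce + lam * reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage with random features standing in for pooled LM outputs.
model = Encoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-5)
x = torch.randn(16, 768)
loss = self_training_step(model, opt, x)
```

In this sketch the model is first fitted to the weak labels themselves; the step above then refines it on its own high-confidence predictions while the contrastive term keeps examples with the same pseudo-label close in embedding space.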

Cited by 54 publications (46 citation statements) · References 33 publications

“…Our work is also relevant to semi-supervised learning, where the training data is only partially labeled. There have been many semi-supervised learning methods, including the popular self-training methods used in our experiments for comparison (Yarowsky, 1995; Rosenberg et al., 2005; Tarvainen and Valpola, 2017; Miyato et al., 2018; Meng et al., 2018; Clark et al., 2018; Yu et al., 2021). Different from weak supervision, these semi-supervised learning methods usually have a partial set of labeled data.…”
Section: Discussion
confidence: 99%
“…One potential future direction is to combine NEEDLE with other fully weakly supervised / semi-supervised learning techniques to further improve the performance, e.g., contrastive regularization (Yu et al., 2021).…”
Section: Discussion
confidence: 99%
“…Specifically, the work achieved three main innovations: Firstly, we refined the architecture of Cellpose by introducing attention mechanisms and hierarchical information, making the model more sensitive to different styles. Secondly, we implemented a contrastive fine-tuning strategy to leverage the information from both unlabelled and pre-trained data based on contrastive learning, which has also achieved great success in other deep learning applications [28][29][30][31][32]. Finally, we organized three benchmarking datasets containing three levels of cell images for further use in segmentation algorithm development.…”
Section: Discussion
confidence: 99%
“…To reduce the effort of annotation, recent weak supervision (WS) frameworks have been proposed that focus on enabling users to leverage a diversity of weaker, often programmatic supervision sources [76,77,75] to label and manage training data in an efficient way. Recently, WS has been widely applied to various machine learning tasks in a diversity of domains: scene graph prediction [9], video analysis [23,92], image classification [12], image segmentation [35], autonomous driving [96], relation extraction [36,107,57], named entity recognition [82,53,50,45,27], text classification [78,100,85,86], dialogue systems [63], biomedical [43,19,64], healthcare [20,17,21,80,93,81], software engineering [74], sensor data [24,39], E-commerce [66,103], and multi-agent systems [102].…”
Section: Introduction
confidence: 99%
“…These two-stage methods mainly focus on the efficiency and effectiveness of the label model, while maintaining the maximal flexibility of the end model. Recent approaches have also focused on integrating semi- or self-supervised approaches [100]; we view these as modified end models in our benchmarking framework. In addition to these two-stage methods, researchers have also explored the possibility of coupling the label model and the end model in an end-to-end manner [78,45,38].…”
Section: Introduction
confidence: 99%
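
The statement above distinguishes two-stage weak supervision (a label model that aggregates noisy sources, then an end model trained on the aggregated labels) from end-to-end coupling of the two. A generic sketch of the first stage, using simple majority voting, is shown below; it is not the API of any particular framework, and names such as `ABSTAIN`, `majority_vote`, and `end_model` are illustrative assumptions.

```python
# Illustrative two-stage weak-supervision pipeline: a majority-vote "label model"
# followed by an arbitrary "end model". Generic sketch, not any framework's API.
import numpy as np

ABSTAIN = -1  # labeling functions may abstain on an example

def majority_vote(lf_votes: np.ndarray) -> np.ndarray:
    """Aggregate an (n_examples, n_lfs) matrix of votes into one label per row."""
    labels = []
    for row in lf_votes:
        valid = row[row != ABSTAIN]
        if valid.size == 0:
            labels.append(ABSTAIN)          # no LF fired; example stays unlabeled
        else:
            labels.append(np.bincount(valid).argmax())
    return np.array(labels)

# Hypothetical usage: three labeling functions voting on four examples.
votes = np.array([
    [1, 1, ABSTAIN],
    [0, ABSTAIN, 0],
    [ABSTAIN, ABSTAIN, ABSTAIN],
    [1, 0, 1],
])
weak_labels = majority_vote(votes)          # -> [1, 0, -1, 1]
covered = weak_labels != ABSTAIN            # the end model trains only on covered rows
# end_model.fit(features[covered], weak_labels[covered])  # any classifier, e.g., a fine-tuned LM
```

A more sophisticated label model would weight sources by estimated accuracy rather than counting votes, but the covered/uncovered split and the hand-off to an end model stay the same.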