2020
DOI: 10.48550/arxiv.2010.07835
Preprint

Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach

Abstract: Fine-tuned pre-trained language models (LMs) achieve enormous success in many natural language processing (NLP) tasks, but they still require excessive labeled data in the fine-tuning stage. We study the problem of fine-tuning pre-trained LMs using only weak supervision, without any labeled data. This problem is challenging because the high capacity of LMs makes them prone to overfitting the noisy labels generated by weak supervision. To address this problem, we develop a contrastive self-training framework, COSINE…
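The abstract describes contrastive self-training for fine-tuning an LM on weakly labeled data without overfitting label noise. Below is a minimal sketch of what one such training step could look like, assuming a PyTorch-style `model(batch)` that returns pooled features together with classification logits; the names `confident_pseudo_labels`, `lambda_contrast`, `tau`, and `margin` are illustrative assumptions, not the paper's interface, and the regularizer is a generic pull-together/push-apart term rather than COSINE's exact loss.

```python
# Sketch of contrastive-regularized self-training on weakly labeled data.
# Not the authors' implementation; interfaces and hyperparameters are assumed.
import torch
import torch.nn.functional as F


def confident_pseudo_labels(logits, tau=0.9):
    """Keep only predictions whose softmax confidence exceeds tau."""
    probs = F.softmax(logits, dim=-1)
    conf, pseudo = probs.max(dim=-1)
    return pseudo, conf > tau          # pseudo-labels and a "trusted" mask


def contrastive_regularizer(feats, pseudo, mask, margin=1.0):
    """Pull together confident samples sharing a pseudo-label,
    push apart samples with different pseudo-labels (hinge on distance)."""
    feats = F.normalize(feats[mask], dim=-1)
    labels = pseudo[mask]
    if feats.size(0) < 2:
        return feats.new_zeros(())
    dist = torch.cdist(feats, feats)                    # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-label pairs
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)
    pull = dist[same & ~eye].pow(2).mean() if (same & ~eye).any() else 0.0
    push = F.relu(margin - dist[~same]).pow(2).mean() if (~same).any() else 0.0
    return pull + push


def self_training_step(model, batch, optimizer, lambda_contrast=0.1, tau=0.9):
    """One update: cross-entropy on confident pseudo-labels + contrastive term."""
    feats, logits = model(batch)       # assumed to return (features, logits)
    pseudo, mask = confident_pseudo_labels(logits.detach(), tau)
    if not mask.any():                 # nothing confident enough: skip update
        return 0.0
    ce = F.cross_entropy(logits[mask], pseudo[mask])
    loss = ce + lambda_contrast * contrastive_regularizer(feats, pseudo, mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Thresholding the softmax confidence before generating pseudo-labels is one common way to keep a high-capacity LM from fitting noisy weak labels, which is the failure mode the abstract highlights.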

Cited by 6 publications (19 citation statements)
References 39 publications

“…Based on the study, future work may concentrate not only on data-centric research but also on the trade-off between performance and robustness considering classification based on small corpora to mitigate the effect of adversarial behavior. Additionally, future research could concentrate on how to fine-tune models with weak supervision: it could be expensive when collecting more data as well as label them all, and more studies have focused on the relevant contents (Yu et al., 2020; Awasthi et al., 2020).…”
Section: Discussion
confidence: 99%
“…Weak Supervision Methods: (1) COSINE (Yu et al., 2020): The COSINE method uses weakly labeled data to fine-tune pre-trained language models by contrastive self-training.…”
Section: Baselines
confidence: 99%
“…Weak Supervision. Weak supervision aims to reduce the cost of annotation, and has been widely applied to perform both classification (Ratner et al., 2016b, 2019a; Fu et al., 2020; Yu et al., 2020) and sequence tagging (Lison et al., 2020; Nguyen et al., 2017; Safranchik et al., 2020; Lan et al., 2020) to help reduce the human labor required for annotation. Weak supervision builds on many previous approaches in machine learning, such as distant supervision (Mintz et al., 2009; Hoffmann et al., 2011; Takamatsu et al., 2012), crowdsourcing (Gao et al., 2011; Krishna et al., 2016), co-training methods (Blum and Mitchell, 1998), pattern-based supervision (Gupta and Manning, 2014), and feature annotation (Mann and McCallum, 2010; Zaidan and Eisner, 2008).…”
Section: Related Work
confidence: 99%