2022
DOI: 10.48550/arXiv.2201.06384
Preprint

Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations

Abstract: A limited number of studies investigate the role of model-agnostic adversarial behavior in toxic content classification. As toxicity classifiers predominantly rely on lexical cues, (deliberately) creative and evolving language use can be detrimental to the utility of current corpora and state-of-the-art models when they are deployed for content moderation. The less training data is available, the more vulnerable models might become. This study is, to our knowledge, the first to investigate the effect of adver…

Cited by 2 publications (3 citation statements)
References 43 publications (59 reference statements)
“…According to recent research on the effect of adversarial behavior and augmentation for cyberbullying detection (or toxic comment classification more generally) with small corpora, the less training data is available, the more vulnerable models might become (Emmery et al., 2022). Furthermore, model-agnostic lexical substitutions, or perturbations more generally, have been shown to significantly hurt classifier performance, and when the perturbed samples are used for augmentation, models become robust against word-level perturbations at a slight trade-off in overall task performance (Emmery et al., 2022). Therefore, data trimming and data augmentation strategies could be used to strike the right balance between performance and robustness.…”
Section: Discussion
confidence: 99%
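The citing passage above describes a perturb-then-augment loop: word-level substitutions degrade a lexically driven classifier, and retraining on the perturbed samples restores robustness. A minimal sketch of that idea follows; the substitution table, the `perturb`/`augment` helpers, and the label convention are illustrative assumptions, not the actual model-agnostic substitution method of Emmery et al. (2022).

```python
import random

# Hypothetical word-level substitutions of the kind that evade lexical
# toxicity cues; Emmery et al. (2022) derive substitutions model-agnostically,
# whereas this table is purely illustrative.
SUBSTITUTIONS = {
    "stupid": ["stup1d", "st*pid"],
    "idiot": ["1diot", "id!ot"],
    "hate": ["h8", "h@te"],
}

def perturb(text: str, rate: float = 1.0) -> str:
    """Swap known toxic tokens for obfuscated variants (word-level perturbation)."""
    out = []
    for tok in text.split():
        variants = SUBSTITUTIONS.get(tok.lower())
        out.append(random.choice(variants) if variants and random.random() < rate else tok)
    return " ".join(out)

def augment(corpus: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Append a perturbed copy of every toxic sample (label 1), so the
    classifier sees both the clean and the obfuscated surface form."""
    return list(corpus) + [(perturb(text), label) for text, label in corpus if label == 1]
```

With this setup, `augment(train)` adds an obfuscated duplicate of each toxic sample, which is the trade-off the passage describes: slightly noisier training data in exchange for robustness to word-level attacks.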
“…Counterfactual data has been shown to be helpful in numerous social bias contexts (Zmigrod et al., 2019; Hall Maudslay et al., 2019; Dinan et al., 2020a,b; Webster et al., 2020; Renduchintala and Williams, 2022; Smith and Williams, 2021; Emmery et al., 2022). Heuristic demographic perturbation of entities has even been used as a way of learning about models' sentiment toward demographic groups (Huang et al., 2020).…”
Section: Measuring Fairness With the FairScore
confidence: 99%
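As a rough illustration of the heuristic demographic perturbation mentioned in the passage above: a lookup table swaps demographic terms to produce a counterfactual variant of a sentence, whose model score can then be compared across groups. The word pairs and the `counterfactual` helper are hypothetical; the cited works rely on much larger, curated lexicons.

```python
# Illustrative demographic term pairs; counterfactual-data lexicons in the
# cited fairness work are far larger and carefully curated.
PAIRS = {
    "he": "she", "she": "he",
    "his": "her", "her": "his",
    "man": "woman", "woman": "man",
}

def counterfactual(text: str) -> str:
    """Swap each demographic term for its counterpart, yielding a
    counterfactual variant for comparing model behavior across groups."""
    return " ".join(PAIRS.get(tok.lower(), tok) for tok in text.split())

# e.g., counterfactual("he praised his work") -> "she praised her work"
```

The ambiguity already visible here (for instance, "her" maps to "his" whether it is possessive or object) is one reason the next citing statement calls such heuristics error-prone.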
“…Moreover, perturbation approaches enable a better understanding of dataset contents (cf. Hutchinson et al., 2021), improve generalization (Ross et al., 2022), and have been shown to help in numerous social bias contexts (Hall Maudslay et al., 2019; Zmigrod et al., 2019; Dinan et al., 2020a,b; Webster et al., 2020; Renduchintala and Williams, 2022; Smith and Williams, 2021; Emmery et al., 2022). However, creating high-quality perturbations is challenging because heuristics tend to be error-prone (Section 2.4), and there is currently no large-scale annotated data to train neural models on (cf. …”
Section: Introduction
confidence: 99%