2022
DOI: 10.48550/arXiv.2201.06384
Preprint

Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations

Abstract: A limited number of studies investigate the role of model-agnostic adversarial behavior in toxic content classification. As toxicity classifiers predominantly rely on lexical cues, (deliberately) creative and evolving language use can be detrimental to the utility of current corpora and state-of-the-art models when they are deployed for content moderation. The less training data is available, the more vulnerable models might become. This study is, to our knowledge, the first to investigate the effect of adver…

Cited by 2 publications (3 citation statements)
References 43 publications (59 reference statements)
“…According to recent research on the effect of adversarial behavior and augmentation for cyberbullying detection (or toxic comment classification more generally) with small corpora, the less training data is available, the more vulnerable models might become (Emmery et al., 2022). Furthermore, model-agnostic lexical substitutions, or perturbations more generally, have been shown to significantly hurt classifier performance, and when the perturbed samples are used for augmentation, models become robust against word-level perturbations at a slight trade-off in overall task performance (Emmery et al., 2022). Therefore, data trimming and data augmentation strategies could be used to strike the right balance between performance and robustness.…”
Section: Discussion
confidence: 99%
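The citing passage above describes a perturb-then-augment loop: word-level substitutions degrade a lexically driven classifier, and retraining on the perturbed samples restores robustness. A minimal sketch of that idea follows; the substitution table, the `perturb`/`augment` helpers, and the label convention are illustrative assumptions, not the actual model-agnostic substitution method of Emmery et al. (2022).

```python
import random

# Hypothetical word-level substitutions of the kind that evade lexical
# toxicity cues; Emmery et al. (2022) derive substitutions model-agnostically,
# whereas this table is purely illustrative.
SUBSTITUTIONS = {
    "stupid": ["stup1d", "st*pid"],
    "idiot": ["1diot", "id!ot"],
    "hate": ["h8", "h@te"],
}

def perturb(text: str, rate: float = 1.0) -> str:
    """Swap known toxic tokens for obfuscated variants (word-level perturbation)."""
    out = []
    for tok in text.split():
        variants = SUBSTITUTIONS.get(tok.lower())
        out.append(random.choice(variants) if variants and random.random() < rate else tok)
    return " ".join(out)

def augment(corpus: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Append a perturbed copy of every toxic sample (label 1), so the
    classifier sees both the clean and the obfuscated surface form."""
    return list(corpus) + [(perturb(text), label) for text, label in corpus if label == 1]
```

With this setup, `augment(train)` adds an obfuscated duplicate of each toxic sample, which is the trade-off the passage describes: slightly noisier training data in exchange for robustness to word-level attacks.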
“…Counterfactual data has been shown to be helpful in numerous social bias contexts (Zmigrod et al., 2019; Hall Maudslay et al., 2019; Dinan et al., 2020a,b; Webster et al., 2020; Renduchintala and Williams, 2022; Smith and Williams, 2021; Emmery et al., 2022). Heuristic demographic perturbation of entities has even been used as a way of learning about models' sentiment toward demographic groups (Huang et al., 2020).…”
Section: Measuring Fairness With the FairScore
confidence: 99%
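As a rough illustration of the heuristic demographic perturbation mentioned in the passage above: a lookup table swaps demographic terms to produce a counterfactual variant of a sentence, whose model score can then be compared across groups. The word pairs and the `counterfactual` helper are hypothetical; the cited works rely on much larger, curated lexicons.

```python
# Illustrative demographic term pairs; counterfactual-data lexicons in the
# cited fairness work are far larger and carefully curated.
PAIRS = {
    "he": "she", "she": "he",
    "his": "her", "her": "his",
    "man": "woman", "woman": "man",
}

def counterfactual(text: str) -> str:
    """Swap each demographic term for its counterpart, yielding a
    counterfactual variant for comparing model behavior across groups."""
    return " ".join(PAIRS.get(tok.lower(), tok) for tok in text.split())

# e.g., counterfactual("he praised his work") -> "she praised her work"
```

The ambiguity already visible here (for instance, "her" maps to "his" whether it is possessive or object) is one reason the next citing statement calls such heuristics error-prone.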
“…Moreover, perturbation approaches enable a better understanding of dataset contents (cf. Hutchinson et al., 2021), improve generalization (Ross et al., 2022), and have been shown to help in numerous social bias contexts (Hall Maudslay et al., 2019; Zmigrod et al., 2019; Dinan et al., 2020a,b; Webster et al., 2020; Renduchintala and Williams, 2022; Smith and Williams, 2021; Emmery et al., 2022). However, creating high-quality perturbations is challenging because heuristics tend to be error-prone (Section 2.4), and there is currently no large-scale annotated data to train neural models on (cf. …”
Section: Introduction
confidence: 99%