The spread of online hate has become a major problem for newspapers that host comment sections. As a result, there is growing interest in using machine learning (ML) and natural language processing (NLP) for (semi-)automated abusive language detection, either to reduce manual comment moderation costs or to avoid shutting down comment sections altogether. However, much of the past work on abusive language detection with ML uses random train-test splitting procedures that assume an unrealistically static language environment. In this paper, we show, using a new German newspaper comments dataset, that a time-stratified evaluation procedure provides a more realistic measure of a classifier's performance on future data. We also show that classifier performance can degrade quickly as the training data grows more outdated and language and news coverage evolve. Further, we demonstrate that the performance of classifiers trained on data from before the COVID-19 pandemic drops sharply when they are evaluated on COVID-era comments. Our findings suggest that when standard ML techniques are applied naively to abusive language detection, a classifier will fail to meet its advertised evaluation benchmarks in a real-world deployment.
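The core distinction between the two evaluation procedures can be illustrated with a minimal sketch. The dataset, field layout, and cutoff date below are hypothetical, chosen only to contrast a random split with a time-stratified one:

```python
import random
from datetime import datetime

# Hypothetical timestamped comments: (posted_at, text, is_abusive).
# These records are illustrative, not drawn from the paper's corpus.
comments = [
    (datetime(2019, 1, 5), "comment a", 0),
    (datetime(2019, 6, 1), "comment b", 1),
    (datetime(2019, 11, 20), "comment c", 0),
    (datetime(2020, 3, 15), "comment d", 1),
]

def random_split(data, test_frac=0.25, seed=0):
    """Standard random split: ignores time, mixes eras in train and test."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

def time_stratified_split(data, cutoff):
    """Train only on comments posted before `cutoff`; test on later ones,
    mimicking deployment on genuinely future data."""
    train = [d for d in data if d[0] < cutoff]
    test = [d for d in data if d[0] >= cutoff]
    return train, test

train, test = time_stratified_split(comments, datetime(2020, 1, 1))
```

Under the time-stratified split, every test comment postdates every training comment, so the measured score reflects how the classifier copes with topic and language drift rather than interpolating within a single shuffled era.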