Proceedings of the 2nd Workshop on Abusive Language Online (ALW2) 2018
DOI: 10.18653/v1/w18-5108

Improving Moderation of Online Discussions via Interpretable Neural Models

Abstract: The growing volume of comments makes online discussions difficult to moderate by human moderators alone. Antisocial behavior is a common occurrence that often discourages other users from participating in the discussion. We propose a neural-network-based method that partially automates the moderation process. It consists of two steps. First, we detect inappropriate comments for moderators to see. Second, we highlight inappropriate parts within these comments to make the moderation faster. We evaluated our method on data…
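
The abstract describes a two-step pipeline: flag whole comments for review, then highlight the offending parts. The excerpt does not give the authors' architecture, so the following is only a minimal sketch under assumed choices (an average-pooled embedding classifier and gradient-based token saliency), not the model from the paper.

```python
# Minimal sketch of the two-step idea, NOT the authors' architecture:
# step 1 scores a whole comment, step 2 ranks tokens by gradient saliency
# so a moderator can see which words drove the "inappropriate" score.
import torch
import torch.nn as nn

class CommentClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.out = nn.Linear(emb_dim, 1)

    def forward(self, token_ids):
        vecs = self.emb(token_ids)               # (seq_len, emb_dim)
        pooled = vecs.mean(dim=0)                # average-pool the tokens
        score = torch.sigmoid(self.out(pooled))  # P(comment is inappropriate)
        return score, vecs

def flag_and_highlight(model, token_ids, threshold=0.5):
    """Step 1: flag the comment; step 2: weight each token by the gradient
    norm of the inappropriateness score, as a crude highlight."""
    score, vecs = model(token_ids)
    vecs.retain_grad()
    score.backward()
    saliency = vecs.grad.norm(dim=1)             # one weight per token
    flagged = score.item() > threshold
    return flagged, saliency.tolist()
```

The second step is what speeds up moderation: instead of reading the whole comment, the moderator is pointed to the highlighted spans. The attribution used above (plain gradient norms) is just one stand-in for whatever interpretability mechanism the full text describes.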

Cited by 14 publications (5 citation statements)
References 21 publications (12 reference statements)

“…The practical tools for identifying and moderating hate speech, such as profanity filters, content moderation filters, or human driven approaches, have been widely studied [5,10,11,31,32]. In the context of social media platforms, the findings on utilization of practical tools are somewhat puzzling as they either show that social platforms perform too much or too little content moderation and lack transparent decision-making processes [33].…”
Section: Definitions of Hate Speech (mentioning)
confidence: 99%
“…A few works have specifically pursued the idea of interpretable ML for abuse detection: (Svec et al 2018) shows that an interpretable model can match human-generated annotations with high precision, while (Pavlopoulos, Malakasiotis, and Androutsopoulos 2017) proposes using explanations to help humans make decisions about borderline instances. (Wang 2018) analyzes pitfalls associated with using interpretable ML for abuse detection.…”
Section: Interpretable Machine Learning (mentioning)
confidence: 99%
“…Moderating classifiers' outcomes is also related to the field of explainable ML, in particular explaining individual classification outcomes (Ribeiro et al, 2016). Studies indicate that explaining relevant words of a class outcome support human annotation tasks by, e.g., reducing the annotation time needed per instance and increasing user trust (Švec et al, 2018;Ribeiro et al, 2016). Our approach is likely to benefit from explaining artificial decisionmaking as well as model uncertainties (Andersen et al, 2020) during the moderation process.…”
Section: Related Work (mentioning)
confidence: 99%
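
The last statement cites Ribeiro et al. (2016), i.e. LIME, as the kind of per-instance explanation that shortens annotation time. The sketch below illustrates that idea only; the toy classifier_fn, its keyword rule, and the example comment are placeholders, not anything taken from the cited papers.

```python
# Illustration of word-level explanations in the spirit of LIME
# (Ribeiro et al., 2016); the classifier here is a fake stand-in.
import numpy as np
from lime.lime_text import LimeTextExplainer

def classifier_fn(texts):
    # Placeholder for a trained model: returns [P(ok), P(inappropriate)] per text.
    return np.array([[0.1, 0.9] if "idiot" in t.lower() else [0.9, 0.1]
                     for t in texts])

explainer = LimeTextExplainer(class_names=["ok", "inappropriate"])
explanation = explainer.explain_instance(
    "You are an idiot, just stop posting.", classifier_fn, num_features=5)

# (word, weight) pairs a moderation interface could render as highlights.
print(explanation.as_list())
```

LIME fits a local linear surrogate around perturbed versions of the input, so the returned weights are only locally faithful; they serve as a decision aid for the moderator, not a guarantee about the model's reasoning.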