Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.396
Toxicity Detection: Does Context Really Matter?

Abstract: Moderation is crucial to promoting healthy online discussions. Although several 'toxicity' detection datasets and models have been published, most of them ignore the context of the posts, implicitly assuming that comments may be judged independently. We investigate this assumption by focusing on two questions: (a) does context affect the human judgement, and (b) does conditioning on context improve performance of toxicity detection systems? We experiment with Wikipedia conversations, limiting the notion of con…

Cited by 60 publications (56 citation statements) · References 31 publications
“…However, these methodologies have only recently been explored for toxicity detection [33], although the need to monitor online communications to identify toxicity and make the communications safe and respectful is an old and still open issue. Hence, the gap between the current methodologies and their potential use within toxicity detection remains an open challenge.…”
Section: Related Work (mentioning, confidence: 99%)
“…Recent research on the helpfulness of context may also support our view to restrict the context for training data. In an in-depth study, Pavlopoulos et al (2020) found that increasing the context for abusive language detection by considering microposts neighbouring the post to be classified actually harms classification performance. Microposts, such as tweets from Twitter, themselves can already be fairly long (up to 280 characters) representing a paragraph of sentences.…”
Section: Classification Below the Micropost-level (mentioning, confidence: 99%)
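As a rough illustration of what "increasing the context" means operationally, the sketch below scores a post with and without its parent post prepended. This is a minimal sketch, not the authors' exact setup: it assumes the Hugging Face transformers library is installed and uses the public "unitary/toxic-bert" checkpoint as a stand-in for the classifiers actually evaluated by Pavlopoulos et al. (2020); the example posts are hypothetical.

    # Minimal sketch: context-free vs. context-aware toxicity scoring.
    # Assumption: "unitary/toxic-bert" (a public toxicity checkpoint)
    # stands in for the models the paper actually evaluates.
    from transformers import pipeline

    clf = pipeline("text-classification", model="unitary/toxic-bert")

    parent = "I reverted your edit because it removed sourced content."  # hypothetical parent post
    target = "Of course you did, you people never read anything."        # hypothetical target post

    # Context-free: judge the target post in isolation.
    print(clf(target))

    # Context-aware: prepend the parent post, similar in spirit to the
    # parent-comment conditioning the paper examines.
    print(clf(parent + " " + target))

In the paper's Wikipedia experiments, context-aware variants of this kind did not consistently outperform the context-free baseline, which is the finding the quoted passage relies on.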
“…In this way, we chose to limit our final dataset to comparisons that can be classified in isolation. The motivation for this is that, while humans perceive the same texts as more or less offensive given context, modeling further context of abusive utterances was not found to improve classification using currently available methods, as shown by the recent in-depth study by Pavlopoulos et al. (2020).…”
Section: Creating the Dataset (mentioning, confidence: 99%)