Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.210

Ruddit: Norms of Offensiveness for English Reddit Comments

Abstract: Warning: This paper contains comments that may be offensive or upsetting. On social media platforms, hateful and offensive language negatively impacts the mental well-being of users and the participation of people from diverse backgrounds. Automatic methods to detect offensive language have largely relied on datasets with categorical labels. However, comments can vary in their degree of offensiveness. We create the first dataset of English language Reddit comments that has fine-grained, real-valued scores between …
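The key property of the dataset described above is that each comment carries a fine-grained, real-valued offensiveness score rather than a categorical label. The sketch below is a hypothetical illustration of how such a resource might be loaded and inspected; the file name and the column names ("comment", "offensiveness_score") are assumptions for illustration, not the dataset's actual schema.

```python
# Minimal, hypothetical sketch (not from the paper): working with a
# Ruddit-style dataset of comments annotated with real-valued
# offensiveness scores. Column names and file path are assumptions.
import pandas as pd


def load_ruddit_style(path: str) -> pd.DataFrame:
    """Load comments annotated with fine-grained, real-valued offensiveness scores."""
    # Real-valued labels allow thresholding at any operating point,
    # unlike categorical offensive / not-offensive labels.
    return pd.read_csv(path)


def most_offensive(df: pd.DataFrame, k: int = 10) -> pd.DataFrame:
    """Return the k comments with the highest offensiveness scores."""
    return df.nlargest(k, "offensiveness_score")[["comment", "offensiveness_score"]]


if __name__ == "__main__":
    comments = load_ruddit_style("ruddit.csv")  # hypothetical file path
    print(most_offensive(comments, k=5))
```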

Cited by 16 publications (9 citation statements)
References 47 publications
“…Identifying Toxicity - Most works on identifying toxic language looked at isolated social media posts or comments while ignoring the context (Davidson et al., 2017; Xu et al., 2012; Zampieri et al., 2019; Rosenthal et al., 2020; Kumar et al., 2018; Garibo i Orts, 2019; Ousidhoum et al., 2019; Breitfeller et al., 2019; Hada et al., 2021; Barikeri et al., 2021). Some works train chatbots to avoid sensitive discussions by changing the topic of the conversation. In contrast, we tackle contextual offensive language by fine-tuning models to generate neutral and safe responses in offensive contexts.…”
Section: Related Work (mentioning)
confidence: 99%
“…There is an abundance of datasets for moderating user-generated content, mostly drawn from online social networking sites. Examples include Jigsaw (Jigsaw, 2017), Twitter (Zampieri et al., 2019; Basile et al., 2019), Stormfront (de Gibert et al., 2018), Reddit (Hada et al., 2021), and Hateful Memes (Kiela et al., 2021). However, the task of guarding LLM-generated content differs from human-generated content moderation because 1) the style and length of text produced by humans differ from those of LLMs, 2) the potential harms in human-generated content are typically limited to hate speech, while LLM moderation requires dealing with a broader range of potential harms, and 3) guarding LLM-generated content involves dealing with prompt-response pairs.…”
Section: Related Work (mentioning)
confidence: 99%
“…Since Reddit involved additional data collection (a time-consuming process), we chose a popular dataset that contains fewer than 10,000 datapoints. Annotated hate speech data: We use the following English hate speech datasets for our experiments (see Table 1 for more information on dataset statistics): (i) the HateXplain-GAB dataset (Mathew et al., 2021) (contains data from GAB), (ii) the LTI-GAB dataset (Qian et al., 2019) (contains data from GAB), and (iii) Ruddit (Hada et al., 2021) (contains data from Reddit).…”
Section: Datasets (mentioning)
confidence: 99%