Proceedings of the 13th International Workshop on Semantic Evaluation 2019
DOI: 10.18653/v1/s19-2087

The binary trio at SemEval-2019 Task 5: Multitarget Hate Speech Detection in Tweets

Abstract: The massive growth of user-generated web content through blogs, online forums and, most notably, social media networks has led to a widespread diffusion of hateful and abusive messages that need to be moderated. This paper proposes a supervised approach to detecting hate speech against immigrants and women in English tweets. Several models were developed, ranging from feature-engineering approaches to neural ones. We also carried out a detailed error analysis to show the main causes of misclassification.
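The abstract describes the approach without implementation details; as an illustration only, a minimal sketch of a feature-engineering baseline of the kind mentioned (TF-IDF n-grams with a linear classifier) is given below. The library choice, hyperparameters, and data handling are assumptions for this sketch, not the authors' actual configuration.

```python
# Minimal sketch of a feature-engineering baseline for binary hate speech
# detection (hateful vs. not hateful). Assumes scikit-learn is available;
# column names, hyperparameters, and the split are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score


def build_baseline():
    # Character n-grams within word boundaries are a common choice for tweets,
    # since they are robust to spelling variation and obfuscated slurs.
    return Pipeline([
        ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5),
                                  min_df=2, sublinear_tf=True)),
        ("clf", LinearSVC(C=1.0)),
    ])


def train_and_evaluate(tweets, labels):
    # tweets: list[str]; labels: list[int] with 1 = hateful, 0 = not hateful
    X_train, X_test, y_train, y_test = train_test_split(
        tweets, labels, test_size=0.2, random_state=42, stratify=labels)
    model = build_baseline()
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    # SemEval-2019 Task 5 subtask A is evaluated with macro-averaged F1.
    return f1_score(y_test, preds, average="macro")
```

A neural counterpart would typically replace the TF-IDF features with word embeddings fed to a recurrent or transformer encoder, as discussed in the citing work below.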

Cited by 14 publications (16 citation statements)
References 13 publications
“…With the development of large pre-trained transformer models such as BERT and XLNet (Devlin et al., 2019; Yang et al., 2019), several studies have explored the use of general pre-trained transformers in offensive language identification (Liu et al., 2019; Bucur et al., 2021), as well as models retrained or fine-tuned on offensive language corpora, such as HateBERT (Caselli et al., 2020). While the vast majority of studies address offensive language identification using English data (Yao et al., 2019; Ridenhour et al., 2020), several recent studies have created new datasets for various languages and applied computational models to identify such content in Arabic (Mubarak et al., 2021), Dutch (Tulkens et al., 2016), French (Chiril et al., 2019), German (Wiegand et al., 2018), Greek (Pitenis et al., 2020), Hindi (Bohra et al., 2018), Italian (Poletto et al., 2017), Portuguese (Fortuna et al., 2019), Slovene (Fišer et al., 2017), Spanish (Plaza-del Arco et al., 2021), and Turkish (Çöltekin, 2020). A recent trend is the use of pre-trained multilingual models such as XLM-R (Conneau et al., 2019) to leverage available English resources to make predictions in languages with fewer resources (Plaza-del Arco et al., 2021; Zampieri, 2020, 2021c,b; Sai and Sharma, 2021).…”
Section: Related Work
confidence: 99%
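For context on the transformer-based approaches this citing passage describes, the sketch below fine-tunes a general pre-trained model for binary offensive language identification using the Hugging Face transformers Trainer API; the model name, toy data, and hyperparameters are assumptions for illustration and are not taken from any of the cited papers.

```python
# Illustrative sketch only: fine-tuning a pre-trained transformer for binary
# offensive language identification. Model name, hyperparameters, and the toy
# dataset below are assumptions, not a configuration from the cited papers.
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)


class TweetDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_len=128):
        # Tokenize once up front; padding to a fixed length keeps batching simple.
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item


model_name = "bert-base-uncased"  # could be swapped for a retrained model such as HateBERT
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy data standing in for an annotated offensive language corpus.
train_ds = TweetDataset(["example offensive tweet", "example neutral tweet"],
                        [1, 0], tokenizer)

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```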
“…Even though thousands of languages and dialects are widely used in social media, most studies on the automatic identification of such content consider English only, a language for which datasets and other resources such as pre-trained models exist (Rosenthal et al., 2021). In the past few years, researchers have studied this problem in languages such as Arabic (Mubarak et al., 2021), French (Chiril et al., 2019), and Turkish (Çöltekin, 2020), to name a few. In doing so, they have created new datasets for each of these languages.…”
Section: Introduction
confidence: 99%
“…In terms of languages, the majority of studies on this topic deal with English (Malmasi and Zampieri, 2017; Yao et al., 2019; Ridenhour et al., 2020; Rosenthal et al., 2020) due to the wide availability of language resources such as corpora and pre-trained models. In recent years, several studies have been published on identifying offensive content in other languages such as Arabic (Mubarak et al., 2020), Dutch (Tulkens et al., 2016), French (Chiril et al., 2019), Greek (Pitenis et al., 2020), Italian (Poletto et al., 2017), Portuguese (Fortuna et al., 2019), and Turkish (Çöltekin, 2020). Most of these studies have created new datasets and resources for these languages, opening avenues for multilingual models such as those presented in Ranasinghe and .…”
Section: Related Work
confidence: 99%
“…The dataset contained a 3-class classification problem (hate speech, offensive, or neither), a targeted community, as well as the spans that make the text hateful or offensive. Furthermore, offensive language datasets have been annotated in other languages such as Arabic (Mubarak et al., 2017), Danish (Sigurbergsson and Derczynski, 2020), Dutch (Tulkens et al., 2016), French (Chiril et al., 2019), Greek (Pitenis et al., 2020), Portuguese (Fortuna et al., 2019), Spanish (Basile et al., 2019b), and Turkish (Çöltekin, 2020).…”
Section: Related Work
confidence: 99%