Overview of the EVALITA 2018 Hate Speech Detection Task

Bosco, Cristina; Dell’Orletta, Felice; Poletto, Fabio; Sanguinetti, Manuela; Tesconi, Maurizio

doi:10.4000/books.aaccademia.4503

Cited by 95 publications

(87 citation statements)

References 1 publication

“…sono zavorre e tutti uomini (refugees? They are deadweights and all men) Source: (Bosco et al 2018) Warner and Hirschberg (2012) Use of a sexist or racial slur, attack a minority, promotes hate speech or violent crime, blatantly misrepresents truth, shows support of problematic hashtags, defends xenophobia or sexism, or contains a screen name that is offensive Waseem and Hovy (2016) Act of offending, insulting or threatening a person or a group of similar people on the basis of religion, race, caste, sexual orientation, gender or belongingness to a specific stereotyped community Schmidt and Wiegand (2017) Language that is used to express hatred towards a targeted group or is intended to be derogatory, to humiliate, or to insult the members of the group Davidson et al (2017) Any communication that disparages a target group of people based on some characteristic such as race, colour, ethnicity, gender, sexual orientation, nationality, religion, or other characteristic Nockleby (2000) Aggressiveness Intention to be aggressive, harmful, or even to incite, in various forms, to violent acts against a given target Sanguinetti et al (2018) Offensiveness Any form of non-acceptable language (profanity) or a targeted offense, which can be veiled or direct Zampieri et al (2019a) Profanity, strongly impolite, rude or vulgar language expressed with fighting or hurtful words in order to insult a targeted individual or group Fortuna and Nunes (2018) Abusiveness/toxicity Hurtful language, including hate speech, derogatory language and also profanity Founta et al (2018) Any strongly impolite, rude or hurtful language using profanity, that can show a debasement of someone or something, or show intense emotion Fortuna and Nunes (2018) Extremely offensive and insulting; engaging in or characterized by habitual violence and cruelty Oxford English Dictionary (2019)…”

Section: Inclusion and Exclusion Criteria; citation type: mentioning

confidence: 99%

“…A second key distinction concerns the source from which data are retrieved. The microblogging platform Twitter is by far the most exploited source, due to the relatively reduced length of texts and to a friendly policy on making data publicly available: 32 resources contain tweets, one of which (Olteanu et al 2018) also features posts from the social aggregator Reddit, one (Nascimento et al 2019) also retrieves comments from the 55chan imageboard, while in two works (Bosco et al 2018; Mandl et al 2019) […] 2018 use sentences from the well-known white-supremacist forum Stormfront; the dataset released for the Hate Speech Hackathon contains posts from the Wikipedia […] Topical focus: Abusiveness (5); Aggressiveness (2); Anti-Roma (1); Child sexual abuse (1); Cyberbullying (2); Flames (1); Harassment (1); Homophobia (4); HS (36); Islamophobia (2); Obscenity, Profanity (3); Offensiveness (13); Personal Attacks (1); Racism (6); Sexism, Misogyny (9); Threats, Violence (1); Toxicity (1); White supremacy (1). Nearly all the resources feature user-generated public contents, mostly microblog posts, often retrieved with a keyword-based approach and mostly using words with a negative polarity.…”

Section: Data Source; citation type: mentioning

confidence: 99%

“…In all instances, the original data was collected from social media (Twitter and Facebook) and annotated manually by experts, in two cases also integrating crowdsourced annotations. The tasks, with their main focus, are summarized in Table 7. HS (against multiple targets) is the main topic in HaSpeeDe (Bosco et al 2018), one of the tasks organized at EVALITA 2018; more specifically, HS against women is addressed in the two editions of AMI (Fersini et al 2018a, b) and in HatEval (Basile et al 2019) (which, in turn, also included data on HS against immigrants), and a focus on cyberbullying is proposed in Task 6 at PolEval (Ptaszynski et al 2019).…”

Section: Shared Tasks; citation type: mentioning

confidence: 99%

Resources and benchmark corpora for hate speech detection: a systematic review

Poletto, Basile, Sanguinetti, et al. 2020

Lang Resources & Evaluation

Hate Speech in social media is a complex phenomenon, whose detection has recently gained significant traction in the Natural Language Processing community, as attested by several recent review works. Annotated corpora and benchmarks are key resources, considering the vast number of supervised approaches that have been proposed. Lexica play an important role as well for the development of hate speech detection systems. In this review, we systematically analyze the resources made available by the community at large, including their development methodology, topical focus, language coverage, and other factors. The results of our analysis highlight a heterogeneous, growing landscape, marked by several issues and avenues for improvement.

“…The different models and features presented in the literature are difficult to compare effectively because the results are evaluated on individual datasets that are often not public, hence the survey advocates for broader availability of publicly available data. This evaluation gap is being bridged recently by evaluation campaigns for English, Spanish (SemEval [10]), German [11], and Italian (EVALITA [12]), whose shared tasks released annotated datasets for hate speech detection. The availability of benchmarks for system evaluation and datasets for hate speech detection in different languages made the challenge of investigating architectures, which are also stable and well-performing across different languages, an exciting issue to research [13,14].…”

Section: Related Work; citation type: mentioning

confidence: 99%

“…This result is confirmed in [23], where AlBERTo is applied to hate speech detection on Italian social media. We trained AlBERTo on data that also encompassed the train and reference set from Haspeede [12], the first shared task on hate speech on Italian organized within EVALITA2018 evaluation campaign (http://www.di.unito.it/~tutreeb/haspeede-evalita18/index.html).…”

Section: Related Work; citation type: mentioning

confidence: 99%

Time of Your Hate: The Challenge of Time in Hate Speech Detection on Social Media

et al. 2020

The availability of large annotated corpora from social media and the development of powerful classification approaches have contributed in an unprecedented way to tackle the challenge of monitoring users’ opinions and sentiments in online social platforms across time. Such linguistic data are strongly affected by events and topic discourse, and this aspect is crucial when detecting phenomena such as hate speech, especially from a diachronic perspective. We address this challenge by focusing on a real case study: the “Contro l’odio” platform for monitoring hate speech against immigrants in the Italian Twittersphere. We explored the temporal robustness of a BERT model for Italian (AlBERTo), the current benchmark on non-diachronic detection settings. We tested different training strategies to evaluate how the classification performance is affected by adding more data temporally distant from the test set and hence potentially different in terms of topic and language use. Our analysis points out the limits that a supervised classification model encounters on data that are heavily influenced by events. Our results show how AlBERTo is highly sensitive to the temporal distance of the fine-tuning set. However, with an adequate time window, the performance increases, while requiring less annotated data than a traditional classifier.
