Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017), 2017
DOI: 10.4000/books.aaccademia.2448

Hate Speech Annotation: Analysis of an Italian Twitter Corpus

Abstract: The paper describes the development of a corpus from social media built with the aim of representing and analysing hate speech against some minority groups in Italy. The issues related to data collection and annotation are introduced, focusing on the challenges we addressed in designing a multifaceted set of labels where the main features of verbal hate expressions may be modelled. Moreover, an analysis of the disagreement among the annotators is presented in order to carry out a preliminary evaluation of the …

Cited by 51 publications (57 citation statements)
References 1 publication
“…In this section we extend the preliminary qualitative analysis of the data presented in a previous study on the tag distribution (Poletto et al, 2017). Figure 1 sums up such distribution over the final version of our corpus.…”
Section: Results (mentioning)
confidence: 70%
“…We obtained a dataset of 236,193 tweets, from which we randomly selected a subset to be annotated. The detailed description of the entire pipeline of the data collection and annotation can be found in Poletto et al (2017). Given the higher degree of complexity that applying such scheme entailed, we first annotated 1,827 tweets, then we performed another data filtering starting from neutral words that more frequently occur in texts annotated as HS in this first dataset: invadere (invade), invasione (invasion), basta (enough), fuori (out), comunist* (communist*), african* (African), barcon* (migrants boat*).…”
Section: Corpus Creation and Description (mentioning)
confidence: 99%
“…The data are released after the annotation process, which involved non-trained contributors on the crowdsourcing platform Figure Eight (F8). The annotation scheme applied to the HatEval data is a simplified merge of schemes already applied in the development of corpora for HS detection and misogyny by the organizers (Fersini et al, 2018a,b; Poletto et al, 2017), also in the context of funded projects with focus on the tasks topics. It includes the following categories:…”
Section: Annotation (mentioning)
confidence: 99%
“…Several hate speech datasets are publicly available, e.g., for English (Waseem and Hovy, 2016; Davidson et al, 2017; Nobata et al, 2016; Jigsaw, 2018), Spanish (Fersini et al, 2018), Italian (Poletto et al, 2017; Sanguinetti et al, 2018), German (Ross et al, 2016), Hindi (Kumar et al, 2018), and Portuguese (de Pelle and Moreira, 2017). In this section, we analyze the data collection strategy, the annotation method and the dataset properties of three representative hate speech datasets: the Hate speech, Racism and Sexism dataset by Waseem and Hovy (2016), the Offensive Language Dataset by Davidson et al (2017), and the Portuguese News Comments dataset by de Pelle and Moreira (2017).…”
Section: Dataset Annotation (mentioning)
confidence: 99%