Proceedings of the Third Workshop on Abusive Language Online 2019
DOI: 10.18653/v1/w19-3510

A Hierarchically-Labeled Portuguese Hate Speech Dataset

Abstract: Over the past years, the amount of online offensive speech has been growing steadily. To successfully cope with it, machine learning is applied. However, ML-based techniques require sufficiently large annotated datasets. In the last years, different datasets were published, mainly for English. In this paper, we present a new dataset for Portuguese, which has not been in focus so far. The dataset is composed of 5,668 tweets. For its annotation, we defined two different schemes used by annotators with different …

Citations: cited by 109 publications (107 citation statements)
References: 23 publications
“…Finally, regarding hate speech recognition amongst tweets in Portuguese, our results for the MLP (micro-averaged F1 = 0.85) outscored those by [Fortuna et al 2019], who report micro-averaged F1 = 0.72 with LSTM. At this point, it is worth stressing the fact that both models were run in the same corpus, as mentioned in Section 3, thereby reducing the influence of external variables, such as data source and language.…”
Section: Discussion and Model Comparison
Citation type: supporting
confidence: 55%
“…In this work we relied on a dataset of tweets in Portuguese [Fortuna et al 2019], collected through Twitter's API from January to March 2017. To build the dataset, tweets were fetched using specific keywords, and then filtered so as to come from user accounts known to produce hate speech material (i.e.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…This would provide a common framework for researchers who want to investigate either the phenomenon at large or one of its many facets. This direction is explored, for example, in a recent work by Fortuna et al (2019). Another major issue are biases in the design and annotation of corpora.…”
Section: Lexical Analysis
Citation type: mentioning
confidence: 99%