Proceedings of the Third Workshop on Abusive Language Online 2019
DOI: 10.18653/v1/w19-3506

Multi-label Hate Speech and Abusive Language Detection in Indonesian Twitter

Abstract: Hate speech and abusive language spreading on social media need to be detected automatically to avoid conflicts between citizens. Moreover, hate speech has a target, a category, and a level that also need to be detected to help the authorities prioritize which hate speech must be addressed immediately. This research discusses multi-label text classification for abusive language and hate speech detection, including detecting the target, category, and level of hate speech, in Indonesian Twitter using machine learni…
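The abstract describes a multi-label setup (abusive-language and hate-speech flags plus the target, category, and level of hate speech). A minimal sketch of such a classifier using a binary-relevance baseline, assuming scikit-learn; the tweets and label columns below are illustrative placeholders, not the paper's exact features or label names:

    # Minimal binary-relevance baseline for multi-label abusive language / hate
    # speech detection (sketch; texts and label names are placeholders).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline

    # Placeholder training data: raw tweet texts and one 0/1 column per label,
    # e.g. [abusive, hate, strong_hate, target_individual].
    tweets = ["contoh tweet pertama", "contoh tweet kedua", "contoh tweet ketiga"]
    labels = [[1, 1, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 0]]

    # Word n-gram TF-IDF features feeding one logistic-regression classifier per label.
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    )
    model.fit(tweets, labels)

    # Each prediction is a 0/1 vector, one entry per label.
    print(model.predict(["tweet baru untuk diklasifikasikan"]))

Classifier chains or a label power-set transformation are common alternatives to binary relevance when dependencies between the target, category, and level labels matter.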

Cited by 135 publications (107 citation statements); references 24 publications.

Citation statements:
“…We also plan to add contact information and instructions for datasets that are not publicly accessible but available only on request, such as the datasets by Golbeck et al. (2017), Rezvan et al. (2018), and Tulkens et al. (2016).

Id | Study | Size | Source | Lang. | Classes
20 | Kumar et al. (2018) | 11.6k | Facebook | hing | aggressive
21 | Mathur et al. (2018) | 3.2k | Twitter | en,hi | abuse,hate
22 | Sanguinetti et al. (2018) | 6.9k | Twitter | it | five classes b
23 | Wiegand et al. (2018) | 8.5k | Twitter | de | abuse,insult,profanity
24 | Basile et al. (2019) | 19.6k | Twitter | en,es | aggression,hate,target
25 | Chung et al. (2019) | 15.0k | misc | en,fr,it | hate,counter-narrative
26 | Fortuna et al. (2019) | 5.7k | Twitter | pt | hate,target
27 | Ibrohim and Budi (2019) | 13.2k | Twitter | id | abuse,strong/weak hate,target
28 | Mandl et al. (2019) | 6.0k | Twitter | hi | hate,offense,profanity,target
29 | Mandl et al. (2019) | 4.7k | Twitter | de | hate,offense,profanity,target
30 | Mandl et al. (2019) | 7.0k | Twitter | en | hate,offense,profanity,target
31 | Mulki et al. (2019) | 5.8k | Twitter | ar | abuse,hate
32 | Ousidhoum et al. (2019) | 5.6k | Twitter | fr | abuse,hate,offense,target
33 | Ousidhoum et al. (2019) | 5.…”
Section: Discussion (mentioning)
confidence: 99%
“…There are many other languages for which this research needs to be carried out. This is why our experiment will be based on Indonesian-language tweets made publicly available by Ibrohim et al. [29]. The dataset contains 13,169 tweets, consisting of 7,608 non-hate-speech and 5,561 hate-speech tweets, and will be split into train, test, and validation sets of 60%, 20%, and 20%.…”
Section: Discussion (mentioning)
confidence: 99%
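A minimal sketch of the 60%-20%-20% split described in the statement above, assuming scikit-learn; the function and variable names are hypothetical, and in practice the lists would hold the 13,169 tweets and their binary hate-speech flags:

    # Sketch of a 60/20/20 train/validation/test split, stratified on the binary
    # hate-speech flag (function and variable names here are hypothetical).
    from sklearn.model_selection import train_test_split

    def split_60_20_20(texts, labels, seed=42):
        # 60% train vs. 40% held out, preserving the hate / non-hate ratio.
        x_train, x_rest, y_train, y_rest = train_test_split(
            texts, labels, test_size=0.4, stratify=labels, random_state=seed)
        # Split the held-out 40% in half: 20% validation, 20% test overall.
        x_val, x_test, y_val, y_test = train_test_split(
            x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
        return (x_train, y_train), (x_val, y_val), (x_test, y_test)

    # Tiny dummy example; in practice texts/labels would hold the 13,169 tweets.
    texts = [f"tweet {i}" for i in range(10)]
    labels = [0, 1] * 5                      # 1 = hate speech, 0 = not hate speech
    train, val, test = split_60_20_20(texts, labels)
    print(len(train[0]), len(val[0]), len(test[0]))   # -> 6 2 2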
“…Id | Study | Size | Source | Lang. | Classes
1 | Bretschneider and Peters (2016) | 1.8k | Forum | en | offense
2 | Bretschneider and Peters (2016) | 1.2k | Forum | en | offense
3 | | 16.9k | Twitter | en | racism,sexism
4 | | 0.7k | Twitter | id | hate
5 | | 0.5k | Twitter | de | hate
6 | Bretschneider and Peters (2017) | 5.8k | Facebook | de | strong/weak offense,target
7 | | 25.0k | Twitter | en | hate,offense
8 | Gao and Huang (2017) | 1.5k | news | en | hate
9 | Jha and Mamidi (2017) | 10.0k | Twitter | en | benevolent/hostile sexism
10 | Mubarak et al. (2017) | 31.7k | news | ar | obscene,offensive
11 | | 1.1k | Twitter | ar | obscene,offensive
12 | | 115.9k | Wikipedia | en | attack
13 | | 115.9k | Wikipedia | en | aggressive
14 | | 160.0k | Wikipedia | en | toxic
15 | Albadi et al. (2018) | 6.1k | Twitter | ar | hate
16 | ElSherief et al. (2018) | 28.0k | Twitter | en | hate,target
17 | | 80.0k | Twitter | en | six classes d
18 | de | 10.6k | Forum | en | hate
19 | | 2.0k | Twitter | id | abuse,offense
20 | | 11.6k | Facebook | hing | aggressive
21 | Mathur et al. (2018) | 3.2k | Twitter | en,hi | abuse,hate
22 | | 6.9k | Twitter | it | five classes b
23 | | 8.5k | Twitter | de | abuse,insult,profanity
24 | | 19.6k | Twitter | en,es | aggression,hate,target
25 | | 15.0k | misc | en,fr,it | hate,counter-narrative
26 | Fortuna et al. (2019) | 5.7k | Twitter | pt | hate,target
27 | Ibrohim and Budi (2019) | 13.2k | Twitter | id | abuse,strong/weak hate,target
28 | Mandl et al. (2019) | 6.0k | Twitter | hi | hate,offense,profanity,target
29 | Mandl et al. (2019) | 4.7k | Twitter | de | hate,offense,profanity,target
30 | Mandl et al. (2019) | 7.0k | Twitter | en | hate,offense,profanity,target
31 | Mulki et al. (2019) | 5.8k | Twitter | ar | abuse,hate
32 | | 5.6k | Twitter | fr | abuse,hate,offense,target
33 | | 5.6k | Twitter | en | abuse,hate,offense,target
34 | | 4.0k | Twitter | en | abuse,hate,offense,target
35 | | 3.3k | Twitter | ar | abuse,hate,offense,target
36 | | 22.3k | Forum | en | hate
37 | | 33.8k | Forum | en | hate
38 | | 13.2k | Twitter | en | offense
39 | Çöltekin (2020) | 36.0k | Twitter | tr | offense,target
40 | Pitenis et al. (2020) | 4.8k | Twitter | el | offense
41 | Sigurbergsson and Derczynski (2020) | | | |

Community-level bans are a common tool against groups that enable online harassment and harmful speech. Unfortunately, the efficacy of community bans has only been partially studied and with mixed results.…”
Section: Id Study (mentioning)
confidence: 99%
“…The development of systems for the automatic identification of abusive language phenomena has followed a common trend in NLP: feature-based linear classifiers (Ribeiro et al., 2018; Ibrohim and Budi, 2019), neural network architectures such as CNNs or Bi-LSTMs (Kshirsagar et al., 2018; Mishra et al., 2018; Mitrović et al., 2019; Sigurbergsson and Derczynski, 2020), and fine-tuning of pre-trained language models such as BERT and RoBERTa, a.o. (Swamy et al., 2019). Results vary both across datasets and architectures, with linear classifiers qualifying as very competitive, if not better, when compared to neural networks.…”
Section: Introduction (mentioning)
confidence: 99%
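The third model family mentioned in the statement above, fine-tuning a pre-trained language model, could look roughly like the following sketch using the Hugging Face transformers Trainer with a multi-label head; the checkpoint, label set, and toy data are assumptions for illustration, not the cited papers' exact configurations:

    # Sketch: fine-tuning a pre-trained transformer with a multi-label head using
    # the Hugging Face Trainer. Checkpoint, labels, and data are illustrative only.
    import torch
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    MODEL_NAME = "bert-base-multilingual-cased"   # assumed checkpoint
    LABELS = ["abusive", "hate"]                  # assumed label set

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME,
        num_labels=len(LABELS),
        problem_type="multi_label_classification",   # BCE loss, one logit per label
    )

    class TweetDataset(torch.utils.data.Dataset):
        """Wraps raw texts and per-label 0/1 annotations for the Trainer."""
        def __init__(self, texts, labels):
            self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
            self.labels = labels
        def __len__(self):
            return len(self.labels)
        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.labels[i], dtype=torch.float)  # floats for BCE
            return item

    # Placeholder data; in practice this would be the annotated tweet corpus.
    train_ds = TweetDataset(["contoh tweet kasar", "tweet biasa"], [[1.0, 0.0], [0.0, 0.0]])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=8),
        train_dataset=train_ds,
    )
    trainer.train()

At inference time, the per-label sigmoid outputs are thresholded (typically at 0.5) to obtain the multi-label decisions.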