Proceedings of the Third Workshop on Abusive Language Online 2019
DOI: 10.18653/v1/w19-3506

Multi-label Hate Speech and Abusive Language Detection in Indonesian Twitter

Abstract: Hate speech and abusive language spreading on social media need to be detected automatically to avoid conflicts between citizens. Moreover, hate speech has a target, a category, and a level that also need to be detected to help the authorities prioritize which hate speech must be addressed immediately. This research discusses multi-label text classification for abusive language and hate speech detection, including detecting the target, category, and level of hate speech, in Indonesian Twitter using machine learni…
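The abstract describes a multi-label setup (abusive-language and hate-speech flags plus the target, category, and level of hate speech). A minimal sketch of such a classifier using a binary-relevance baseline, assuming scikit-learn; the tweets and label columns below are illustrative placeholders, not the paper's exact features or label names:

    # Minimal binary-relevance baseline for multi-label abusive language / hate
    # speech detection (sketch; texts and label names are placeholders).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline

    # Placeholder training data: raw tweet texts and one 0/1 column per label,
    # e.g. [abusive, hate, strong_hate, target_individual].
    tweets = ["contoh tweet pertama", "contoh tweet kedua", "contoh tweet ketiga"]
    labels = [[1, 1, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 0]]

    # Word n-gram TF-IDF features feeding one logistic-regression classifier per label.
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    )
    model.fit(tweets, labels)

    # Each prediction is a 0/1 vector, one entry per label.
    print(model.predict(["tweet baru untuk diklasifikasikan"]))

Classifier chains or a label power-set transformation are common alternatives to binary relevance when dependencies between the target, category, and level labels matter.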

Cited by 135 publications (107 citation statements); references 24 publications.

Citation statements:
“…We also plan to add contact information and instructions for datasets that are not publicly accessible but available only on request, such as the datasets by Golbeck et al. (2017), Rezvan et al. (2018), and Tulkens et al. (2016).

Id | Study | Size | Source | Lang. | Classes
20 | Kumar et al. (2018) | 11.6k | Facebook | hing | aggressive
21 | Mathur et al. (2018) | 3.2k | Twitter | en,hi | abuse,hate
22 | Sanguinetti et al. (2018) | 6.9k | Twitter | it | five classes b
23 | Wiegand et al. (2018) | 8.5k | Twitter | de | abuse,insult,profanity
24 | Basile et al. (2019) | 19.6k | Twitter | en,es | aggression,hate,target
25 | Chung et al. (2019) | 15.0k | misc | en,fr,it | hate,counter-narrative
26 | Fortuna et al. (2019) | 5.7k | Twitter | pt | hate,target
27 | Ibrohim and Budi (2019) | 13.2k | Twitter | id | abuse,strong/weak hate,target
28 | Mandl et al. (2019) | 6.0k | Twitter | hi | hate,offense,profanity,target
29 | Mandl et al. (2019) | 4.7k | Twitter | de | hate,offense,profanity,target
30 | Mandl et al. (2019) | 7.0k | Twitter | en | hate,offense,profanity,target
31 | Mulki et al. (2019) | 5.8k | Twitter | ar | abuse,hate
32 | Ousidhoum et al. (2019) | 5.6k | Twitter | fr | abuse,hate,offense,target
33 | Ousidhoum et al. (2019) | 5.…”
Section: Discussion (mentioning)
confidence: 99%
“…There are many other languages for which this research needs to be carried out. This is why our experiment will be based on Indonesian-language tweets made publicly available by Ibrohim et al. [29]. The dataset contains 13,169 tweets, consisting of 7,608 non-hate-speech and 5,561 hate-speech tweets, and will be split into train, test, and validation sets of 60%, 20%, and 20%.…”
Section: Discussion (mentioning)
confidence: 99%
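A minimal sketch of the 60%-20%-20% split described in the statement above, assuming scikit-learn; the function and variable names are hypothetical, and in practice the lists would hold the 13,169 tweets and their binary hate-speech flags:

    # Sketch of a 60/20/20 train/validation/test split, stratified on the binary
    # hate-speech flag (function and variable names here are hypothetical).
    from sklearn.model_selection import train_test_split

    def split_60_20_20(texts, labels, seed=42):
        # 60% train vs. 40% held out, preserving the hate / non-hate ratio.
        x_train, x_rest, y_train, y_rest = train_test_split(
            texts, labels, test_size=0.4, stratify=labels, random_state=seed)
        # Split the held-out 40% in half: 20% validation, 20% test overall.
        x_val, x_test, y_val, y_test = train_test_split(
            x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
        return (x_train, y_train), (x_val, y_val), (x_test, y_test)

    # Tiny dummy example; in practice texts/labels would hold the 13,169 tweets.
    texts = [f"tweet {i}" for i in range(10)]
    labels = [0, 1] * 5                      # 1 = hate speech, 0 = not hate speech
    train, val, test = split_60_20_20(texts, labels)
    print(len(train[0]), len(val[0]), len(test[0]))   # -> 6 2 2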
“…Id | Study | Size | Source | Lang. | Classes
1 | Bretschneider and Peters (2016) | 1.8k | Forum | en | offense
2 | Bretschneider and Peters (2016) | 1.2k | Forum | en | offense
3 | | 16.9k | Twitter | en | racism,sexism
4 | | 0.7k | Twitter | id | hate
5 | | 0.5k | Twitter | de | hate
6 | Bretschneider and Peters (2017) | 5.8k | Facebook | de | strong/weak offense,target
7 | | 25.0k | Twitter | en | hate,offense
8 | Gao and Huang (2017) | 1.5k | news | en | hate
9 | Jha and Mamidi (2017) | 10.0k | Twitter | en | benevolent/hostile sexism
10 | Mubarak et al. (2017) | 31.7k | news | ar | obscene,offensive
11 | | 1.1k | Twitter | ar | obscene,offensive
12 | | 115.9k | Wikipedia | en | attack
13 | | 115.9k | Wikipedia | en | aggressive
14 | | 160.0k | Wikipedia | en | toxic
15 | Albadi et al. (2018) | 6.1k | Twitter | ar | hate
16 | ElSherief et al. (2018) | 28.0k | Twitter | en | hate,target
17 | | 80.0k | Twitter | en | six classes d
18 | de | 10.6k | Forum | en | hate
19 | | 2.0k | Twitter | id | abuse,offense
20 | | 11.6k | Facebook | hing | aggressive
21 | Mathur et al. (2018) | 3.2k | Twitter | en,hi | abuse,hate
22 | | 6.9k | Twitter | it | five classes b
23 | | 8.5k | Twitter | de | abuse,insult,profanity
24 | | 19.6k | Twitter | en,es | aggression,hate,target
25 | | 15.0k | misc | en,fr,it | hate,counter-narrative
26 | Fortuna et al. (2019) | 5.7k | Twitter | pt | hate,target
27 | Ibrohim and Budi (2019) | 13.2k | Twitter | id | abuse,strong/weak hate,target
28 | Mandl et al. (2019) | 6.0k | Twitter | hi | hate,offense,profanity,target
29 | Mandl et al. (2019) | 4.7k | Twitter | de | hate,offense,profanity,target
30 | Mandl et al. (2019) | 7.0k | Twitter | en | hate,offense,profanity,target
31 | Mulki et al. (2019) | 5.8k | Twitter | ar | abuse,hate
32 | | 5.6k | Twitter | fr | abuse,hate,offense,target
33 | | 5.6k | Twitter | en | abuse,hate,offense,target
34 | | 4.0k | Twitter | en | abuse,hate,offense,target
35 | | 3.3k | Twitter | ar | abuse,hate,offense,target
36 | | 22.3k | Forum | en | hate
37 | | 33.8k | Forum | en | hate
38 | | 13.2k | Twitter | en | offense
39 | Çöltekin (2020) | 36.0k | Twitter | tr | offense,target
40 | Pitenis et al. (2020) | 4.8k | Twitter | el | offense
41 | Sigurbergsson and Derczynski (2020) | | | |

Community-level bans are a common tool against groups that enable online harassment and harmful speech. Unfortunately, the efficacy of community bans has only been partially studied and with mixed results.…”
Section: Id Study (mentioning)
confidence: 99%
“…The development of systems for the automatic identification of abusive language phenomena has followed a common trend in NLP: feature-based linear classifiers (Ribeiro et al., 2018; Ibrohim and Budi, 2019), neural network architectures such as CNNs or Bi-LSTMs (Kshirsagar et al., 2018; Mishra et al., 2018; Mitrović et al., 2019; Sigurbergsson and Derczynski, 2020), and fine-tuning of pre-trained language models such as BERT and RoBERTa, a.o. (Swamy et al., 2019). Results vary both across datasets and architectures, with linear classifiers qualifying as very competitive, if not better, when compared to neural networks.…”
Section: Introduction (mentioning)
confidence: 99%
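The third model family mentioned in the statement above, fine-tuning a pre-trained language model, could look roughly like the following sketch using the Hugging Face transformers Trainer with a multi-label head; the checkpoint, label set, and toy data are assumptions for illustration, not the cited papers' exact configurations:

    # Sketch: fine-tuning a pre-trained transformer with a multi-label head using
    # the Hugging Face Trainer. Checkpoint, labels, and data are illustrative only.
    import torch
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    MODEL_NAME = "bert-base-multilingual-cased"   # assumed checkpoint
    LABELS = ["abusive", "hate"]                  # assumed label set

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME,
        num_labels=len(LABELS),
        problem_type="multi_label_classification",   # BCE loss, one logit per label
    )

    class TweetDataset(torch.utils.data.Dataset):
        """Wraps raw texts and per-label 0/1 annotations for the Trainer."""
        def __init__(self, texts, labels):
            self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
            self.labels = labels
        def __len__(self):
            return len(self.labels)
        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.labels[i], dtype=torch.float)  # floats for BCE
            return item

    # Placeholder data; in practice this would be the annotated tweet corpus.
    train_ds = TweetDataset(["contoh tweet kasar", "tweet biasa"], [[1.0, 0.0], [0.0, 0.0]])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=8),
        train_dataset=train_ds,
    )
    trainer.train()

At inference time, the per-label sigmoid outputs are thresholded (typically at 0.5) to obtain the multi-label decisions.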