UVA Wahoos at SemEval-2019 Task 6: Hate Speech Identification using Ensemble Machine Learning

Ramakrishnan, Murugesan; Zadrozny, Wlodek; Tabari, Narges

doi:10.18653/v1/s19-2141

Cited by 8 publications

(7 citation statements)

References 11 publications

“…Several NLP approaches have been proposed for the task of hate speech detection (Qian et al, 2018; Indurthi et al, 2019; Vidgen et al, 2021; Fersini et al, 2020a; Attanasio and Pastor, 2020; Kennedy et al, 2020; Attanasio et al, 2022b, inter alia). While ensemble modeling has been proven to be effective for several tasks in NLP (Garmash and Monz, 2016; Nozza et al, 2016; Fadel et al, 2019; Bashmal and AlZeer, 2021), a limited number of research work have investigated its potentiality for hate speech detection (Plaza-del Arco et al, 2019; Ramakrishnan et al, 2019; Zimmerman et al, 2018).…”

Section: Related Work (mentioning)

confidence: 99%

Nozza@LT-EDI-ACL2022: Ensemble Modeling for Homophobia and Transphobia Detection

Nozza¹

2022

Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

In this paper, we describe our approach for the task of homophobia and transphobia detection in English social media comments. The dataset consists of YouTube comments, and it has been released for the shared task on Homophobia/Transphobia Detection in social media comments. Given the high class imbalance, we propose a solution based on data augmentation and ensemble modeling. We fine-tuned different large language models (BERT, RoBERTa, and HateBERT) and used the weighted majority vote on their predictions. Our proposed model obtained 0.48 and 0.94 for macro and weighted F1-score, respectively, ranking at the third position.
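The weighted majority vote mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the weights were chosen, so the weights and label names below are hypothetical, and the function is an illustration of the general technique rather than the authors' implementation.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting models carry the largest total weight.

    predictions: one predicted label per model, for a single input.
    weights: one non-negative weight per model (hypothetical values here).
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight per model prediction is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties resolve to the label that was seen first.
    return max(totals, key=totals.get)


# Three hypothetical models vote on one comment; labels are illustrative.
votes = ["none", "homophobic", "homophobic"]
model_weights = [0.5, 0.3, 0.3]
print(weighted_majority_vote(votes, model_weights))
```

With these assumed weights, the two lower-weighted models together outvote the single higher-weighted one, which is the behavior that separates a weighted vote from simply trusting the best model.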

“…Recent papers use word embedding methods more frequently than bag-of-words and n-grams because the former can extract semantic information from the text; consequently, an improvement in the performance is expected. Regarding the classifier, different paradigms have been employed; tree-based algorithms such as decision trees and random forest (RF) [17,8,18,19], artificial neural networks such as multi-layer perceptron (MLP) and convolution neural networks (CNN) [20,21,22,16,23,24,25,26,27,28,29],…”

Section: Related Work (mentioning)

confidence: 99%

“…Bayesian as the naive bayes (NB) [17,8], support vector machines (SVM) [17,8], and ensemble learning, which is marked () in the last column of the
… [20] | Twitter from [30,31] | racism, sexism | characters, words, and both | CNN
2018 | Zimmerman et al [21] | Twitter from [30] | racism, sexism | embedding | deep learning
2018 | Pitsilis et al [22] | Twitter from [30] | racism, sexism | defined by the authors | LSTM
2018 | Montani and Schuller [18] | GermEval 2018 | general | TFIDF, Word2Vec, n-gram | LR, RF, ET
2019 | Zhang and Luo [16] | Twitter from [17,30] | [17]: race ethnicity, religion; [30]: racism, sexism | Word2Vec | CNN
2019 | Liu et al [32] | Twitter from [17] | race ethnicity, religion | embedding, LDA | fuzzy ensemble
2019 | Ramakrishnan et al [19] | OffensEval [33] | general | n-gram, GloVe, others | LR, RF, XG
2020 | Paschalides et al [23] | Twitter from [8] | racism, sexism, homophobia | …
The most common social media used to extract information to compose a dataset for hate speech detection is Twitter. Despite English being the most used language, there are datasets from many other languages, such as the Arabic-Twitter dataset [26] and Hindi-English Twitter dataset [27].…”

Section: Related Work (mentioning)

confidence: 99%

Selecting and combining complementary feature representations and classifiers for hate speech detection

Cruz¹, V.², Cavalcanti³

2022

Preprint

Hate speech is a major issue in social networks due to the high volume of data generated daily. Recent works demonstrate the usefulness of machine learning (ML) in dealing with the nuances required to distinguish hateful posts from mere sarcasm or offensive language. Many ML solutions for hate speech detection have been proposed by changing either how features are extracted from the text or the classification algorithm employed. However, most works consider only one type of feature extraction and classification algorithm. This work argues that a combination of multiple feature extraction techniques and different classification models is needed. We propose a framework to analyze the relationship between multiple feature extraction and classification techniques to understand how they complement each other. The framework is used to select a subset of complementary techniques to compose a robust multiple classifiers system (MCS) for hate speech detection. The experimental study considering four hate speech classification datasets demonstrates that the proposed framework is a promising methodology for analyzing and designing high-performing MCS for this task. The MCS obtained using the proposed framework significantly outperforms the combination of all models and the homogeneous and heterogeneous selection heuristics, demonstrating the importance of having a proper selection scheme. Source code, figures and dataset splits can be found in the GitHub repository: https://github.com/Menelau/Hate-Speech-MCS.

“…The results proved that transfer learning improves offensive language detection performance. Ramakrishnan et al (2019) used an ensemble model based on logistic regression and tree-based model to identify offensive language on SemEval-2019 Task 6. Char n-grams, word n-grams, part of speech and GloVe embedding were used as features.…”

Section: Related Work (mentioning)

confidence: 99%
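The character and word n-gram features named in the statement above can be sketched in a few lines. This is a generic illustration of n-gram extraction, not the feature pipeline of Ramakrishnan et al; the function names, the `c:`/`w:` prefixes, and the default n-gram sizes are assumptions made for the example.

```python
from collections import Counter


def char_ngrams(text, n):
    """All contiguous character substrings of length n (empty if text is shorter)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """All contiguous runs of n whitespace-separated, lowercased tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, char_n=3, word_n=2):
    """Count character and word n-grams in one sparse feature dictionary.

    The "c:" and "w:" prefixes (a hypothetical convention) keep the two
    feature families from colliding when they share the same surface string.
    """
    features = Counter()
    features.update("c:" + g for g in char_ngrams(text.lower(), char_n))
    features.update("w:" + g for g in word_ngrams(text, word_n))
    return features


print(ngram_counts("You are awful"))
```

A count dictionary like this could be fed to a linear or tree-based classifier after vectorization; the part-of-speech and GloVe features mentioned in the statement would need external resources and are left out of the sketch.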

IR3218-UI at SemEval-2020 Task 12: Emoji Effects on Offensive Language IdentifiCation

Kurniawan¹, Budi², Ibrohim³

2020

Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we present our approach and the results of our participation in OffensEval 2020. There are three sub-tasks in OffensEval 2020, namely offensive language identification (sub-task A), automatic categorization of offense types (sub-task B), and offense target identification (sub-task C). We participated in sub-task A of English OffensEval 2020. Our approach emphasizes how emoji affect offensive language identification. Our model used an LSTM combined with GloVe pre-trained word vectors to identify offensive language on social media. The best model obtained a macro F1-score of 0.88428.
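Because the abstracts on this page report macro and weighted F1-scores, a small worked example may help show how the two averages behave on imbalanced data. The sketch below uses the standard definitions; the toy labels and predictions are invented for illustration and are not drawn from any of the cited datasets.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """F1 for each label, computed as 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's share of true labels."""
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Toy imbalanced data: a classifier that always predicts the majority class.
y_true = ["NOT"] * 8 + ["OFF"] * 2
y_pred = ["NOT"] * 10
print(round(macro_f1(y_true, y_pred), 4))     # 0.4444
print(round(weighted_f1(y_true, y_pred), 4))  # 0.7111
```

The majority-class predictor scores far lower on the macro average than on the weighted one, which is why macro F1 is the more demanding metric when the offensive class is rare.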
