Hate speech is considered to be one of the major issues currently plaguing online social media. Repeated and repetitive exposure to hate speech has been shown to create physiological effects on the target users. Thus, hate speech, in all its forms, should be addressed on these platforms in order to maintain good health. In this paper, we explored several Transformer based machine learning models for the detection of hate speech and offensive content in English and Indo-Aryan languages at FIRE 2021. We explore several models such as mBERT, XLMR-large, XLMR-base by team name "Super Mario". Our models came 2ππ position in Code-Mixed Data set (Macro F1: 0.7107), 2ππ position in Hindi two-class classification (Macro F1: 0.7797), 4π‘β in English four-class category (Macro F1: 0.8006) and 12π‘β in English two-class category (Macro F1: 0.6447). We have made our code public 1 .
Online hatred is a growing concern on many social media platforms. To address this issue, different social media platforms have introduced moderation policies for such content. They also employ moderators who can check the posts violating moderation policies and take appropriate action. Academicians in the abusive language research domain also perform various studies to detect such content better. Although there is extensive research in abusive language detection in English, there is a lacuna in abusive language detection in low resource languages like Hindi, Urdu etc. In this FIRE 2021 shared task -"HASOC -Abusive and Threatening language detection in Urdu" the organisers propose an abusive language detection dataset in Urdu along with threatening language detection.In this paper, we explored several machine learning models such as XGboost, LGBM, m-BERT based models for abusive and threatening content detection in Urdu based on the shared task. We observed the Transformer model specifically trained on abusive language dataset in Arabic helps in getting the best performance. Our model came First for both abusive and threatening content detection with an F1score of 0.88 and 0.54, respectively. We have made our code public 1 .
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citationsβcitations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright Β© 2024 scite LLC. All rights reserved.
Made with π for researchers
Part of the Research Solutions Family.