Somnath Banerjee scite author profile

Somnath Banerjee

2 Publications

11 Citation Statements Received

38 Citation Statements Given

Publications

Exploring Transformer Based Models to Identify Hate Speech and Offensive Content in English and Indo-Aryan Languages

Banerjee¹, Sarkar², Agrawal³ et al. 2021

Preprint

Hate speech is considered one of the major issues currently plaguing online social media. Repeated exposure to hate speech has been shown to create physiological effects on the target users. Thus, hate speech, in all its forms, should be addressed on these platforms in order to maintain good health. In this paper, we explore several Transformer-based machine learning models for the detection of hate speech and offensive content in English and Indo-Aryan languages at FIRE 2021. Participating under the team name "Super Mario", we explore models such as mBERT, XLMR-large, and XLMR-base. Our models placed 2nd on the Code-Mixed data set (Macro F1: 0.7107), 2nd in Hindi two-class classification (Macro F1: 0.7797), 4th in the English four-class category (Macro F1: 0.8006), and 12th in the English two-class category (Macro F1: 0.6447). We have made our code public.

Abusive and Threatening Language Detection in Urdu using Boosting based and BERT based models: A Comparative Approach

Das¹, Banerjee², Saha³ 2021

Preprint

Online hatred is a growing concern on many social media platforms. To address this issue, different social media platforms have introduced moderation policies for such content. They also employ moderators who can check posts that violate moderation policies and take appropriate action. Academics in the abusive language research domain also perform various studies to detect such content better. Although there is extensive research on abusive language detection in English, there is a lacuna in abusive language detection in low-resource languages such as Hindi and Urdu. In the FIRE 2021 shared task "HASOC - Abusive and Threatening language detection in Urdu", the organisers propose an abusive language detection dataset in Urdu along with threatening language detection. In this paper, we explore several machine learning models, such as XGBoost, LGBM, and m-BERT based models, for abusive and threatening content detection in Urdu based on the shared task. We observed that the Transformer model specifically trained on an abusive language dataset in Arabic gives the best performance. Our model ranked first for both abusive and threatening content detection, with F1 scores of 0.88 and 0.54, respectively. We have made our code public.
