Nauros Romim scite author profile

Nauros Romim

4 Publications

24 Citation Statements Received

37 Citation Statements Given

Affiliations

Shahjalal University of Science and Technology

Publications

Hate Speech Detection in the Bengali Language: A Dataset and Its Baseline Evaluation

Romim, Ahmed, Talukder et al. 2021

Social media sites such as YouTube and Facebook have become an integral part of everyone's life, and in the last few years hate speech in social media comment sections has increased rapidly. Detecting hate speech on social media websites faces a variety of challenges, including small, imbalanced datasets, finding an appropriate model, and choosing a feature analysis method. This problem is more severe for the Bengali-speaking community because of the lack of gold-standard labelled datasets. This paper presents a new dataset of 30,000 user comments tagged by crowdsourcing and verified by experts. All the user comments were collected from YouTube and Facebook comment sections and classified into seven categories: sports, entertainment, religion, politics, crime, celebrity, and TikTok & meme. A total of 50 annotators took part, each comment was annotated three times, and the majority vote was taken as the final annotation. We also ran baseline experiments with several deep learning models and pretrained Bengali word embeddings such as Word2Vec, FastText, and BengFastText on this dataset to facilitate future research. The experiments showed that although all the deep learning models performed well, SVM achieved the best result with 87.5% accuracy. Our core contribution is to make this benchmark dataset available and accessible to facilitate further research in the field of Bengali hate speech detection.
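The majority-vote step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the comment IDs, the label names "hate" and "not_hate", and the helper `majority_label` are hypothetical, and the sketch assumes a binary label with exactly three votes per comment, so a strict majority always exists.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by most annotators for one comment."""
    # most_common(1) yields the single (label, count) pair with the highest count.
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Hypothetical annotations: three votes per comment, as in the abstract.
annotations = {
    "comment_1": ["hate", "hate", "not_hate"],
    "comment_2": ["not_hate", "not_hate", "not_hate"],
    "comment_3": ["not_hate", "hate", "hate"],
}

# Final annotation per comment is the majority vote.
final_labels = {cid: majority_label(votes) for cid, votes in annotations.items()}
print(final_labels)
```

With an odd number of votes over two labels, ties cannot occur; a scheme with more labels or an even number of annotators would need an explicit tie-breaking rule.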

Hate Speech detection in the Bengali language: A dataset and its baseline evaluation

Romim, Ahmed, Talukder et al. 2020

Preprint

HS-BAN: A Benchmark Dataset of Social Media Comments for Hate Speech Detection in Bangla

Romim, Ahmed, Islam et al. 2021

Preprint

In this paper, we present HS-BAN, a binary-class hate speech (HS) dataset in the Bangla language consisting of more than 50,000 labeled comments, of which 40.17% are hate speech and the rest are non-hate speech. While preparing the dataset, a strict and detailed annotation guideline was followed to reduce human annotation bias. The HS dataset was also preprocessed linguistically to extract different types of slang that people currently write using symbols, acronyms, or alternative spellings. These slang words were further categorized into traditional and non-traditional slang lists and included in the results of this paper. We explored traditional linguistic features and neural network-based methods to develop a benchmark system for hate speech detection for the Bangla language. Our experimental results show that existing word embedding models trained with informal texts perform better than those trained with formal text. Our benchmark shows that a Bi-LSTM model on top of the FastText informal word embedding achieved an F1-score of 86.78%. We will make the dataset available for public use.
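For readers unfamiliar with the metric reported above, the F1-score is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts; the counts are invented for illustration and are not results from the paper, and the function name `f1_score_from_counts` is hypothetical.

```python
def f1_score_from_counts(tp, fp, fn):
    """Compute F1 for the positive (hate) class from raw counts."""
    precision = tp / (tp + fp)  # share of predicted hate comments that are hate
    recall = tp / (tp + fn)     # share of actual hate comments that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
f1 = f1_score_from_counts(tp=80, fp=20, fn=20)
print(f"F1 = {f1:.4f}")
```

Because the dataset is imbalanced (about 40% hate speech), F1 is a more informative summary than raw accuracy, which can look high for a classifier that favors the majority class.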

BD-SHS: A Benchmark Dataset for Learning to Detect Online Bangla Hate Speech in Different Social Contexts

Romim, Ahmed, Islam et al. 2022

Preprint
