2021
DOI: 10.1007/978-981-16-0586-4_37

Hate Speech Detection in the Bengali Language: A Dataset and Its Baseline Evaluation

Abstract: Social media sites such as YouTube and Facebook have become an integral part of everyone's life and in the last few years, hate speech in the social media comment section has increased rapidly. Detection of hate speech on social media websites faces a variety of challenges including small imbalanced data sets, the finding of an appropriate model and also the choice of feature analysis method. Furthermore, this problem is more severe for the Bengali speaking community due to the lack of gold standard labelled d…

Cited by 63 publications (30 citation statements)
References 16 publications
“…As reported, the proposed CNN + BiLSTM + CNN model frequently outperforms baseline (Das et al, 2021). In another dataset of Bengali hate speech detection (Romim et al, 2021), the fusion model with self-attention CNN + attn. + BiLSTM + CNN outperforms all the previous DNN and ML implementations, as evident from Table 4.…”
Section: Glue Benchmark With Artificial Data Scarcity (mentioning)
confidence: 51%
“…The study considers datasets across different languages and contexts for the efficacy demonstration of CNN + BiLSTM + CNN fusion. We developed a new Bengali corpus for 6-class emotion classification, as well as used other previously developed Bengali datasets for different NLP tasks: i) Six-class emotion Bengali dataset (Das et al, 2021), ii) Hate Speech Bengali dataset (Romim et al, 2021), and iii) DeepHateExplainer Bengali dataset (Karim et al, 2020). As examples of non-Bengali languages that relate the low-resource contexts, we consider the Vietnamese (Ho et al, 2019) and Indonesian (Saputri et al, 2018) datasets.…”
Section: Datasets (mentioning)
confidence: 99%
“…But, despite being the world's seventh most spoken language with 240 million native speakers [4], research on sarcasm detection in the Bengali language is unexplored and overlooked. Due to the limited resources and the scarcity of large-scale sarcasm data, identifying sarcasm from Bengali text is currently a difficult challenge for the researchers of NLP [5].…”
Section: Introduction (mentioning)
confidence: 99%