Proceedings of the Fourth Workshop on Online Abuse and Harms 2020
DOI: 10.18653/v1/2020.alw-1.5

HurtBERT: Incorporating Lexical Features with BERT for the Detection of Abusive Language

Abstract: The detection of abusive or offensive remarks in social texts has received significant attention in research. In several related shared tasks, BERT has been shown to be the state-of-the-art. In this paper, we propose to utilize lexical features derived from a hate lexicon towards improving the performance of BERT in such tasks. We explore different ways to utilize the lexical features in the form of lexicon-based encodings at the sentence level or embeddings at the word level. We provide an extensive dataset ev…
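The abstract names two fusion strategies: lexicon-based encodings at the sentence level and embeddings at the word level. Below is a minimal sketch of the sentence-level variant, assuming PyTorch and the Hugging Face transformers library; the class name HurtBertStyleClassifier, the concatenation-based fusion, the 256-unit hidden layer, and the 17-dimensional lexicon feature vector are illustrative assumptions, not the authors' exact architecture.

```python
# Illustrative sketch (not the authors' exact code): fuse a BERT sentence
# representation with a lexicon-derived feature vector by concatenation.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class HurtBertStyleClassifier(nn.Module):  # hypothetical class name
    def __init__(self, lexicon_dim: int, num_labels: int = 2,
                 model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        # The classifier sees BERT's [CLS] vector concatenated with the
        # sentence-level lexicon encoding (e.g. counts per lexicon category).
        self.classifier = nn.Sequential(
            nn.Linear(hidden + lexicon_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_labels),
        )

    def forward(self, input_ids, attention_mask, lexicon_feats):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]            # [CLS] token representation
        fused = torch.cat([cls, lexicon_feats], dim=-1)
        return self.classifier(fused)

# Usage sketch: lexicon_feats would be a per-sentence vector of lexicon
# category counts; zeros are used here purely as a placeholder.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tok(["example input text"], return_tensors="pt", padding=True)
model = HurtBertStyleClassifier(lexicon_dim=17)      # 17 is an assumed category count
logits = model(enc["input_ids"], enc["attention_mask"], torch.zeros(1, 17))
```

The word-level alternative mentioned in the abstract would instead attach a lexicon embedding to each token before (or alongside) the contextual representation; the sketch above covers only the sentence-level case.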

Cited by 56 publications (42 citation statements)
References 26 publications
“…Building upon BERT, a handful of recent studies suggest that additional hate-specific knowledge from outside the fine-tuning dataset might help with generalisation. Such knowledge can come from further masked language modelling pre-training on an abusive corpus (Caselli et al., 2021), or features from a hate speech lexicon (Koufakou et al., 2020).…”
Section: Generalisation Studies in Hate Speech Detection
confidence: 99%
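The first route this citation describes, continued masked language modelling on an abusive corpus (as in Caselli et al., 2021), can be sketched with the Hugging Face Trainer API. The corpus file name and all hyperparameters below are placeholders, not the setup reported in the cited work.

```python
# Hedged sketch of domain-adaptive MLM pre-training on an abusive corpus.
# "abusive_corpus.txt" and the hyperparameters are assumptions for illustration.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# One plain-text post per line in a local file (hypothetical).
ds = load_dataset("text", data_files={"train": "abusive_corpus.txt"})["train"]
ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=128),
            batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm=True,
                                           mlm_probability=0.15)
args = TrainingArguments(output_dir="mlm-abuse", num_train_epochs=1,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=ds,
        data_collator=collator).train()
# The adapted checkpoint is then fine-tuned on the abuse-detection dataset as usual.
```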
“…Some other studies adopted several neural-based models, including convolutional neural networks (CNN) [75,141], long short-term memory (LSTM) [8,75,92,94,145], bidirectional LSTM (Bi-LSTM) [115], and gated recurrent units (GRU) [27]. The most recent works focus more on investigating the transferability or generalizability of state-of-the-art transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT) [19,48,66,79,83,90,92,134] and its variants such as RoBERTa [48] in the cross-domain abusive language detection task.…”
Section: Models
confidence: 99%
“…Transformer based: Infused specific hateful lexicon called HurtLex into BERT model to transfer knowledge across domains. [66] Multiple models: Besides experimenting with a wide coverage of models including traditional (linear SVM), (LSTM), and (BERT), they also exploited HurtLex as domain-independent features for knowledge transfer between domains. [92] Neural based: Experimented with augmenting all training data from different domains, resulting in the performance improvement of the models based on BERT and RoBERTa representation.…”
Section: Models
confidence: 99%
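The HurtLex-as-features idea referenced above amounts to counting lexicon hits per category for each input and feeding those counts to the classifier alongside the contextual model. A toy sketch follows; the word-to-category mapping and the category codes are assumptions for illustration, and the real HurtLex resource is distributed as a multi-field TSV rather than a flat dictionary.

```python
# Hedged sketch: turn a HurtLex-style lexicon into sentence-level features.
# The toy LEXICON entries and CATEGORIES subset are illustrative assumptions.
from collections import Counter

CATEGORIES = ["ps", "rci", "asf", "om", "qas"]           # illustrative subset
LEXICON = {"idiot": "ps", "scum": "ps", "vermin": "om"}  # toy entries only

def lexicon_features(text: str) -> list[float]:
    """Count lexicon hits per category for one sentence (domain-independent)."""
    counts = Counter(LEXICON[w] for w in text.lower().split() if w in LEXICON)
    return [float(counts[c]) for c in CATEGORIES]

print(lexicon_features("You absolute idiot and scum"))   # [2.0, 0.0, 0.0, 0.0, 0.0]
```

Because these counts depend only on the lexicon and not on any training corpus, they act as the domain-independent signal described in the cited survey passage.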