2022
DOI: 10.7717/peerj-cs.906

Detecting racism and xenophobia using deep learning models on Twitter data: CNN, LSTM and BERT

Abstract: With the growth that social networks have experienced in recent years, it is impossible to moderate content manually. Thanks to existing natural language processing techniques, it is possible to generate predictive models that automatically classify texts into different categories. However, a weakness has been detected concerning the language used to train such models. This work aimed to develop a predictive model based on BERT, capable of detecting racist and xenophobic messages in t…

Cited by 20 publications (9 citation statements). References 41 publications.
“…Results obtained in Plaza-del Arco et al. (2021) showed that BETO, a monolingual LM, outperforms multilingual pre-trained models such as XLM and mBERT, as well as the rest of the models they evaluated for hate speech detection in Spanish. Results in line with Plaza-del Arco et al. (2021) have also been achieved in other similar studies on hate speech detection (Benítez-Andrades et al., 2022; Tanase et al., 2020). Nozza (2021) studied hate speech detection against women and immigrants across three languages: Spanish, English, and Italian.…”
Section: Related Work (supporting)
confidence: 83%
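The statement above contrasts a monolingual Spanish model (BETO) with multilingual pre-trained models (mBERT, XLM) for hate speech detection. A minimal sketch of how such a comparison is typically set up with the Hugging Face transformers library is given below; the checkpoint names and the binary label scheme are assumptions for illustration, not the configuration used in the cited studies.

```python
# Sketch: loading a monolingual Spanish model (BETO) vs. multilingual BERT
# for binary hate-speech classification. Checkpoint names are the usual
# Hugging Face Hub identifiers and are assumptions here, not the authors' code.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINTS = {
    "beto": "dccuchile/bert-base-spanish-wwm-cased",  # monolingual Spanish (BETO)
    "mbert": "bert-base-multilingual-cased",          # multilingual baseline
}

def load_classifier(name: str):
    """Return (tokenizer, model) with a fresh 2-label classification head."""
    checkpoint = CHECKPOINTS[name]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    return tokenizer, model

tokenizer, model = load_classifier("beto")
inputs = tokenizer("Texto de ejemplo a clasificar", return_tensors="pt", truncation=True)
logits = model(**inputs).logits  # meaningful only after fine-tuning on labelled tweets
```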
“…During the pre-training process, BERT learns to predict missing words in a sentence and to distinguish coherent sentence pairs from randomly paired ones. This model is well suited to tasks such as named entity recognition and sentiment analysis, among others [29]. Thus, this model is one of the best candidates to achieve the highest accuracy among the models compared.…”
Section: Methods (mentioning)
confidence: 99%
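The pre-training objective mentioned in this statement (predicting masked words) can be illustrated directly with a fill-mask pipeline. This is a minimal sketch using the generic bert-base-uncased checkpoint as an assumption; the models discussed in the paper are instead fine-tuned for classification.

```python
# Sketch of BERT's masked-word prediction objective (masked language modelling).
# The checkpoint "bert-base-uncased" is an assumption for illustration only.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
predictions = fill_mask("Social networks make it hard to [MASK] content manually.")
for candidate in predictions:
    # Each candidate carries the predicted token and its probability score.
    print(f"{candidate['token_str']:>12}  score={candidate['score']:.3f}")
```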
“…To balance computational efficiency and model accuracy, a batch size of 64 was used. The Adam optimizer was chosen to manage the update of the model weights, as it has been shown to be effective in optimizing deep learning models [49, 50]. Additionally, a learning rate of 0.0001 was set to control the step size of each update, as it affects the convergence speed of the model during training.…”
Section: Materials and Methods (mentioning)
confidence: 99%
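A minimal PyTorch sketch of the training configuration described in this statement (batch size 64, Adam optimizer, learning rate 0.0001) follows. The placeholder dataset and the small stand-in classifier are assumptions for illustration; they are not the authors' CNN/LSTM/BERT models or data.

```python
# Sketch: batch size 64, Adam optimizer, learning rate 0.0001 (as stated above).
# Data and model are placeholders, not the configuration from the cited work.
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 512 sequences of 128 token ids, binary labels.
input_ids = torch.randint(0, 30000, (512, 128))
labels = torch.randint(0, 2, (512,))
loader = DataLoader(TensorDataset(input_ids, labels), batch_size=64, shuffle=True)

model = torch.nn.Sequential(            # tiny stand-in for a CNN/LSTM/BERT classifier
    torch.nn.Embedding(30000, 64),
    torch.nn.Flatten(),
    torch.nn.Linear(64 * 128, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr = 0.0001
criterion = torch.nn.CrossEntropyLoss()

for batch_ids, batch_labels in loader:  # one training epoch
    optimizer.zero_grad()
    loss = criterion(model(batch_ids), batch_labels)
    loss.backward()
    optimizer.step()
```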