Detecting racism and xenophobia using deep learning models on Twitter data: CNN, LSTM and BERT

Bení­tez-Andrades, José Alberto; González-Jiménez, Álvaro; López-Brea, Álvaro; Aveleira‐Mata, José; Alija-Pérez, José-Manuel; Garcí­a-Ordás, María Teresa

doi:10.7717/peerj-cs.906

Cited by 20 publications

(9 citation statements)

References 41 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Results obtained in (Plaza-del Arco et al, 2021) showed that BETO, a monolingual LM outperforms multilingual pre-trained models such as XLM and mBERT as well as the rest of the models they evaluated for hate speech detection in Spanish. Results in line with Plaza-del Arco et al (2021) have also been achieved in other similar studies on hate speech detection (Benítez-Andrades et al, 2022;Tanase et al, 2020). Nozza (2021) studied hate speech detection against women and immigrants across three languages: Spanish, English, and Italian.…”

Section: Related Worksupporting

confidence: 83%

Analyzing Zero-Shot transfer Scenarios across Spanish variants for Hate Speech Detection

Castillo-lópez,

Riabi,

Seddah

2023

Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)

View full text Add to dashboard Cite

Hate speech detection in online platforms has been widely studied in the past. Most of these works were conducted in English and a few rich-resource languages. Recent approaches tailored for low-resource languages have explored the interests of zero-shot cross-lingual transfer learning models in resource-scarce scenarios. However, languages variations between geolects such as American English and British English, Latin-American Spanish, and European Spanish is still a problem for NLP models that often relies on (latent) lexical information for their classification tasks. More importantly, the cultural aspect, crucial for hate speech detection, is often overlooked.In this work, we present the results of a thorough analysis of hate speech detection models performance on different variants of Spanish, including a new hate speech toward immigrants Twitter data set we built to cover these variants. Using mBERT and Beto, a monolingual Spanish Bert-based language model, as the basis of our transfer learning architecture, our results indicate that hate speech detection models for a given Spanish variant are affected when different variations of such language are not considered. Hate speech expressions could vary from region to region where the same language is spoken. * Work conducting during an internship at Inria Paris. 1 Please be aware that this paper contains some examples of offensive slurs that may be considered upsetting.

show abstract

Section: Related Worksupporting

confidence: 83%

Analyzing Zero-Shot transfer Scenarios across Spanish variants for Hate Speech Detection

Castillo-lópez,

Riabi,

Seddah

2023

Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)

View full text Add to dashboard Cite

show abstract

“…While the pre-training process, BERT learns to predict missing words in a sentence and to distinguish well structured information from random ones. This model is ideal for cases where named entity recognition or sentiment analysis are done among other activities [ 29 ]. So, this model is on the best candidates to give the most accurate percentage of accuracy among the other models.…”

Section: Methodsmentioning

confidence: 99%

Enhancing ASD detection accuracy: a combined approach of machine learning and deep learning models with natural language processing

Rubio-Martín,

García-Ordás,

Bayón-Gutiérrez

et al. 2024

Health Inf Sci Syst

Self Cite

View full text Add to dashboard Cite

Purpose The main aim of our study was to explore the utility of artificial intelligence (AI) in diagnosing autism spectrum disorder (ASD). The study primarily focused on using machine learning (ML) and deep learning (DL) models to detect ASD potential cases by analyzing text inputs, especially from social media platforms like Twitter. This is to overcome the ongoing challenges in ASD diagnosis, such as the requirement for specialized professionals and extensive resources. Timely identification, particularly in children, is essential to provide immediate intervention and support, thereby improving the quality of life for affected individuals. Methods We employed natural language processing (NLP) techniques along with ML models like decision trees, extreme gradient boosting (XGB), k-nearest neighbors algorithm (KNN), and DL models such as recurrent neural networks (RNN), long short-term memory (LSTM), bidirectional long short-term memory (Bi-LSTM), bidirectional encoder representations from transformers (BERT and BERTweet). We extracted a dataset of 404,627 tweets from Twitter users using the platform’s API and classified them based on whether they were written by individuals claiming to have ASD (ASD users) or by those without ASD (non-ASD users). From this dataset, we used a subset of 90,000 tweets (45,000 from each classification group) for the training and testing of these models. Results The application of our AI models yielded promising results, with the predictive model reaching an accuracy of almost 88% when classifying texts that potentially originated from individuals with ASD. Conclusion Our research demonstrated the potential of using AI, particularly DL models, in enhancing the accuracy of ASD detection and diagnosis. This innovative approach signifies the critical role AI can play in advancing early diagnostic techniques, enabling better patient outcomes and underlining the importance of early identification of ASD, especially in children.

show abstract

“…To balance the computational efficiency and model accuracy, a batch size of 64 was used. The Adam optimizer was chosen to manage the update of the model weights, as it has been shown to be effective in optimizing deep learning models [ 49 , 50 ]. Additionally, a learning rate of 0.0001 was set to control the step size when performing the update, as it affects the convergence speed of the model during training.…”

Section: Materials and Methodsmentioning

confidence: 99%

Sentiment Analysis of Social Media Data on Ebola Outbreak Using Deep Learning Classifiers

Mirugwe,

Ashaba,

Namale

et al. 2024

Life

View full text Add to dashboard Cite

The Ebola virus disease (EVD) is an extremely contagious and fatal illness caused by the Ebola virus. Recently, Uganda witnessed an outbreak of EVD, which generated much attention on various social media platforms. To ensure effective communication and implementation of targeted health interventions, it is crucial for stakeholders to comprehend the sentiments expressed in the posts and discussions on these online platforms. In this study, we used deep learning techniques to analyse the sentiments expressed in Ebola-related tweets during the outbreak. We explored the application of three deep learning techniques to classify the sentiments in 8395 tweets as positive, neutral, or negative. The techniques examined included a 6-layer convolutional neural network (CNN), a 6-layer long short-term memory model (LSTM), and an 8-layer Bidirectional Encoder Representations from Transformers (BERT) model. The study found that the BERT model outperformed both the CNN and LSTM-based models across all the evaluation metrics, achieving a remarkable classification accuracy of 95%. These findings confirm the reported effectiveness of Transformer-based architectures in tasks related to natural language processing, such as sentiment analysis.

show abstract

Detecting racism and xenophobia using deep learning models on Twitter data: CNN, LSTM and BERT

Cited by 20 publications

References 41 publications

Analyzing Zero-Shot transfer Scenarios across Spanish variants for Hate Speech Detection

Analyzing Zero-Shot transfer Scenarios across Spanish variants for Hate Speech Detection

Enhancing ASD detection accuracy: a combined approach of machine learning and deep learning models with natural language processing

Sentiment Analysis of Social Media Data on Ebola Outbreak Using Deep Learning Classifiers

Contact Info

Product

Resources

About