Detection of Racist Language in French Tweets

Vanetik, Natalia; Mimoun, Elisheva

doi:10.3390/info13070318

Cited by 8 publications (9 citation statements)

References 49 publications (51 reference statements)


“…The study conducted by [18] utilized Bidirectional Encoder Representations from Transformers (BERT) text representations combined with Logistic Regression to classify racist language in French, achieving an accuracy of 0.79. The authors of [10] evaluated seventeen ML models based on n-grams at both the word and character levels for detecting offensive and abusive language in Urdu and Roman Urdu text.…”

Section: Related Work (mentioning)

confidence: 99%


Abusive Language Detection in Urdu Text: Leveraging Deep Learning and Attention Mechanism

Khan, Ahmed, Jan et al. 2024

IEEE Access


The widespread use of the Internet and the tremendous growth of social media have enabled people to connect with each other worldwide. Individuals are free to express themselves online, sharing their photos, videos, and text messages globally. However, such freedom sometimes leads to misuse, as some individuals exploit this platform by posting hateful and abusive comments on forums. The proliferation of abusive language on social media negatively impacts individuals and groups, leading to emotional distress and affecting mental health. It is crucial to automatically detect and filter such abusive content in order to effectively tackle this challenging issue. Detecting abusive language in text messages is challenging due to intentional word concealment and contextual complexity. To counter abusive speech on social media, we need to explore the potential of machine learning (ML) and deep learning (DL) models, particularly those equipped with attention mechanisms. In this study, we utilized popular ML and DL models integrated with attention mechanism to detect abusive language in Urdu text. Our methodology involved employing Count Vectorizer and Term Frequency-Inverse Document Frequency (TF-IDF) to extract n-grams at the word level: Unigrams (Uni), Bigrams (Bi), Trigrams (Tri), and their combination (Uni + Bi + Tri). Initially, we evaluated four traditional ML models, namely Logistic Regression (LR), Gaussian Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF), on both proposed and established datasets. The results highlighted that RF model outperformed other conventional models in terms of accuracy, precision, recall, and F1-measure on both datasets. In our implementation of deep learning models, we employed various models integrated with custom fastText and Word2Vec embeddings, each equipped with an attention layer, except for the Convolutional Neural Network (CNN). Our findings indicated that the Bidirectional Long Short-Term Memory (Bi-LSTM) + attention model, utilizing custom Word2Vec embeddings, exhibited improved performance in detecting abusive language on both datasets.



“…Similarly, the research conducted by [30] employed Support Vector Machines (SVM) and Long Short-Term Memory (LSTM) models to detect instances of hate speech in Italian. The study of [18] aimed to identify instances of racist discourse in French.…”

Section: Related Work (mentioning)

confidence: 99%

Abusive Language Detection in Urdu Text: Leveraging Deep Learning and Attention Mechanism

Khan, Ahmed, Jan et al. 2024

IEEE Access

“…Whereas English is backed by most of the mature solutions, other top languages such as French and Spanish have also presented interesting approaches. Vanetik and Mimoun [11] have built and annotated a dataset of 2856 French tweets where 927 were labeled as "racist" whereas the rest was "not racist". They applied TF-IDF, N-gram, and BERT sentence embedding for text representation, then went on to compare different binary classification models such as "Random Forest" (RF), "Logistic Regression" (LR), and "Extreme Gradient Boosting" (XGBoost).…”

Section: Related Work (mentioning)

confidence: 99%

Detecting Hateful and Offensive Speech in Arabic Social Media Using Transfer Learning

Boulouard, Ouaissa et al. 2022

Applied Sciences


The democratization of access to internet and social media has given an opportunity for every individual to openly express his or her ideas and feelings. Unfortunately, this has also created room for extremist, racist, misogynist, and offensive opinions expressed either as articles, posts, or comments. While controlling offensive speech in English-, Spanish-, and French-speaking social media communities and websites has reached a mature level, it is much less the case for their counterparts in Arabic-speaking countries. This paper presents a transfer learning solution to detect hateful and offensive speech on Arabic websites and social media platforms. This paper will compare the performance of different BERT-based models trained to classify comments as either abusive or neutral. The training dataset contains comments in standard Arabic as well as four dialects. We will also use their English translations for comparative purposes. The models were evaluated based on five metrics: Accuracy, Precision, Recall, F1-Score, and Confusion Matrix.


“…For instance, there exist several versions of Jigsaw datasets: monolingual (Jigsaw, 2018) for English and multilingual (Jigsaw, 2020) covering 6 languages. In addition, there are corpora specifically for Russian (Semiletov, 2020), Korean (Moon et al, 2020), French (Vanetik and Mimoun, 2022) languages, inter alia. These are non-parallel classification datasets.…”

Section: Related Work (mentioning)

confidence: 99%

Methods for Detoxification of Texts for the Russian Language

Dementieva, Moskovskiy, Logacheva et al. 2021

Computational Linguistics and Intellectual Technologies


We introduce the first study of automatic detoxification of Russian texts to combat offensive language. Such a kind of textual style transfer can be used, for instance, for processing toxic content in social media. While much work has been done for the English language in this field, it has never been solved for the Russian language yet. We test two types of models (an unsupervised approach based on BERT architecture that performs local corrections and a supervised approach based on pretrained language GPT-2 model) and compare them with several baselines. In addition, we describe evaluation setup providing training datasets and metrics for automatic evaluation. The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.
