2022
DOI: 10.3390/info13070318

Detection of Racist Language in French Tweets

Abstract: Toxic online content has become a major issue in recent years due to the exponential increase in the use of the internet. In France, there has been a significant increase in hate speech against migrant and Muslim communities following events such as Great Britain’s exit from the EU, the Charlie Hebdo attacks, and the Bataclan attacks. Therefore, the automated detection of offensive language and racism is in high demand, and it is a serious challenge. Unfortunately, there are fewer datasets annotated for racist…

Cited by 8 publications (9 citation statements)
References 49 publications (51 reference statements)

“…The study conducted by [18] utilized Bidirectional Encoder Representations from Transformers (BERT) text representations combined with Logistic Regression to classify racist language in French, achieving an accuracy of 0.79. The authors of [10] evaluated seventeen ML models based on n-grams at both the word and character levels for detecting offensive and abusive language in Urdu and Roman Urdu text.…”
Section: Related Work (mentioning)
confidence: 99%
“…Similarly, the research conducted by [30] employed Support Vector Machines (SVM) and Long Short-Term Memory (LSTM) models to detect instances of hate speech in Italian. The study of [18] aimed to identify instances of racist discourse in French.…”
Section: Related Work (mentioning)
confidence: 99%
“…Whereas English is backed by most of the mature solutions, other top languages such as French and Spanish have also presented interesting approaches. Vanetik and Mimoun [11] have built and annotated a dataset of 2856 French tweets where 927 were labeled as "racist" whereas the rest was "not racist". They applied TF-IDF, N-gram, and BERT sentence embedding for text representation, then went on to compare different binary classification models such as "Random Forest" (RF), "Logistic Regression" (LR), and "Extreme Gradient Boosting" (XGBoost).…”
Section: Related Work (mentioning)
confidence: 99%
“…For instance, there exist several versions of Jigsaw datasets: monolingual (Jigsaw, 2018) for English and multilingual (Jigsaw, 2020) covering 6 languages. In addition, there are corpora specifically for Russian (Semiletov, 2020), Korean (Moon et al., 2020), French (Vanetik and Mimoun, 2022) languages, inter alia. These are non-parallel classification datasets.…”
Section: Related Work (mentioning)
confidence: 99%
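
One of the citation statements above notes that the tweets were represented with TF-IDF and n-grams before being passed to binary classifiers. As a rough illustration of that representation step only, here is a minimal pure-Python sketch of TF-IDF weighting over word unigrams and bigrams. It is not the authors' implementation: the function names `word_ngrams` and `tfidf_vectors`, the whitespace tokenization, the unsmoothed idf formula, and the placeholder texts are all assumptions made for this example, and the classification stage is omitted.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return lowercase word n-grams for n = 1..n_max (whitespace tokenization)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_max=2):
    """Map each document to a {ngram: tf-idf weight} dictionary."""
    counts = [Counter(word_ngrams(d, n_max)) for d in docs]

    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vec = {}
        for gram, k in c.items():
            tf = k / total                            # relative term frequency
            idf = math.log(n_docs / doc_freq[gram])   # plain, unsmoothed idf (assumption)
            vec[gram] = tf * idf
        vectors.append(vec)
    return vectors


# Placeholder texts, hypothetical and not taken from the dataset.
docs = ["le chat dort", "le chien dort", "le chat mange"]
vectors = tfidf_vectors(docs)
```

With this weighting, an n-gram that occurs in every document (here `le`) gets weight zero, while rarer n-grams such as `chat dort` get the largest weights. In practice a library vectorizer with smoothing and normalization would likely be used instead, and the resulting vectors would then be fed to classifiers such as those named in the citation statement.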