2021
DOI: 10.1145/3457610

Multilingual Offensive Language Identification for Low-resource Languages

Abstract: Offensive content is pervasive in social media and a reason for concern to companies and government organizations. Several studies have been recently published investigating methods to detect the various forms of such content (e.g., hate speech, cyberbullying, and cyberaggression). The clear majority of these studies deal with English partially because most annotated datasets available contain English data. In this article, we take advantage of available English datasets by applying cross-lingual contextual wo…


Cited by 40 publications (18 citation statements). References 36 publications.
“…Ranasinghe et al. [16,17] showed the effectiveness of cross-lingual transfer in offensive language identification in Hindi, Spanish, Danish, Greek, and Bengali. Their work showed that multilingual transformer models like mBERT and XLM-R can use the knowledge gained from higher-resource languages to achieve improved performance on a low-resource target.…”
Section: Abusive Language Detection
confidence: 99%
“…To get around this problem, it has been shown that with cross-lingual transfer, the performance on low-resource languages can be improved by leveraging knowledge from other higher-resource languages. This has also been demonstrated to be an effective technique for improving offensive content detection in low-resource languages by using cross-lingual word embeddings and multilingual transformer models [16,17,18,19].…”
Section: Introduction
confidence: 99%
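The cross-lingual word-embedding route mentioned in the statement above can be illustrated with a toy sketch: translation pairs from two languages share (nearly) the same vector in an aligned space, a classifier is trained only on labeled source-language (e.g., English) examples, and is then applied unchanged to target-language text. The tiny vocabulary, vectors, and perceptron below are purely illustrative assumptions, not the setup used in the cited work.

```python
import numpy as np

# Toy "aligned" embedding space: translation pairs get near-identical
# vectors, which is what cross-lingual alignment methods aim to produce.
EMB = {
    # English (source language, labeled data available)
    "idiot":  np.array([0.9, 0.1]),
    "stupid": np.array([0.8, 0.2]),
    "hello":  np.array([0.1, 0.9]),
    "friend": np.array([0.2, 0.8]),
    # Target language (no labels) - aligned to its translations
    "bewakoof": np.array([0.85, 0.15]),  # ~ "stupid/idiot"
    "dost":     np.array([0.15, 0.85]),  # ~ "friend"
}

def embed(sentence):
    """Average word vectors; out-of-vocabulary words are skipped."""
    vecs = [EMB[w] for w in sentence.split() if w in EMB]
    return np.mean(vecs, axis=0)

# Train a minimal perceptron on ENGLISH examples only (1 = offensive).
train = [("idiot stupid", 1), ("hello friend", 0), ("stupid", 1), ("friend", 0)]
w, b = np.zeros(2), 0.0
for _ in range(20):
    for text, y in train:
        x = embed(text)
        pred = 1 if x @ w + b > 0 else 0
        w += (y - pred) * x
        b += (y - pred)

def predict(text):
    return 1 if embed(text) @ w + b > 0 else 0

# Zero-shot transfer: classify TARGET-language words never seen in training.
print(predict("bewakoof"))  # -> 1 (offensive)
print(predict("dost"))      # -> 0 (benign)
```

Because the target-language vectors sit next to their English translations in the shared space, the English-trained decision boundary carries over; multilingual transformers like mBERT and XLM-R achieve the same effect with a shared subword vocabulary and joint pretraining instead of explicit alignment.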
“…1. sarcastic adaptation of the French madame; the word suggests the person does not deserve the title of lady, madam
2. a loud lower-class woman who is unrefined
3. slang specifically used to address women directly, similar to lady, but it implies the woman is lower class
4. it suggests that the woman addressed is older, unattractive and unrefined
5. of one of the politicians
6. the polite second person singular or plural
7. the polite second person singular or plural, or plain form of second person plural

Multilingual BERT: As opposed to the BOW and TF-IDF word representations, which do not contain any information about the context, sentence representations as given by modern transformer networks (Reimers and Gurevych, 2019) offer richer semantic information and have been successfully used in low-resource scenarios (Ranasinghe and Zampieri, 2021). As such, we use Sentence Transformer (Reimers and Gurevych, 2019) to extract embeddings from BERT-based models.…”
Section: Text Representation
confidence: 99%
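The contrast drawn in the statement above — BOW/TF-IDF features carry no contextual information, unlike transformer sentence embeddings — can be demonstrated with a few lines of standard-library Python (a minimal sketch, not code from the cited papers): two sentences with opposite meanings but the same words produce identical bag-of-words representations.

```python
from collections import Counter

def bow(sentence):
    """Bag-of-words representation: word counts only, order discarded."""
    return Counter(sentence.lower().split())

# Word order (and hence context) is invisible to BOW/TF-IDF features:
a = bow("the dog bit the man")
b = bow("the man bit the dog")
print(a == b)  # True - identical features despite opposite meanings
```

A Sentence Transformer embedding, by contrast, would assign these two sentences different vectors, which is why such representations are preferred in the low-resource scenarios discussed here.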
“…Substantial experiments by Fortuna et al. [34] showed that training with one dataset and testing with another can decrease performance by over 30%. Many potential factors, such as dataset size and annotation quality, can be seen as obstacles to generalisability [35,36,37,38]. However, little is known about their effects.…”
Section: Reliability Of Data Sets
confidence: 99%