Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi

Gaikwad, Saurabh; Ranasinghe, Tharindu; Zampieri, Marcos; Homan, Christopher M.

doi:10.26615/978-954-452-072-4_050

Cited by 27 publications

(23 citation statements)

References 13 publications

“…They also discovered that using unlabeled samples from the target language can be used to increase performance. Finally, Gaikwad et al [19] noticed that transfer learning from Hindi outperformed other languages when classifying entries in Marathi, suggesting a relation between cross-lingual transfer performance and language similarity.…”

Section: Abusive Language Detection (mentioning)

confidence: 99%

“…To get around this problem, it has been shown that with cross-lingual transfer, the performance on low-resource languages can be improved by leveraging knowledge from other higher resource languages. This has also been demonstrated to be an effective technique in improving offensive content detection in low resource languages by using cross-lingual word embeddings and multilingual transformer models [16,17,18,19].…”

Section: Introduction (mentioning)

confidence: 99%

Transfer language selection for zero-shot cross-lingual abusive language detection

Eronen, Ptaszyński, Masui, et al. 2022

Information Processing & Management

“…The SATLab participated in subtask 1 of the HASOC 2021 shared task "Hate Speech and Offensive Content Identification in English and For each language, learning and test materials have been provided by the task organizers Gaikwad et al, 2021). The frequencies (#) and percentages (%) in each category of each problem for each language are given in Table 1.…”

Section: Materials and Task (mentioning)

confidence: 99%

“…Among them, one, English, is obviously the most studied language in automatic language processing and the one in which the largest number of resources is available. Hindi and, even more so, Marathi have been much less studied and are still classified as low-resource languages (Haffari et al, 2018; Ortega et al, 2021; Gaikwad et al, 2021). One can think a priori that the approach proposed here will be much more competitive in these two languages.…”

mentioning

confidence: 96%

A simple language-agnostic yet very strong baseline system for hate speech and offensive content identification

Bestgen 2022

Preprint

For automatically identifying hate speech and offensive content in tweets, a system based on a classical supervised algorithm only fed with character n-grams, and thus completely language-agnostic, is proposed by the SATLab team. After its optimization in terms of the feature weighting and the classifier parameters, it reached, in the multilingual HASOC 2021 challenge, a medium performance level in English, the language for which it is easy to develop deep learning approaches relying on many external linguistic resources, but a far better level for the two less resourced languages, Hindi and Marathi. It even finishes first when performances are averaged over the three tasks in these languages, outperforming many deep learning approaches. These performances suggest that it is an interesting reference level to evaluate the benefits of using more complex approaches such as deep learning or taking into account complementary resources.

“…The primary focus of Subtask 1A on Hate speech and Offensive language identification, mainly for English, Hindi and Marathi [26], is coarse-grained binary classification. In Table 1 we have presented the dataset statistics on English and Hindi for binary classification.…”

Section: Subtask 1a: Identifying Hate Offensive and Profane Content F... (mentioning)

confidence: 99%

Exploring Transformer Based Models to Identify Hate Speech and Offensive Content in English and Indo-Aryan Languages

Banerjee, Sarkar, Agrawal, et al. 2021

Preprint

Hate speech is considered to be one of the major issues currently plaguing online social media. Repeated exposure to hate speech has been shown to create physiological effects on the target users. Thus, hate speech, in all its forms, should be addressed on these platforms in order to maintain good health. In this paper, we explored several Transformer based machine learning models for the detection of hate speech and offensive content in English and Indo-Aryan languages at FIRE 2021. We explore several models such as mBERT, XLMR-large, XLMR-base by team name "Super Mario". Our models placed 2nd in the Code-Mixed Data set (Macro F1: 0.7107), 2nd in Hindi two-class classification (Macro F1: 0.7797), 4th in the English four-class category (Macro F1: 0.8006) and 12th in the English two-class category (Macro F1: 0.6447). We have made our code public.
