Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Me 2021
DOI: 10.26615/978-954-452-072-4_050

Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi

Cited by 27 publications
(23 citation statements)
References 13 publications
“…They also discovered that using unlabeled samples from the target language can be used to increase performance. Finally, Gaikwad et al [19] noticed that transfer learning from Hindi outperformed other languages when classifying entries in Marathi, suggesting a relation between cross-lingual transfer performance and language similarity.…”
Section: Abusive Language Detection
mentioning
confidence: 99%
“…To get around this problem, it has been shown that with cross-lingual transfer, the performance on low-resource languages can be improved by leveraging knowledge from other higher resource languages. This has also been demonstrated to be an effective technique in improving offensive content detection in low resource languages by using cross-lingual word embeddings and multilingual transformer models [16,17,18,19].…”
Section: Introduction
mentioning
confidence: 99%
“…The SATLab participated in subtask 1 of the HASOC 2021 shared task "Hate Speech and Offensive Content Identification in English and For each language, learning and test materials have been provided by the task organizers Gaikwad et al, 2021). The frequencies (#) and percentages (%) in each category of each problem for each language are given in Table 1.…”
Section: Materials and Task
mentioning
confidence: 99%
“…Among them, one, English, is obviously the most studied language in automatic language processing and the one in which the largest number of resources is available. Hindi and, even more so, Marathi have been much less studied and are still classified as low-resource languages (Haffari et al, 2018;Ortega et al, 2021;Gaikwad et al, 2021). One can think a priori that the approach proposed here will be much more competitive in these two languages.…”
mentioning
confidence: 96%
“…The primary focus of Subtask 1A on Hate speech and Offensive language identification, mainly for English, Hindi and Marathi [26], is coarse-grained binary classification. In Table 1 we have presented the dataset statistics on English and Hindi for binary classification.…”
Section: Subtask 1a: Identifying Hate Offensive and Profane Content F...
mentioning
confidence: 99%