2020
DOI: 10.48550/arxiv.2005.03695
Preprint

LIIR at SemEval-2020 Task 12: A Cross-Lingual Augmentation Approach for Multilingual Offensive Language Identification

Abstract: This paper presents our system, 'LIIR', for SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2). We participated in sub-task A for English, Danish, Greek, Arabic, and Turkish. We adapt and fine-tune the BERT and Multilingual BERT models made available by Google AI for English and non-English languages, respectively. For English, we use a combination of two fine-tuned BERT models. For the other languages, we propose a cross-lingual …

Citations: cited by 2 publications (2 citation statements)
References: 12 publications (18 reference statements)
“…While there are a few studies published on languages such as Arabic and Greek (Pitenis et al, 2020), most studies and datasets created so far include English data. Data augmentation (Ghadery and Moens, 2020) and multilingual word embeddings (Pamungkas and Patti, 2019) have been applied to take advantage of existing English resources to improve the performance in systems dealing with languages other than English. To the best of our knowledge, however, state-of-the-art cross-lingual contextual embeddings such as XLM-R (Conneau et al, 2019) have not yet been applied to offensive language identification.…”
Section: Introduction (mentioning)
confidence: 99%
“…More generally, propaganda detection is at the crossroad of many tasks, since it can be helped by many subtasks. Fact-checking (Aho and Ullman, 1972; Dale, 2017) can be involved with propaganda detection, alongside various social, emotional and discursive aspects (Sileo et al, 2019a), including offensive language detection (Pradhan et al, 2020; Ghadery and Moens, 2020), emotion analysis (Dolan, 2002), computational study of persuasiveness (Guerini et al, 2008; Carlile et al, 2018) and argumentation (Palau and Moens, 2009; Habernal and Gurevych, 2016).…”
Section: Related Work (mentioning)
confidence: 99%