Proceedings of the Fourteenth Workshop on Semantic Evaluation 2020
DOI: 10.18653/v1/2020.semeval-1.274

LIIR at SemEval-2020 Task 12: A Cross-Lingual Augmentation Approach for Multilingual Offensive Language Identification

Abstract: This paper presents our system, entitled 'LIIR', for SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2). We participated in Subtask A for the English, Danish, Greek, Arabic, and Turkish languages. We adapt and fine-tune the BERT and multilingual BERT models made available by Google AI for English and non-English languages, respectively. For the English language, we use a combination of two fine-tuned BERT models. For other languages, we propose a cross-lingual…
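The abstract describes fine-tuning BERT for English and multilingual BERT (mBERT) for the other languages on the binary Subtask A labels. As a rough illustration, here is a minimal sketch of that kind of fine-tuning, assuming the HuggingFace transformers library and the public bert-base-multilingual-cased checkpoint; the hyperparameters and toy data below are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch: fine-tuning multilingual BERT for binary offensive
# language classification (Subtask A style). Hyperparameters and the
# toy data are assumptions, not the paper's actual training setup.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertTokenizer, BertForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"  # mBERT, per the abstract

tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

class OffenseDataset(Dataset):
    """Wraps (text, label) pairs; labels: 0 = NOT offensive, 1 = OFF."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True,
                             max_length=128, return_tensors="pt")
        self.labels = torch.tensor(labels)
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        return {k: v[i] for k, v in self.enc.items()}, self.labels[i]

# Toy examples standing in for the OffensEval training data.
train = OffenseDataset(["you are great", "you are an idiot"], [0, 1])
loader = DataLoader(train, batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumed LR
model.train()
for epoch in range(3):  # assumed epoch count
    for batch, labels in loader:
        optimizer.zero_grad()
        out = model(**batch, labels=labels)  # cross-entropy loss built in
        out.loss.backward()
        optimizer.step()
```

For English, the abstract mentions combining two fine-tuned BERT models; a common way to do this is to average the two models' class probabilities at prediction time, though the paper's exact combination method is not specified in the excerpt above.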

Cited by 8 publications (2 citation statements). References 20 publications.
“…As a result, many studies employed pre-trained multilingual word embeddings such as FastText (Bigoulaeva, Hangya & Fraser, 2021), MUSE (Pamungkas & Patti, 2019; Deshpande, Farris & Kumar, 2022; Aluru et al, 2020; Bigoulaeva, Hangya & Fraser, 2021), or LASER (Deshpande, Farris & Kumar, 2022; Aluru et al, 2020; Pelicon et al, 2021a; Vitiugin, Senarath & Purohit, 2021). Moreover, most research has focused on the use of pre-trained language models, essentially as classifiers: BERT (Vashistha & Zubiaga, 2021; Zahra El-Alami, Ouatik El Alaoui & En Nahnahi, 2022; Zia et al, 2022; Pamungkas, Basile & Patti, 2021a), AraBERT for Arabic data (Zahra El-Alami, Ouatik El Alaoui & En Nahnahi, 2022), CseBERT for English, Croatian, and Slovenian data (Pelicon et al, 2021b), multilingual BERT models (Shi et al, 2022; Bhatia et al, 2021; Deshpande, Farris & Kumar, 2022; Aluru et al, 2020; Zahra El-Alami, Ouatik El Alaoui & En Nahnahi, 2022; De la Peña Sarracén & Rosso, 2022; Tita & Zubiaga, 2021; Eronen et al, 2022; Ranasinghe & Zampieri, 2021a; Ghadery & Moens, 2020; Pelicon et al, 2021b; Awal et al, 2024; Montariol, Riabi & Seddah, 2022; Ahn et al, 2020a; Bigoulaeva et al, 2022, 2023; Pamungkas, Basile & Patti, 2021a; Pelicon et al, 2021a), the DistilmBERT model (Vitiugin, Senarath & Purohit, 2021), and RoBERTa (Zia et al, 2022).…”
Section: Approaches on Multilingual Hate Speech Detection (mentioning)
confidence: 99%
“…While there are a few studies published on languages such as Arabic [29] and Greek [35], most studies and datasets created thus far have focused on English. Data augmentation [15] and multilingual word embeddings [31] have been applied to take advantage of existing English datasets and improve the performance of systems dealing with languages other than English. To the best of our knowledge, however, state-of-the-art cross-lingual contextual embeddings such as XLM-R [11] have not yet been applied to offensive language identification.…”
Section: Introduction (mentioning)
confidence: 99%
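The statement above points to cross-lingual contextual embeddings such as XLM-R as a way to transfer from English data to other languages. A minimal sketch of that idea, assuming the HuggingFace transformers library and the public xlm-roberta-base checkpoint; the fine-tuning step is omitted, and the example input is a placeholder, not data from any of the cited papers.

```python
# Minimal sketch: zero-shot cross-lingual transfer with XLM-R.
# Fine-tune on English offensive-language data (omitted here), then
# apply the same weights to text in another language unchanged.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
clf = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)  # 0 = NOT offensive, 1 = OFF

# Without fine-tuning, the classification head is randomly initialized,
# so this prediction is meaningless; it only shows the inference path.
batch = tok("bir örnek tweet", return_tensors="pt")  # placeholder Turkish text
with torch.no_grad():
    pred = clf(**batch).logits.argmax(dim=-1).item()
print("OFF" if pred == 1 else "NOT")
```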