Proceedings of the Fourteenth Workshop on Semantic Evaluation 2020
DOI: 10.18653/v1/2020.semeval-1.248

AlexU-BackTranslation-TL at SemEval-2020 Task 12: Improving Offensive Language Detection Using Data Augmentation and Transfer Learning

Abstract: Social media platforms, online news commenting spaces, and many other public forums have become widely known for issues of abusive behavior such as cyber-bullying and personal attacks. In this paper, we use the annotated tweets of Offensive Language Identification Dataset (OLID) to train three levels of deep learning classifiers to solve the three sub-tasks associated with the dataset. Sub-task A is to determine if the tweet is toxic or not. Then, for offensive tweets, sub-task B requires determining whether t…

citations
Cited by 10 publications
(5 citation statements)
references
References 19 publications
“…Overall translation performance of NMT systems for user reviews of IMDb movies and Amazon products was explored in Lohar, Popović, and Way (2019) and Popović et al (2021). As for hate-speech detection, (Ibrahim, Torki, and El-Makky, 2020) used MT in order to balance the distribution of classes in training data. Existing English tweets were machine-translated into Portuguese (shown to be the best option), and then, these translations were translated back into English.…”
Section: MT For User-generated Content
Citation type: mentioning
confidence: 99%
“…In addition to some trained machine translation models, Google's Cloud Translation API service is a common tool for back-translation widely applied by some works like [7,19,59,42,60,61,10,62,63]. Some works add additional features based on vanilla back-translation.…”
Section: Machine Translation
Citation type: mentioning
confidence: 99%
“…Columns: Text classification | Text generation | Structure prediction
Paraphrasing
  Thesauruses: [5], [93], [49], [7], [42], [60], [44], [45], [98] | - | [42], [43]
  Embeddings: [8], [49] | - | -
  MLMs: [10], [51], [54] | [55] | -
  Rules: [10], [7], [11] | - | [99]
  MT: [42], [60], [10], [12], [59], [61], [63], [7], [19], [66], [100], [98] | [13], [58] | [42], [57], [15]
  Seq2Seq: [18], [68], [101] | [18], [102] | [18], [16], [67], [17], [103], [82]
Noising
  Swapping: [93], [60], [44], [61], [20], [19] | - | …”
Section: Text
Citation type: mentioning
confidence: 99%
“…Backtranslation usually yields the same meaning sentence with alternative words and sometime different sentence structure. The authors at [ITE20] have applied back translation to balance the classes in the OLID dataset using Google Translation API, however they encountered some issues with the sentences generated from translating from the Arabic language since the back translate words was not offensive and thus effectively changed the sentence class however the technique succeeded when back-translating from many other languages such as Spanish, German, Portuguese and Italian. This is because the quality of translation between languages are different for a given model.…”
Section: Data Augmentation
Citation type: mentioning
confidence: 99%
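
The citation statements above describe back-translation as a way to balance classes: an English tweet is machine-translated into a pivot language (Portuguese is reported as the best option) and then back into English, and the paraphrase is added to the under-represented class. A minimal Python sketch of that idea follows. It is not the authors' implementation: the `Translator` callable, the function names, the `toy_translate` stub, and the example tweets are all assumptions made for illustration, and a real setup would plug in an actual MT service in place of the stub.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

# Assumed interface (hypothetical): translate(text, source_lang, target_lang) -> text
Translator = Callable[[str, str, str], str]
Example = Tuple[str, str]  # (tweet text, class label)


def back_translate(text: str, translate: Translator, pivot: str = "pt") -> str:
    """Translate English text into the pivot language and back into English."""
    return translate(translate(text, "en", pivot), pivot, "en")


def balance_by_back_translation(
    examples: List[Example],
    translate: Translator,
    pivot: str = "pt",
    seed: int = 0,
) -> List[Example]:
    """Add back-translated paraphrases to smaller classes.

    Each original example is back-translated at most once, so a class may
    stay below the size of the largest class if it runs out of new paraphrases.
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in examples)
    target = max(counts.values())
    seen = {text for text, _ in examples}
    augmented = list(examples)
    for label, count in counts.items():
        pool = [text for text, lab in examples if lab == label]
        rng.shuffle(pool)
        for text in pool:
            if count >= target:
                break
            paraphrase = back_translate(text, translate, pivot)
            if paraphrase in seen:
                continue  # the round trip produced nothing new
            seen.add(paraphrase)
            augmented.append((paraphrase, label))
            count += 1
    return augmented


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Stand-in for a real MT system: a two-entry word table (illustrative only)."""
    table = {
        ("en", "pt"): {"awful": "horrível"},
        ("pt", "en"): {"horrível": "horrible"},
    }
    mapping = table.get((src, tgt), {})
    return " ".join(mapping.get(word, word) for word in text.split())


# Illustrative data; OFF/NOT mirror the OLID sub-task A labels.
data = [("you are awful", "OFF"), ("nice day", "NOT"), ("good game", "NOT")]
balanced = balance_by_back_translation(data, toy_translate)
```

The sketch does not check whether a paraphrase still belongs to its original class. One of the statements above reports that back-translating through Arabic sometimes produced non-offensive text, so some label check on the generated sentences would likely be needed before training on them.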