AraBERT is an Arabic version of the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) model. The latter has achieved good performance in a variety of Natural Language Processing (NLP) tasks. In this paper, we propose an effective AraBERT embeddingsbased method for dealing with offensive Arabic language in Twitter. First, we pre-process tweets by handling emojis and including their Arabic meanings. Next, to overcome the pretrain-finetune discrepancy, we substitute each detected emojis by the special token [MASK] into both fine tuning and inference phases. Then, we represent tweets tokens by applying AraBERT model. Finally, we feed the tweet representation into a sigmoid function to decide whether a tweet is offensive or not. The proposed method achieved the best results on OffensEval 2020: Arabic task and reached a macro F1 score equal to 90.17%.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.