Sarcasm is the use of words to ridicule someone or for humorous effect. Several studies on sarcasm detection have applied different learning algorithms. However, most of these models focus only on the content of an expression and ignore its context, so they fail to capture the contextual information in sarcastic expressions. Moreover, the datasets used in several studies are imbalanced, which affects model results. In this paper, we propose a contextual model for sarcasm identification on Twitter that uses various pre-trained models and augments the dataset, applying Global Vectors for Word Representation (GloVe) to construct word embeddings and learn context in order to generate more sarcastic data. We also perform additional experiments using a data duplication method. The impact of data augmentation and duplication is tested on various datasets and augmentation sizes. In particular, we achieved the best performance on the iSarcasm dataset after using data augmentation to increase the data labeled as sarcastic by 20%: the F1 score improved by 2.1 percentage points, to 40.44% from 38.34% before augmentation.
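The abstract does not spell out the augmentation procedure, so the following is only a minimal sketch of the two ideas it names: expanding the sarcastic class by a given fraction, either by plain duplication or by embedding-based word replacement. The vectors in `TOY_VECTORS` are made-up stand-ins for pre-trained GloVe embeddings, and the function names (`nearest_word`, `augment_text`, `expand_sarcastic`) and the single-token replacement strategy are assumptions for illustration, not the authors' method.

```python
import math
import random

# Toy stand-in for pre-trained GloVe vectors (hypothetical values, illustration only).
TOY_VECTORS = {
    "great": [0.9, 0.1],
    "wonderful": [0.85, 0.15],
    "love": [0.8, 0.3],
    "adore": [0.78, 0.33],
    "monday": [0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nearest_word(word, vectors):
    """Return the most similar other vocabulary word, or the word itself if unknown."""
    if word not in vectors:
        return word
    candidates = [w for w in vectors if w != word]
    return max(candidates, key=lambda w: cosine(vectors[word], vectors[w]))

def augment_text(text, vectors, rng):
    """Replace one randomly chosen in-vocabulary token with its nearest neighbour."""
    tokens = text.split()
    known = [i for i, t in enumerate(tokens) if t.lower() in vectors]
    if not known:
        return text
    i = rng.choice(known)
    tokens[i] = nearest_word(tokens[i].lower(), vectors)
    return " ".join(tokens)

def expand_sarcastic(data, fraction, vectors=None, seed=0):
    """Append round(fraction * n_sarcastic) new sarcastic examples.

    data is a list of (text, label) pairs with label 1 for sarcastic.
    With vectors, new examples are augmented; without, they are duplicates.
    """
    rng = random.Random(seed)
    sarcastic = [text for text, label in data if label == 1]
    n_new = round(fraction * len(sarcastic))
    extra = []
    for _ in range(n_new):
        text = rng.choice(sarcastic)
        if vectors is not None:
            text = augment_text(text, vectors, rng)
        extra.append((text, 1))
    return data + extra
```

Calling `expand_sarcastic(data, 0.2)` mirrors the duplication experiment at a 20% augmentation size, while passing `vectors=TOY_VECTORS` switches to the embedding-based variant; in practice the toy dictionary would be replaced by real pre-trained vectors.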