Sarcasm Detection in Twitter - Performance Impact While Using Data Augmentation: Word Embeddings

Handoyo, Alif Tri; Rahman, Hidayatur; Setiadi, Criscentia Jessica; Suhartono, Derwin

doi:10.5391/ijfis.2022.22.4.401

Cited by 1 publication

(1 citation statement)

References 22 publications

“…Also, augmenting by only replacing one word with the synonym could potentially generate synthetic data that is very similar to the original one, thus risking overfitting the model. In earlier works, Handoyo et al [21] augmented the data by only changing one word to its synonym. demonstrate the overfitting effects, which cause performance to decline as more augmented data are used.…”

Section: Literature Review (mentioning)

confidence: 99%

Measuring the Quality of Semantic Data Augmentation for Sarcasm Detection

2023

IJIES

Sarcasm is a form of figurative speech where the intended meaning of a sentence is different from its literal meaning. Sarcastic expressions tend to confuse automatic NLP approaches in many application domains, making their detection of significant importance. One of the challenges in machine learning approaches to sarcasm detection is the difficulty of acquiring ground-truth annotations. Thus, human-annotated datasets usually contain only a few thousand texts, often being unbalanced. In this paper, we propose two different pipelines of data augmentation to generate more sarcastic data. The first one is SMERT-BERT, a modified SMERTI pipeline that uses RoBERTa as the language model for the text infilling module. The second one is SWORD (semantic text exchange by Word-Attribution), where we modified the masking module in the SMERTI pipeline by utilizing the word-attribution value. These approaches are combined with a SLOR (syntactic log-odds ratio) metric to filter the generated sarcastic data and only select sentences with the best score. Our experiments show that the use of a SLOR filter has a significant positive contribution to the augmentation process. In particular, we achieve the best results when using the SMERT-BERT pipeline and a SLOR filter by improving the F-measure by 4.00% on the iSarcasm dataset, compared to the baseline models.
