Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue 2016
DOI: 10.18653/v1/w16-3604
Creating and Characterizing a Diverse Corpus of Sarcasm in Dialogue

Abstract: The use of irony and sarcasm in social media allows us to study them at scale for the first time. However, their diversity has made it difficult to construct a high-quality corpus of sarcasm in dialogue. Here, we describe the process of creating a large-scale, highly-diverse corpus of online debate forums dialogue, and our novel methods for operationalizing classes of sarcasm in the form of rhetorical questions and hyperbole. We show that we can use lexico-syntactic cues to reliably retrieve sarcastic utterance…
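The abstract's mention of lexico-syntactic cues points to a pattern-based retrieval step. As an illustration only, the Python sketch below uses hypothetical regular-expression cues for rhetorical questions and hyperbole; the actual cue lists are defined in the paper and are not reproduced here.

```python
import re

# Hypothetical lexico-syntactic cue patterns (placeholders, not the paper's
# actual cue lists) for retrieving candidate sarcastic utterances.
CUE_PATTERNS = {
    # Rhetorical question: a question followed by a declarative in the same turn.
    "rhetorical_question": re.compile(r"\?\s+[A-Z][^?]*[.!]\s*$"),
    # Hyperbole: intensifiers commonly used in exaggerated statements.
    "hyperbole": re.compile(
        r"\b(absolutely|totally|utterly|the (best|worst) thing ever)\b", re.I
    ),
}

def retrieve_candidates(utterances):
    """Return (utterance, matched_cue) pairs for utterances matching any cue."""
    hits = []
    for utt in utterances:
        for cue, pattern in CUE_PATTERNS.items():
            if pattern.search(utt):
                hits.append((utt, cue))
                break  # record only the first matching cue per utterance
    return hits

posts = [
    "Oh really? Because that worked so well last time.",
    "This is absolutely the best thing ever, said no one.",
    "I think the budget figures are roughly correct.",
]
for utt, cue in retrieve_candidates(posts):
    print(f"[{cue}] {utt}")
```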

Cited by 67 publications (95 citation statements)
References 24 publications
“…We note that this increase in the SARC class from plain word embeddings to word embeddings combined with LIWC and context is larger than the increase in the OTHER class, indicating that post-level context for SARC captures more diverse instances in training. We also note that these results beat our previous baselines using only n-gram features on the smaller original dataset of 851 posts per class (0.70 F1 for SARC, 0.71 F1 for NOT-SARC) (Oraby et al., 2016). We investigate why certain context features benefit each class differently for LSTM. Table 7 shows examples of single posts, divided into Pre, RQ, and Post.…”
supporting
confidence: 50%
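The cited work combines word-embedding input with LIWC features and post-level context in an LSTM classifier. The following is a minimal PyTorch sketch of that general architecture; the layer sizes, the use of a second LSTM for the preceding post, and the number of LIWC categories (n_liwc) are assumptions for illustration, not the authors' reported configuration.

```python
import torch
import torch.nn as nn

class SarcasmClassifier(nn.Module):
    """Minimal sketch (not the cited authors' exact model): an LSTM over word
    embeddings for the target post, combined with LIWC category features and
    an LSTM encoding of the preceding context post."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=64, n_liwc=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.post_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.context_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Final features: post encoding + context encoding + LIWC vector.
        self.classifier = nn.Linear(2 * hidden_dim + n_liwc, 2)  # SARC vs. NOT-SARC

    def forward(self, post_ids, context_ids, liwc_feats):
        _, (post_h, _) = self.post_lstm(self.embed(post_ids))
        _, (ctx_h, _) = self.context_lstm(self.embed(context_ids))
        feats = torch.cat([post_h[-1], ctx_h[-1], liwc_feats], dim=-1)
        return self.classifier(feats)

# Toy forward pass with random token indices and LIWC features.
model = SarcasmClassifier(vocab_size=5000)
post = torch.randint(1, 5000, (8, 40))     # batch of 8 posts, 40 tokens each
context = torch.randint(1, 5000, (8, 60))  # preceding posts as context
liwc = torch.rand(8, 64)                   # LIWC category proportions
logits = model(post, context, liwc)        # shape: (8, 2)
```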
“…The final gold sarcastic label was assigned only if a majority of the annotators labeled the reply as sarcastic. Although the dataset described by Oraby et al. (2016) consists of 9,400 posts, only 50% (4,692 altogether; balanced between sarcastic and non-sarcastic categories) of that corpus is currently available for research (https://github.com/debanjanghosh/sarcasm_context). An example from this dataset is given in Table 1, where userD's reply has been labeled as sarcastic by annotators, in the context of userC's post/comment.…”
mentioning
confidence: 99%
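The majority-vote labeling rule described in this excerpt can be expressed in a few lines. The sketch below assumes a simple list of per-annotator string labels and a hypothetical gold_label helper; it is not the authors' released code.

```python
def gold_label(annotations, positive="sarcastic"):
    """Assign the gold label only if a strict majority of annotators
    marked the reply as sarcastic (a straightforward reading of the
    procedure quoted above)."""
    votes = sum(1 for a in annotations if a == positive)
    return positive if votes > len(annotations) / 2 else "not_sarcastic"

# Example: 3 of 5 annotators marked the reply sarcastic.
print(gold_label(["sarcastic", "sarcastic", "not_sarcastic",
                  "sarcastic", "not_sarcastic"]))  # -> "sarcastic"
```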