Sarcasm is an intricate form of speech, where meaning is conveyed implicitly. Being a convoluted form of expression, detecting sarcasm is an assiduous problem. The difficulty in recognition of sarcasm has many pitfalls, including misunderstandings in everyday communications, which leads us to an increasing focus on automated sarcasm detection. In the second edition of the Figurative Language Processing (FigLang 2020) workshop, the shared task of sarcasm detection released two datasets, containing responses along with their context sampled from Twitter and Reddit.In this work, we use RoBERT a large to detect sarcasm in both the datasets. We further assert the importance of context in improving the performance of contextual word embedding based models by using three different types of inputs -Response-only, Context-Response, and Context-Response (Separated). We show that our proposed architecture performs competitively for both the datasets. We also show that the addition of a separation token between context and target response results in an improvement of 5.13% in the F1-score in the Reddit dataset.
With the growing use of social media and its availability, many instances of the use of offensive language have been observed across multiple languages and domains. This phenomenon has given rise to the growing need to detect the offensive language used in social media crosslingually. In OffensEval 2020, the organizers have released the multilingual Offensive Language Identification Dataset (mOLID), which contains tweets in five different languages, to detect offensive language. In this work, we introduce a cross-lingual inductive approach to identify the offensive language in tweets using the contextual word embedding XLM-RoBERTa (XLM-R). We show that our model performs competitively on all five languages, obtaining the fourth position in the English task with an F1-score of 0.919 and eighth position in the Turkish task with an F1-score of 0.781. Further experimentation proves that our model works competitively in a zero-shot learning environment, and is extensible to other languages.
Computing with words is a concept that is used to solve problems with input in natural language. Modifiers are transformation functions with predefined labels used extensively in decision-making to specify the desired value of a linguistic variable defined by fuzzy sets. In past years, few efforts have been made to study the application of Computing with Words (CW) in many domains ranging from fraud detection systems to diagnosis systems in medicine. However, the application of CW in these fields with modified Fuzzy sets did not give satisfactory results. When applied to modified Fuzzy sets, the existing interpolation rule does not cover the extreme left and extreme right-shifted fuzzy sets. Hence, there is a need to introduce a new interpolation rule when working with modifiers. This paper introduces a new Fuzzy Modifier Interpolation Rule to Type-1 Fuzzy sets and Interval Type-2 (IT-2) Fuzzy Sets to enhance the quality of results obtained when modifiers are applied.Povzetek: V prispevku je predstavljeno novo interpolacijsko pravilo za mehke modifikatorje v mehkih nizih tipa 1 in intervalnega tipa 2.
Text simplification is the process of splitting and rephrasing a sentence to a sequence of sentences making it easier to read and understand while preserving the content and approximating the original meaning. Text simplification has been exploited in NLP applications like machine translation, summarization, semantic role labeling, and information extraction, opening a broad avenue for its exploitation in comprehension-based questionanswering downstream tasks. In this work, we investigate the effect of text simplification in the task of question-answering using a comprehension context. We release Simple-SQuAD, a simplified version of the widely-used SQuAD dataset.Firstly, we outline each step in the dataset creation pipeline, including style transfer, thresholding of sentences showing correct transfer, and offset finding for each answer. Secondly, we verify the quality of the transferred sentences through various methodologies involving both automated and human evaluation. Thirdly, we benchmark the newly created corpus and perform an ablation study for examining the effect of the simplification process in the SQuAD-based question answering task. Our experiments show that simplification leads to up to 2.04% and 1.74% increase in Exact Match and F1, respectively. Finally, we conclude with an analysis of the transfer process, investigating the types of edits made by the model, and the effect of sentence length on the transfer model.
The GAP dataset is a Wikipedia-based evaluation dataset for gender bias detection in coreference resolution, containing mostly objective sentences. Since subjectivity is ubiquitous in our daily texts, it becomes necessary to evaluate models for both subjective and objective instances. In this work, we present a new evaluation dataset for gender bias in coreference resolution, GAP-Subjective, which increases the coverage of the original GAP dataset by including subjective sentences. We outline the methodology used to create this dataset. Firstly, we detect objective sentences and transfer them into their subjective variants using a sequenceto-sequence model. Secondly, we outline the thresholding techniques based on fluency and content preservation to maintain the quality of the sentences. Thirdly, we perform automated and human-based analysis of the style transfer and infer that the transferred sentences are of high quality. Finally, we benchmark both GAP and GAP-Subjective datasets using a BERTbased model and analyze its predictive performance and gender bias.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.