Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.96

Code-Switching Patterns Can Be an Effective Route to Improve Performance of Downstream NLP Applications: A Case Study of Humour, Sarcasm and Hate Speech Detection

Abstract: In this paper, we demonstrate how code-switching patterns can be utilised to improve various downstream NLP applications. In particular, we encode different switching features to improve humour, sarcasm and hate speech detection. We believe that this simple linguistic observation can also be potentially helpful in improving other similar NLP applications.
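The abstract describes encoding code-switching features for downstream classifiers. As a minimal illustrative sketch (the function name, tag scheme, and feature choices are assumptions, not taken from the paper), one common way to encode such patterns is to count switch points and compute the Code-Mixing Index (Das and Gambäck, 2014) from per-token language tags:

```python
# Hypothetical sketch of simple code-switching features that could feed
# a downstream classifier. Names and metric choices are illustrative.

def switching_features(lang_tags):
    """Compute switch-point and code-mixing statistics from per-token
    language tags, e.g. ["hi", "hi", "en", "univ", "hi"], where "univ"
    marks language-independent tokens (punctuation, URLs, etc.)."""
    content = [t for t in lang_tags if t != "univ"]
    n, u = len(lang_tags), len(lang_tags) - len(content)
    # Number of positions where the language changes between adjacent
    # content tokens (a "switch point")
    switches = sum(a != b for a, b in zip(content, content[1:]))
    # Code-Mixing Index: 0 for monolingual text, larger when tokens are
    # spread more evenly across languages
    if not content:
        cmi = 0.0
    else:
        dominant = max(content.count(t) for t in set(content))
        cmi = 100.0 * (1.0 - dominant / (n - u))
    return {"n_switches": switches, "cmi": cmi}

print(switching_features(["hi", "hi", "en", "univ", "hi"]))
# → {'n_switches': 2, 'cmi': 25.0}
```

Features like these can be concatenated with text embeddings before the final classification layer, which matches the general strategy of injecting switching signals into a neural model.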

Cited by 13 publications (6 citation statements)
References 9 publications
“…Recent literature has made significant efforts to understand syntactic structure and semantics in code-mixed texts [3,4,5]. Similar attempts have been made for pragmatic tasks — humour, sarcasm and hate detection in the code-mixed regime [6,7].…”
Section: Introduction
confidence: 94%
“…We perform a human evaluation study to evaluate the code-mixed texts generated by PARADOX and the vanilla Transformer. We randomly sample 24 examples from each of these models and ask 30 human evaluators to rate them on semantic coherence and linguistic quality. Semantic coherence measures the meaningfulness of the code-mixed texts, whereas linguistic quality measures their structural validity.…”
Section: Comparative Analysis
confidence: 99%
“…This can be combated by collecting data from platforms with a primarily Urdu-speaking population, or by considering conversational data between people who communicate with each other in Urdu, as may be the case in native Urdu-speaking families. Alternatively, Bansal et al. (2020) used a modern deep neural model, with datasets organised by theme, to improve upon the accuracy achieved by other models. The research presented a set of nine features, which could be further improved by using switching features in the final layer of the deep network.…”
Section: Code-switching
confidence: 99%
“…We recommend specifically investigating how to integrate these characteristics into future models. For instance, Alorainy et al. [9] extract features specifically to identify othering language, while Bansal et al. [26] and recent publications in ACL workshops [19] focus on humour and sarcasm. Discriminatory features.…”
Section: Lack of OCL-dependent Features
confidence: 99%