In this paper, we present a set of computational methods to identify the likelihood of a word being borrowed, based on signals from social media. In terms of Spearman's correlation, our methods perform more than twice as well (∼ 0.62) at predicting borrowing likelihood as the best-performing baseline (∼ 0.26) reported in the literature. Based on this likelihood estimate, we asked annotators to re-annotate the language tags of foreign words in predominantly native contexts. In 88% of cases the annotators felt that the foreign language tag should be replaced by the native language tag, indicating substantial scope for improving automatic language identification systems.
In this paper we propose a deep learning framework for sarcasm target detection in predefined sarcastic texts. Identifying sarcasm targets can help many core natural language processing tasks such as aspect-based sentiment analysis and opinion mining. To begin with, we perform an empirical study of socio-linguistic features and identify those that are statistically significant indicators of sarcasm targets (p-values in the range (0.001, 0.05)). Finally, we present a deep learning framework augmented with these socio-linguistic features to detect sarcasm targets in sarcastic book snippets and tweets. We achieve a substantial improvement in performance, in terms of exact match and dice score, over the current state-of-the-art baseline.
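The two evaluation metrics named above can be sketched as follows, assuming sarcasm targets are represented as sets of token positions within a snippet; the paper's exact tokenization and matching rules may differ.

```python
def exact_match(pred, gold):
    """1 if the predicted target tokens exactly equal the gold tokens, else 0."""
    return int(set(pred) == set(gold))

def dice_score(pred, gold):
    """Dice coefficient: 2|P ∩ G| / (|P| + |G|)."""
    p, g = set(pred), set(gold)
    if not p and not g:
        return 1.0  # both empty: treat as a perfect (vacuous) match
    return 2 * len(p & g) / (len(p) + len(g))

# Hypothetical example: gold target tokens {3, 4}, predicted {3, 4, 5}.
print(exact_match([3, 4, 5], [3, 4]))           # → 0
print(round(dice_score([3, 4, 5], [3, 4]), 2))  # → 0.8
```

Dice rewards partial overlap, which is why it is reported alongside the stricter exact-match criterion.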
In this paper we demonstrate how code-switching patterns can be utilised to improve various downstream NLP applications. In particular, we encode different switching features to improve humour, sarcasm and hate speech detection. We believe this simple linguistic observation can also be helpful in improving other similar NLP applications.
categories like 'bio', 'health', 'body' and 'negative emotion' are more pronounced in the tweets posted by users in the latter class. As a final step, we use these observations as features to automatically classify the two groups, achieving an F1-score of 0.83.
Fake news is a serious problem that has received considerable attention from both industry and the academic community. Over the past few years, many fake news detection approaches have been introduced, and most existing methods rely on either the news content or the social context of the news dissemination process on social media platforms. In this work, we propose a generic model that takes into account both the news content and the social context to identify fake news. Specifically, we explore different aspects of the news content using both shallow and deep representations. The shallow representations are produced with word2vec and doc2vec models, while the deep representations are generated via transformer-based models. These representations jointly or separately address four individual tasks, namely bias detection, clickbait detection, sentiment analysis, and toxicity detection. In addition, we make use of graph convolutional neural networks and mean-field layers to exploit the underlying structural information of the news articles. In this way, we account for the inherent correlations between articles by leveraging their social context. Experiments on widely used benchmark datasets indicate the effectiveness of the proposed method.
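The abstract mentions graph convolutional networks over the article graph. A minimal, pure-Python sketch of one standard GCN propagation step, H' = ReLU(D^(-1/2)(A + I)D^(-1/2) · H · W), is shown below on a toy three-article graph; the adjacency, features, and weights are all made up for illustration and do not reflect the paper's architecture.

```python
def matmul(X, Y):
    """Plain nested-list matrix multiplication."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def gcn_layer(A, H, W):
    """One GCN layer: symmetrically normalised adjacency, then linear map + ReLU."""
    n = len(A)
    # Add self-loops so each article keeps its own features.
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    # D^(-1/2) (A + I) D^(-1/2)
    A_norm = [[A_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
              for i in range(n)]
    Z = matmul(matmul(A_norm, H), W)
    return [[max(0.0, v) for v in row] for row in Z]  # ReLU

# Toy graph: articles 0 and 1 share social context, article 2 is isolated.
A = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 2-d article features
W = [[0.5, -0.5], [0.5, 0.5]]             # hypothetical layer weights
print(gcn_layer(A, H, W))
```

Connected articles end up with averaged (smoothed) representations, which is the mechanism by which social-context structure influences the per-article predictions.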