In the current era of information and experience, microblogging sites are commonly used to express feelings, including fear, panic, hate, and abuse. Monitoring and controlling abuse on social media, especially during pandemics such as COVID-19, can help keep public sentiment and morale positive. Developing fear and hate detection methods based on machine learning requires labelled data. However, obtaining labelled data in suddenly changed circumstances such as a pandemic is expensive, and acquiring it in a short time is impractical. Related labelled hate data from other domains or previous incidents may be available. However, the predictive accuracy of hate detection models trained on such data decreases significantly if the data distribution of the target domain, where the prediction will be applied, is different. To address this problem, we propose a novel concept of unsupervised progressive domain adaptation based on a deep-learning language model built from multiple text datasets. We showcase the efficacy of the proposed method in hate speech and fear detection on tweets collected during COVID-19, where labelled information is unavailable.
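The abstract describes progressive domain adaptation only at a high level, and its method builds on a deep-learning language model that is not reproduced here. As a minimal sketch of the general idea of adapting step by step without target labels, the code below swaps in plain self-training with a nearest-centroid classifier. Each round, it pseudo-labels the target samples it is most confident about and refits on them. The toy 2-D features, the `step` fraction, and all function names are hypothetical.

```python
import math


def centroids(points, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(x, cents):
    """Return (label, margin): the nearest centroid and its distance gap to the runner-up."""
    dists = sorted((math.dist(x, c), y) for y, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def progressive_self_training(src_x, src_y, tgt_x, step=0.25):
    """Hypothetical self-training loop (not the paper's method).

    Each round refits centroids on source + pseudo-labelled target data, then
    moves the most confident fraction (`step`) of the remaining target points
    into the training set with their predicted labels.
    """
    train_x, train_y = list(src_x), list(src_y)
    pseudo = {}
    remaining = list(range(len(tgt_x)))
    batch = max(1, math.ceil(step * len(tgt_x)))
    while remaining:
        cents = centroids(train_x, train_y)
        # Sort remaining target points by confidence margin, highest first.
        scored = sorted(((predict(tgt_x[i], cents), i) for i in remaining),
                        key=lambda t: t[0][1], reverse=True)
        for (label, _), i in scored[:batch]:
            pseudo[i] = label
            train_x.append(tgt_x[i])
            train_y.append(label)
        remaining = [i for _, i in scored[batch:]]
    return [pseudo[i] for i in range(len(tgt_x))]
```

Moving only a fraction of the target data per round lets the class centroids drift gradually toward the target distribution instead of trusting every source-model prediction at once.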
Social media platforms like Twitter have become an easy portal for billions of people to connect and exchange their thoughts. Unfortunately, people commonly use these platforms to share misinformation, which can influence other people adversely. The spread of misinformation is unavoidable in an extraordinary situation like COVID-19, and the consequences can be dreadful. This paper proposes a two-step ranking-based misinformation detection (RMiD) technique. First, a novel ranking-based approach leveraging scalable information retrieval infrastructure is applied to detect misinformation in a huge collection of unlabelled tweets, based on a related but very small labelled misinformation dataset. Second, the identified misinformation tweets are represented as a coupled matrix tensor model, and Nonnegative Coupled Matrix Tensor Factorization is applied to learn their spatio-temporal topic dynamics. The experimental analysis shows that RMiD detects misinformation with better coverage and less noise than existing techniques. Moreover, by leveraging the semantic similarity of terms in the labelled data, the coupled matrix tensor representation improves the quality of topics discovered from unlabelled data by up to 4%.
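The abstract does not specify the retrieval model behind the ranking step. A minimal sketch, assuming Okapi BM25 (a common default similarity in Lucene-based search engines) and a max-over-queries aggregation, uses each labelled misinformation tweet as a query against the unlabelled collection and keeps the highest-scoring tweets as candidates. The `BM25` class, `rank_candidates`, `top_k`, and the parameter values `k1=1.2` and `b=0.75` are illustrative assumptions, not the RMiD implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 index over a list of documents (here: unlabelled tweets)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lens) / len(docs)
        df = Counter(term for tf in self.tfs for term in tf)
        n = len(docs)
        # Lucene-style IDF, always positive.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        """BM25 score of document i for the given query text."""
        tf, dl = self.tfs[i], self.lens[i]
        s = 0.0
        for term in set(tokenize(query)):
            f = tf.get(term, 0)
            if f:
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                s += self.idf[term] * f * (self.k1 + 1) / norm
        return s


def rank_candidates(labelled_misinfo, unlabelled, top_k):
    """Hypothetical ranking step: a tweet's score is its best BM25 match
    against any labelled misinformation example; return the top_k indices
    with a positive score."""
    index = BM25(unlabelled)
    best = [max(index.score(q, i) for q in labelled_misinfo)
            for i in range(len(unlabelled))]
    order = sorted(range(len(unlabelled)), key=lambda i: best[i], reverse=True)
    return [i for i in order[:top_k] if best[i] > 0]
```

In practice the same queries could be sent to an existing search engine rather than an in-memory index; the in-memory version only makes the scoring logic explicit.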
Supplementary Information
The online version contains supplementary material available at 10.1007/s13278-021-00767-7.
Clustering data with multiple aspects, such as multi-view or multi-type relational data, has become popular in recent years due to its wide applicability. Approaches that combine manifold learning with the Non-negative Matrix Factorization (NMF) framework, which learns an accurate low-rank representation of multi-dimensional data, have shown effectiveness. We propose to include the inter-manifold in the NMF framework, utilizing the distance information of data points of different data types (or views) to learn the diverse manifold for data clustering. Empirical analysis reveals that the proposed method can find partial representations of various interrelated types and select useful features during clustering. Results on several datasets demonstrate that the proposed method outperforms state-of-the-art multi-aspect data clustering methods in both accuracy and efficiency.
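The inter-manifold term is this work's contribution, and its exact formulation is not given here. For orientation, the sketch below shows the standard single-view building block that manifold-regularized NMF methods start from: graph-regularized NMF, which factorizes a nonnegative feature-by-sample matrix X ≈ UVᵀ while penalizing λ·Tr(VᵀLV), with L = D - W the graph Laplacian of a sample affinity matrix W. It uses the well-known multiplicative updates and assigns each sample to the cluster with the largest entry in its row of V. It models only the intra-manifold (within one data type) and is not the proposed method; function names and parameter values are hypothetical.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of list-of-lists matrices."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def gnmf(X, W, k, lam=1.0, iters=200, seed=0, eps=1e-9):
    """Graph-regularized NMF (single view, intra-manifold only).

    X: m x n nonnegative matrix (features x samples), W: n x n affinity.
    Minimizes ||X - U V^T||_F^2 + lam * Tr(V^T L V) with L = D - W, so that
    samples linked in W get similar rows in V. Multiplicative updates keep
    U and V nonnegative.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    U = [[rng.random() for _ in range(k)] for _ in range(m)]
    V = [[rng.random() for _ in range(k)] for _ in range(n)]
    deg = [sum(row) for row in W]  # diagonal of the degree matrix D
    Xt = transpose(X)
    for _ in range(iters):
        # U <- U * (X V) / (U V^T V)
        num = matmul(X, V)
        den = matmul(U, matmul(transpose(V), V))
        U = [[U[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
        # V <- V * (X^T U + lam W V) / (V U^T U + lam D V)
        XtU, WV = matmul(Xt, U), matmul(W, V)
        VUtU = matmul(V, matmul(transpose(U), U))
        V = [[V[i][j] * (XtU[i][j] + lam * WV[i][j])
              / (VUtU[i][j] + lam * deg[i] * V[i][j] + eps)
              for j in range(k)] for i in range(n)]
    return U, V


def clusters(V):
    """Assign each sample to the component with the largest coefficient."""
    return [max(range(len(row)), key=row.__getitem__) for row in V]
```

The pure-Python matrix helpers only keep the sketch dependency-free; a practical implementation would use a numerical library and add further regularizers for the multi-type or multi-view relations.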