Cybercriminals have leveraged the popularity of a large user base available on Online Social Networks (OSNs) to spread spam campaigns by propagating phishing URLs, attaching malicious contents, etc. However, another kind of spam attacks using phone numbers has recently become prevalent on OSNs, where spammers advertise phone numbers to attract users' attention and convince them to make a call to these phone numbers. The dynamics of phone number based spam is different from URL-based spam due to an inherent trust associated with a phone number. While previous work has proposed strategies to mitigate URL-based spam attacks, phone number based spam attacks have received less attention.In this paper, we aim to detect spammers that use phone numbers to promote campaigns on Twitter. To this end, we collected information (tweets, user meta-data, etc.) about 3, 370 campaigns spread by 670, 251 users. We model the Twitter dataset as a heterogeneous network by leveraging various interconnections between different types of nodes present in the dataset. In particular, we make the following contributions -(i) We propose a simple yet effective metric, called Hierarchical Meta-Path Score (HMPS) to measure the proximity of an unknown user to the other known pool of spammers. (ii) We design a feedback-based active learning strategy and show that it significantly outperforms three state-of-the-art baselines for the task of spam detection. Our method achieves 6.9% and 67.3% higher F1-score and AUC, respectively compared to the best baseline method. (iii) To overcome the problem of less training instances for supervised learning, we show that our proposed feedback strategy achieves 25.6% and 46% higher F1-score and AUC respectively than other oversampling strategies. Finally, we perform a case study to show how our method is capable of detecting those users as spammers who have not been suspended by Twitter (and other baselines) yet.
CCS CONCEPTS• Information systems; • Security and privacy; • Applied computing; This paper is published under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.