“…Karimi et al. (2018) use LSTM networks to capture temporal dependencies for detecting compromised accounts. VanDam et al. (2018) use an unsupervised learning framework in which multiple views of a user profile (i.e., term, source, time, and place) are encoded separately and then mapped into a joint space. This joint representation is then used to retrieve a ranking of compromised accounts.…”
Suspended accounts are high-risk accounts that violate the rules of a social network. These accounts contain spam and offensive or explicit language, among other things, and their textual content is highly variable. In this work, we perform a detailed linguistic and statistical analysis of the textual information of suspended accounts and show how insights from our study significantly improve a deep-learning-based detection framework. Moreover, we investigate the utility of advanced topic modeling for automatically creating word lists that discriminate suspended from regular accounts. Since early detection of these high-risk accounts is crucial, we evaluate multiple state-of-the-art classification models along the temporal dimension by measuring the minimum amount of textual signal needed to make reliable predictions. Further, we show that the best-performing models are able to detect suspended accounts earlier than the social media platform does.
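The abstract evaluates models "along the temporal dimension by measuring the minimum amount of textual signal needed to perform reliable predictions." A minimal sketch of that kind of evaluation, assuming each account is a chronologically ordered list of posts and that "reliable" means reaching a chosen accuracy threshold: classify every account from only its first k posts and return the smallest k that reaches the threshold. The function names, the threshold rule, and the keyword classifier are hypothetical stand-ins, not the authors' models or protocol.

```python
from typing import Callable, Optional, Sequence

Account = Sequence[str]  # one account's posts, oldest first


def min_posts_for_reliable_prediction(
    accounts: Sequence[Account],
    labels: Sequence[int],
    classify: Callable[[Sequence[str]], int],
    max_k: int,
    threshold: float,
) -> Optional[int]:
    """Return the smallest k such that classifying each account from only its
    first k posts reaches at least `threshold` accuracy, or None if no k <= max_k does."""
    for k in range(1, max_k + 1):
        correct = sum(
            classify(list(posts[:k])) == label  # truncate history to k posts
            for posts, label in zip(accounts, labels)
        )
        if correct / len(accounts) >= threshold:
            return k
    return None


# Hypothetical stand-in classifier: labels an account as suspended (1) if any
# of its posts contains a word from a small illustrative spam list.
SPAM_WORDS = {"free", "click", "win"}


def keyword_classifier(posts: Sequence[str]) -> int:
    return int(any(w in SPAM_WORDS for p in posts for w in p.lower().split()))
```

In practice `classify` would be a trained model; passing it as a callable keeps the temporal truncation logic independent of the model being compared.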
“…The commonly used features include raw features, such as word vectors, word embeddings, hashtags, links, and URLs [119]. Advanced features include deep content features, statistics, LIWC, and other metadata, such as location, source, or time [193]. Most ML-based models use supervised learning.…”
Section: Pros and Cons (mentioning, confidence: 99%)
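The snippet lists hashtags, links/URLs, and word vectors among the raw features. A minimal sketch of extracting such raw features from a single post, assuming simple regular expressions for URLs and hashtags and a bag-of-words count as the "word vector"; the patterns and the output keys are illustrative choices, and word embeddings and LIWC features are out of scope here.

```python
import re
from collections import Counter

# Illustrative patterns (assumptions): URLs start with http:// or https://,
# hashtags are '#' followed by word characters.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
WORD_RE = re.compile(r"[a-z']+")


def raw_features(post: str) -> dict:
    """Extract simple raw features: URL count, hashtags, and a bag of words."""
    urls = URL_RE.findall(post)
    text = URL_RE.sub(" ", post)  # drop URLs so their parts are not counted as words
    hashtags = [h.lower() for h in HASHTAG_RE.findall(text)]
    words = WORD_RE.findall(HASHTAG_RE.sub(" ", text).lower())  # hashtags excluded from words
    return {
        "num_urls": len(urls),
        "num_hashtags": len(hashtags),
        "hashtags": hashtags,
        "bag_of_words": Counter(words),
    }
```

Per-post dictionaries like these can be summed over an account's history to obtain account-level raw features.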
“…• Precision [10,17,21,28,40,50,61,78,82,88,89,91,102,107,113,115,135,162,166,175,186,193,217,219,224]: This metric simply estimates the ratio of true positives to all detected positives, i.e., true positives plus false positives, by:…”
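The quoted definition is cut off before its formula. The standard definition of precision is TP / (TP + FP), where TP and FP are the numbers of true and false positives. A minimal sketch computing it from binary labels; returning 0.0 when nothing is predicted positive is an assumed convention, since the ratio is undefined in that case.

```python
from typing import Sequence


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Precision = TP / (TP + FP); returns 0.0 (assumed convention) if TP + FP == 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0
```

For example, with labels `[1, 0, 1, 1, 0]` and predictions `[1, 1, 1, 0, 0]`, three accounts are flagged, two of them correctly, so precision is 2/3.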
We are living in an era in which online communication over social network services (SNSs) has become an indispensable part of people's everyday lives. As a consequence, online social deception (OSD) in SNSs has emerged as a serious threat in cyberspace, particularly for users vulnerable to such cyberattacks. Cyber attackers have exploited the sophisticated features of SNSs to carry out harmful OSD activities, such as financial fraud, privacy threats, or sexual/labor exploitation. Therefore, it is critical to understand OSD and develop effective countermeasures against it to build trustworthy SNSs. In this paper, we conduct an extensive survey covering (i) the multidisciplinary concept of social deception; (ii) types of OSD attacks and their unique characteristics compared to other social network attacks and cybercrimes; (iii) comprehensive defense mechanisms, embracing prevention, detection, and response (or mitigation), against OSD attacks along with their pros and cons; (iv) datasets and metrics used for validation and verification; and (v) legal and ethical concerns related to OSD research. Based on this survey, we provide insights into the effectiveness of countermeasures and the lessons learned from the existing literature. We conclude our survey with in-depth discussions of the limitations of the state of the art and suggest future research directions in OSD research.
“…It addressed the sparsity problem by defining and employing a user context representation. The study of VanDam et al. [27] combined multiple modalities of the data at the user level to detect compromised accounts. Their method considered four modalities: source, timing, location, and textual content.…”
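The snippet describes combining several modalities (source, timing, location, and textual content) at the user level, with each view encoded separately before being mapped into a joint space used to rank accounts. A toy sketch of that pipeline shape, under loudly stated assumptions: profiles are hypothetical dicts of feature counts per view, per-view L2-normalized vectors are simply concatenated (a stand-in for a learned joint mapping), and accounts are ranked by distance from the population centroid (an assumed anomaly score). This is not VanDam et al.'s model or scoring.

```python
import math
from typing import Dict, List

View = Dict[str, float]      # sparse feature counts for one view
Profile = Dict[str, View]    # view name -> features

# Hypothetical view names mirroring term/source/time/place.
VIEWS = ("term", "source", "time", "place")


def encode_view(view: View, vocab: List[str]) -> List[float]:
    """Encode one view as a dense, L2-normalized vector over its vocabulary."""
    vec = [view.get(f, 0.0) for f in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def joint_vectors(profiles: List[Profile]) -> List[List[float]]:
    """Encode each view separately, then concatenate into one joint vector per profile."""
    vocabs = {v: sorted({f for p in profiles for f in p.get(v, {})}) for v in VIEWS}
    joint = []
    for p in profiles:
        vec: List[float] = []
        for v in VIEWS:
            vec.extend(encode_view(p.get(v, {}), vocabs[v]))
        joint.append(vec)
    return joint


def rank_by_anomaly(profiles: List[Profile]) -> List[int]:
    """Return profile indices ordered from most to least distant from the centroid."""
    vecs = joint_vectors(profiles)
    dim = len(vecs[0])
    centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return sorted(range(len(profiles)), key=lambda i: math.dist(vecs[i], centroid), reverse=True)
```

Normalizing each view before concatenation keeps a view with large raw counts (for example, many posts from one source) from dominating the joint vector; a learned encoder would replace both the normalization and the concatenation.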