As online social networking sites become more and more popular, they have also attracted the attention of spammers. In this paper, Twitter, a popular micro-blogging service, is studied as an example of spam bot detection in online social networking sites. A machine learning approach is proposed to distinguish spam bots from normal users. To facilitate detection, three graph-based features, including the number of friends and the number of followers, are extracted to explore the unique follower and friend relationships among users on Twitter. Three content-based features are also extracted from each user's 20 most recent tweets. A real data set is collected from Twitter's publicly available information using two different methods. Evaluation experiments show that the detection system is efficient and accurate in identifying spam bots on Twitter.
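The graph-based and content-based features above can be sketched as follows. This is a minimal illustration, not the paper's exact feature definitions: the reputation ratio, URL/mention/duplicate ratios, and the sample data are all assumptions chosen to show the general shape of such features.

```python
# Hypothetical feature extraction for Twitter spam-bot detection.
# Feature names and formulas are illustrative assumptions, not the paper's own.

def graph_features(followers: int, friends: int) -> dict:
    """Graph-based features derived from follower/friend counts."""
    total = followers + friends
    return {
        "followers": followers,
        "friends": friends,
        # Share of followers among all connections (0 for an isolated account).
        "reputation": followers / total if total else 0.0,
    }

def content_features(tweets: list) -> dict:
    """Content-based features over a user's most recent tweets (e.g. the last 20)."""
    n = len(tweets) or 1
    return {
        "url_ratio": sum("http" in t for t in tweets) / n,   # tweets containing links
        "mention_ratio": sum("@" in t for t in tweets) / n,  # tweets with @mentions
        "dup_ratio": 1 - len(set(tweets)) / n,               # repeated identical tweets
    }

# Toy example: a link-heavy, repetitive account with few followers.
tweets = ["Win a free iPhone http://spam.example"] * 3 + ["just had lunch"]
feats = {**graph_features(followers=10, friends=900), **content_features(tweets)}
```

A spam bot typically follows many accounts but is followed by few, so a low reputation ratio combined with high URL and duplicate ratios is the kind of signal such features aim to capture.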
The rapidly growing social network Twitter has been infiltrated by a large amount of spam. In this paper, a spam detection prototype system is proposed to identify suspicious users on Twitter. A directed social graph model is proposed to explore the "follower" and "friend" relationships among Twitter users. Based on Twitter's spam policy, novel content-based features and graph-based features are proposed to facilitate spam detection. A Web crawler is developed relying on the API methods provided by Twitter. Around 25K users, 500K tweets, and 49M follower/friend relationships in total are collected from publicly available data on Twitter. A Bayesian classification algorithm is applied to distinguish suspicious behaviors from normal ones. I analyze the data set and evaluate the performance of the detection system. Classic evaluation metrics are used to compare the performance of various traditional classification methods. Experimental results show that the Bayesian classifier has the best overall performance in terms of F-measure. The trained classifier is also applied to the entire data set; the result shows that the spam detection system achieves 89% precision.
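The classic evaluation metrics used to compare the classifiers can be computed as below. This is a minimal sketch in pure Python with made-up labels, not the paper's data; 1 marks spam and 0 marks a legitimate user.

```python
# Precision, recall, and F-measure from true vs. predicted binary labels.
# Labels here are toy values for illustration only.

def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # ground-truth labels (hypothetical)
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]   # classifier output (hypothetical)
p, r, f = precision_recall_f1(y_true, y_pred)
```

The F-measure balances precision against recall, which is why it is a common single-number summary when comparing classifiers such as the Bayesian one above.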
Abstract. Connectivity analysis of networked documents provides high-quality link structure information, which is usually lost in a content-based learning system. It is well known that combining links and content has the potential to improve text analysis. However, exploiting link structure is non-trivial because links are often noisy and sparse. Moreover, it is difficult to balance term-based content analysis and link-based structure analysis to reap the benefits of both. We introduce a novel networked document clustering technique that integrates content and link information in a unified optimization framework. Under this framework, a novel dimensionality reduction method called COntent & STructure COnstrained (Costco) Feature Projection is developed. To extract robust link information from sparse and noisy link graphs, two link analysis methods are introduced. Experiments on benchmark data and diverse real-world text corpora validate the effectiveness of the proposed methods.
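The balance between term-based and link-based analysis can be illustrated with a simpler scheme than Costco itself: blending a content similarity matrix and a link similarity matrix with a single weight. This sketch is an assumption-laden stand-in for the unified framework, not the Costco projection; the vectors, the cosine measure, and the weight alpha are all illustrative choices.

```python
# Blend content and link similarity: S[i][j] = alpha * content + (1 - alpha) * link.
# A toy stand-in for balancing the two analyses, not the Costco method itself.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def blended_similarity(content_vecs, link_vecs, alpha=0.5):
    n = len(content_vecs)
    return [
        [
            alpha * cosine(content_vecs[i], content_vecs[j])
            + (1 - alpha) * cosine(link_vecs[i], link_vecs[j])
            for j in range(n)
        ]
        for i in range(n)
    ]

# Toy data: documents 0 and 1 share terms; documents 1 and 2 share links.
content = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
links = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # adjacency rows as link vectors
S = blended_similarity(content, links, alpha=0.5)
```

Feeding such a blended matrix to any standard clustering algorithm lets sparse, noisy links refine the grouping without letting them dominate the term-based signal; the alpha trade-off is exactly the balance the abstract says is hard to get right.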