6 million spam tweets: A large ground truth for timely Twitter spam detection

Chen, Chao; Zhang, Jun; Chen, Xiao; Xiang, Yang; Zhou, Wanlei

doi:10.1109/icc.2015.7249453

Cited by 106 publications

(76 citation statements)

References 13 publications

(38 reference statements)


“…Gao et al [82] propose a tweet-based spam detection approach based on the social degree of the tweet's sender, the history of interaction, the size of the cluster, the average time interval, the average number of URL in tweets, and the unique number of URL in tweets. Chen et al [83] present a real-time spam detection method for Twitter based on 12 lightweight features which are extracted from a dataset contains 6.5 million spam tweets. The features they consider detecting spam on Twitter are age of the account, the number of followers, the number of following, the number of likes the account received, the number of the account's lists, the number of tweets of the account, the number of retweets of the tweet, the number of hashtags used in the tweet, the number of mentioned users in the tweet, the number of URLs used in the tweet, the number of characters used in the tweet, and the number of digits used in the tweet.…”

Section: Hybrid Spam Detection Methods (mentioning)

confidence: 99%

A Survey of Spam Detection Methods on Twitter

Kabakuş, Kara

2017

ijacsa
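The 12 lightweight features listed in the citation statement above can be illustrated with a minimal sketch. It assumes a simplified, flat tweet record; the dictionary keys (`account_created_at`, `followers`, `hashtags`, and so on) and the output feature names are hypothetical, chosen for this example, and are not the schema of the Twitter API or the cited dataset.

```python
from datetime import datetime, timezone


def extract_features(tweet: dict, now: datetime) -> dict:
    """Build the 12 features named in the survey from a flat tweet record.

    All input keys are hypothetical names used for illustration only.
    """
    text = tweet["text"]
    return {
        # Account-based features
        "account_age_days": (now - tweet["account_created_at"]).days,
        "no_followers": tweet["followers"],
        "no_following": tweet["following"],
        "no_likes": tweet["likes"],
        "no_lists": tweet["lists"],
        "no_tweets": tweet["tweets"],
        # Tweet-based features
        "no_retweets": tweet["retweets"],
        "no_hashtags": len(tweet["hashtags"]),
        "no_mentions": len(tweet["mentions"]),
        "no_urls": len(tweet["urls"]),
        "no_chars": len(text),
        "no_digits": sum(ch.isdigit() for ch in text),
    }


# A made-up record, not taken from any real dataset.
SAMPLE = {
    "text": "Win 2 free tickets #promo @bob http://example.com",
    "account_created_at": datetime(2015, 1, 1, tzinfo=timezone.utc),
    "followers": 12,
    "following": 340,
    "likes": 0,
    "lists": 0,
    "tweets": 57,
    "retweets": 0,
    "hashtags": ["promo"],
    "mentions": ["bob"],
    "urls": ["http://example.com"],
}

features = extract_features(SAMPLE, datetime(2015, 1, 11, tzinfo=timezone.utc))
print(features)
```

Every feature here is a simple count or a date difference, which is what makes such a feature set cheap enough to compute per tweet as it arrives.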



“…These can then be deployed directly into a machine learning algorithm. Table I shows examples of common features that have been used in previous studies [21] [11].…”

Section: Related Work (mentioning)

confidence: 99%

“…They do this by conducting spam campaigns that make their "fake" accounts connect with other fake accounts, increasing the follower and following numbers [22]. The majority of previous studies [23][24][21] begin by collecting data using the Twitter streaming API. Multiple features are then extracted and different feature sets utilized.…”

Section: Related Work (mentioning)

confidence: 99%

“…Similar to how security researchers study the attacks, spammers and hackers investigate detection systems; therefore, they can change user properties, content or the distribution mechanism to bypass certain restriction or detection rules [20]. For example, a study of detecting spam on Twitter [21] recommended that the number of followers is one of the highest discriminative power features. The feature's discriminative power has been increasingly weakened though by spammers making their accounts more popular.…”

Section: Related Work (mentioning)

confidence: 99%


Using supervised machine learning algorithms to detect suspicious URLs in online social networks

Al-Janabi

Quincey

András

2017

Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017


Abstract: The increasing volume of malicious content in social networks requires automated methods to detect and eliminate such content. This paper describes a supervised machine learning classification model that has been built to detect the distribution of malicious content in online social networks (OSNs). Multisource features have been used to detect social network posts that contain malicious Uniform Resource Locators (URLs). These URLs could direct users to websites that contain malicious content, drive-by download attacks, phishing, spam, and scams. For the data collection stage, the Twitter streaming application programming interface (API) was used, and VirusTotal was used for labelling the dataset. A random forest classification model was used with a combination of features derived from a range of sources. The random forest model without any tuning or feature selection produced a recall of 0.89. After further investigation and applying parameter tuning and feature selection methods, however, we were able to improve the classifier's recall to 0.92.
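The abstract above reports its results as recall, the fraction of truly malicious posts that the classifier flags: TP / (TP + FN). A minimal sketch of that computation follows; the function name and the label vectors are made up for illustration and are not data from the cited paper.

```python
def recall(y_true, y_pred, positive=1):
    """Return TP / (TP + FN) for the given positive class label."""
    pairs = list(zip(y_true, y_pred))
    # True positives: positive samples that were predicted positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False negatives: positive samples that the classifier missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive samples")
    return tp / (tp + fn)


# Illustrative labels only: 1 = malicious URL, 0 = benign.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
print(recall(y_true, y_pred))
```

Recall ignores false positives (benign posts flagged as malicious), so it is usually read alongside precision rather than on its own.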


“…Gawale and Patil implemented a system to detect malicious URLs on Twitter (Gawale and Patil, 2015). Chen et al (Chen et al, 2015) evaluated the ability of spam detection of various machine learning algorithms. They found that other classifiers tend to outperform Naive Bayes and SVM on spam detection.…”

Section: Related Work (mentioning)

confidence: 99%

Detecting Hacked Twitter Accounts based on Behavioural Change

Nauta

Habib

Keulen

2017

Proceedings of the 13th International Conference on Web Information Systems and Technologies


Abstract: Social media accounts are valuable for hackers for spreading phishing links, malware and spam. Furthermore, some people deliberately hack an acquaintance to damage his or her image. This paper describes a classification for detecting hacked Twitter accounts. The model is mainly based on features associated with behavioural change such as changes in language, source, URLs, retweets, frequency and time. We experiment with a Twitter data set containing tweets of more than 100 Dutch users including 37 who were hacked. The model detects 99% of the malicious tweets, which proves that behavioural changes can reveal a hack and that anomaly-based features perform better than regular features. Our approach can be used by social media systems such as Twitter to automatically detect a hack of an account only a short time after the fact, allowing the legitimate owner of the account to be warned or protected and preventing reputational damage and annoyance.

