2015 IEEE International Conference on Communications (ICC) 2015
DOI: 10.1109/icc.2015.7249453

6 million spam tweets: A large ground truth for timely Twitter spam detection

Abstract: In recent years, Twitter has changed the way people communicate and get news in their daily lives. Due to its popularity, Twitter has also become a main target for spamming activities. To stop spammers, Twitter uses Google SafeBrowsing to detect and block spam links. Although blacklists can block malicious URLs embedded in tweets, their lag time hinders their ability to protect users in real time. Thus, researchers have begun to apply different machine learning algorithms to dete…

Cited by 106 publications (76 citation statements)
References 13 publications (38 reference statements)
“…Gao et al [82] propose a tweet-based spam detection approach based on the social degree of the tweet's sender, the history of interaction, the size of the cluster, the average time interval, the average number of URL in tweets, and the unique number of URL in tweets. Chen et al [83] present a real-time spam detection method for Twitter based on 12 lightweight features which are extracted from a dataset contains 6.5 million spam tweets. The features they consider detecting spam on Twitter are age of the account, the number of followers, the number of following, the number of likes the account received, the number of the account's lists, the number of tweets of the account, the number of retweets of the tweet, the number of hashtags used in the tweet, the number of mentioned users in the tweet, the number of URLs used in the tweet, the number of characters used in the tweet, and the number of digits used in the tweet.…”
Section: Hybrid Spam Detection Methods (mentioning)
confidence: 99%
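
The 12 lightweight features listed in the statement above can be illustrated with a minimal sketch. This is not the authors' code: the input layout (a flat dictionary with keys such as `account`, `hashtags`, `mentions`, and `urls`) and all feature names are hypothetical, chosen only to mirror the list in the quoted text.

```python
from datetime import datetime, timezone


def extract_features(tweet, now=None):
    """Compute the 12 features named in the citation statement.

    The input dictionary layout is hypothetical (an assumption for this
    sketch), not a real Twitter API response format.
    """
    now = now or datetime.now(timezone.utc)
    account = tweet["account"]
    text = tweet["text"]
    return {
        # Account-based features
        "account_age_days": (now - account["created_at"]).days,
        "followers": account["followers"],
        "following": account["following"],
        "likes": account["likes"],
        "lists": account["lists"],
        "tweets": account["tweets"],
        # Tweet-based features
        "retweets": tweet["retweets"],
        "hashtags": len(tweet["hashtags"]),
        "mentions": len(tweet["mentions"]),
        "urls": len(tweet["urls"]),
        "chars": len(text),
        "digits": sum(ch.isdigit() for ch in text),
    }


# Made-up example record, for illustration only.
sample = {
    "text": "Win 100 free followers http://t.co/abc #free @bob",
    "retweets": 0,
    "hashtags": ["free"],
    "mentions": ["bob"],
    "urls": ["http://t.co/abc"],
    "account": {
        "created_at": datetime(2014, 1, 1, tzinfo=timezone.utc),
        "followers": 3,
        "following": 1500,
        "likes": 0,
        "lists": 0,
        "tweets": 42,
    },
}

features = extract_features(sample, now=datetime(2015, 1, 1, tzinfo=timezone.utc))
print(features)
```

Each feature is a simple count or a date difference, so it can be computed from a single tweet record without looking up any other data. A feature vector like this would then be passed to a classifier.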
“…These can then be deployed directly into a machine learning algorithm. Table I shows examples of common features that have been used in previous studies [21] [11].…”
Section: Related Work (mentioning)
confidence: 99%
“…They do this by conducting spam campaigns that make their "fake" accounts connect with other fake accounts, increasing the follower and following numbers [22]. The majority of previous studies [23][24][21] begin by collecting data using the Twitter streaming API. Multiple features are then extracted and different feature sets utilized.…”
Section: Related Work (mentioning)
confidence: 99%
“…Gawale and Patil implemented a system to detect malicious URLs on Twitter (Gawale and Patil, 2015). Chen et al (Chen et al, 2015) evaluated the ability of spam detection of various machine learning algorithms. They found that other classifiers tend to outperform Naive Bayes and SVM on spam detection.…”
Section: Related Work (mentioning)
confidence: 99%