Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management 2007
DOI: 10.1145/1321440.1321486

Spam filtering for short messages

Abstract: We consider the problem of content-based spam filtering for short text messages that arise in three contexts: mobile (SMS) communication, blog comments, and email summary information such as might be displayed by a low-bandwidth client. Short messages often consist of only a few words, and therefore present a challenge to traditional bag-of-words based spam filters. Using three corpora of short messages and message fields derived from real SMS, blog, and spam messages, we evaluate feature-based and compression-…

Cited by 102 publications (54 citation statements)
References 11 publications (12 reference statements)
“…al. in [9] conducted a broader analysis on filtering of short messages. They evaluated different content based filtering systems implementing algorithms like Naïve Bayes, Support Vector Machines, Dynamic Markov Compression and Logistic Regression using bag-of-words, orthogonal sparse bigram features and compression model based approach on short text message, blog-spams and email summary information.…”
Section: Related Work (mentioning)
confidence: 99%
“…lowercase) words, character bi- and tri-grams and word bi-grams suggested by Gómez Hidalgo et al (2006) has provided a base feature set for much of the work in feature engineering. Cormack et al (2007) found that a slight variation on this set including orthogonal word bigrams improved the performance of classification algorithms on SMS spam data. Sohn et al (2009) expanded the base feature set by including features based on stylometry suggested in author attribution studies.…”
Section: Feature Engineering in SMS Spam (mentioning)
confidence: 98%
“…There has been numerous numbers of studies on active learning for text classification using machine learning techniques [9]- [11], probabilistic models [12], [13]. The query by committee algorithm (Seung et al 1992, Freund et al, 1997) used priori distribution than hypothesis.…”
Section: Background Study and Related Work (mentioning)
confidence: 99%