Spam filtering for short messages

Cormack, Gordon V.; Hidalgo, José María Gómez; Puertas, Enrique

doi:10.1145/1321440.1321486

Cited by 102 publications (54 citation statements)

References 11 publications (12 reference statements)

“…al. in [9] conducted a broader analysis on filtering of short messages. They evaluated different content based filtering systems implementing algorithms like Naïve Bayes, Support Vector Machines, Dynamic Markov Compression and Logistic Regression using bag-of-words, orthogonal sparse bigram features and compression model based approach on short text message, blog-spams and email summary information.…”

Section: Related Work (mentioning)

confidence: 99%

Characterizing comment spam in the blogosphere through content analysis

Bhattarai

Rus

Dasgupta

2009

2009 IEEE Symposium on Computational Intelligence in Cyber Security

Abstract: Spam is no longer limited to email and web pages. The increasing penetration of spam in the form of comments in blogs and social networks has started to become a nuisance and a potential threat. In this work, we explore the challenges posed by this type of spam in the blogosphere, with substantial generalization to other social media. We investigate the characteristics of comment spam in blogs based on their content. The framework uses some previously explored methods to effectively extract the features of blog spam and also introduces a novel method of active learning from the raw data without requiring training instances. This makes the approach more flexible and realistic for such applications. We also incorporate the concept of co-training for supervised learning to get accurate results. The preliminary evaluation of the proposed framework shows promising results.
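
The Related Work excerpt above names Naïve Bayes over bag-of-words features as one of the filters evaluated on short messages. The following is a minimal multinomial Naïve Bayes sketch in Python, assuming a simple lowercase regex tokenizer, add-one (Laplace) smoothing, and hypothetical toy messages. It illustrates the general technique only; it is not the configuration used by Cormack et al.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase bag-of-words tokenizer (an illustrative assumption).
    return re.findall(r"[a-z0-9']+", text.lower())


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, for priors
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, text: str) -> dict[str, float]:
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # skip tokens never seen in training
                count = self.word_counts[c][tok]
                score += math.log((count + 1) / (self.totals[c] + v))
            scores[c] = score
        return scores

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training data.
train = [
    "WIN a FREE prize now, call 0800",
    "free entry to win cash",
    "are we still meeting for lunch",
    "call me when you get home",
]
labels = ["spam", "spam", "ham", "ham"]
model = MultinomialNB().fit(train, labels)
print(model.predict("win free cash now"))  # spam
```

Scores are summed in log space so that multiplying many small token probabilities does not underflow; skipping unseen tokens rather than smoothing them is one of several reasonable choices.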

“…lowercase) words, character bi- and tri-grams and word bi-grams suggested by Gómez Hidalgo et al (2006) has provided a base feature set for much of the work in feature engineering. Cormack et al (2007) found that a slight variation on this set including orthogonal word bigrams improved the performance of classification algorithms on SMS spam data. Sohn et al (2009) expanded the base feature set by including features based on stylometry suggested in author attribution studies.…”

Section: Feature Engineering in SMS Spam (mentioning)

confidence: 98%

SMS spam filtering: Methods and data

Delany

Buckley

Greene

2012

Expert Systems with Applications

168

Mobile or SMS spam is a real and growing problem, primarily due to the availability of very cheap bulk pre-pay SMS packages and the fact that SMS engenders higher response rates, as it is a trusted and personal service. SMS spam filtering is a relatively new task that inherits many issues and solutions from email spam filtering. However, it poses its own specific challenges. This paper motivates work on filtering SMS spam and reviews recent developments in SMS spam filtering. The paper also discusses the issues with data collection and availability for furthering research in this area, analyses a large corpus of SMS spam, and provides some initial benchmark results.
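
The base feature set described in the feature-engineering excerpt above (lowercase words, character bi- and tri-grams, word bigrams) and the orthogonal sparse bigram variant can be sketched as follows. The tokenizer, the feature prefixes, computing character n-grams over the whole lowercased message, and the five-word window for sparse bigrams are all assumptions made for illustration; the exact definitions used by Gómez Hidalgo et al. (2006) and Cormack et al. (2007) may differ.

```python
import re


def words(text: str) -> list[str]:
    # Lowercased word tokens; the regex is an illustrative assumption.
    return re.findall(r"[a-z0-9']+", text.lower())


def char_ngrams(text: str, n: int) -> list[str]:
    # Character n-grams over the whole lowercased message, spaces included.
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def word_bigrams(tokens: list[str]) -> list[str]:
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def orthogonal_sparse_bigrams(tokens: list[str], window: int = 5) -> list[str]:
    # Pair each word with each of the (window - 1) words before it and mark
    # the skipped positions with "_", so "a b" and "a _ b" are distinct.
    feats = []
    for i, right in enumerate(tokens):
        for dist in range(1, window):
            j = i - dist
            if j < 0:
                break
            feats.append(f"{tokens[j]} {'_ ' * (dist - 1)}{right}")
    return feats


def extract_features(text: str, use_osb: bool = False) -> set[str]:
    tokens = words(text)
    feats = {f"w:{t}" for t in tokens}
    feats |= {f"c2:{g}" for g in char_ngrams(text, 2)}
    feats |= {f"c3:{g}" for g in char_ngrams(text, 3)}
    # Swap plain word bigrams for sparse bigrams when use_osb is set.
    pairs = orthogonal_sparse_bigrams(tokens) if use_osb else word_bigrams(tokens)
    feats |= {f"b:{p}" for p in pairs}
    return feats


print(orthogonal_sparse_bigrams(["free", "cash", "now"]))
# ['free cash', 'cash now', 'free _ now']
```

With a distance of 1, a sparse bigram coincides with an ordinary word bigram, so the sparse set strictly extends the plain bigram features.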

“…There has been numerous numbers of studies on active learning for text classification using machine learning techniques [9]-[11], probabilistic models [12], [13]. The query by committee algorithm (Seung et al 1992, Freund et al, 1997) used priori distribution than hypothesis.…”

Section: Background Study and Related Work (mentioning)

confidence: 99%

SMS Classification Based on Naïve Bayes Classifier and Apriori Algorithm Frequent Itemset

Ahmed

Guan

Chung

2014

IJMLC

Abstract: In this paper, we propose a hybrid system of SMS classification to detect spam or ham, using a Naïve Bayes classifier and the Apriori algorithm. Though this technique is fully logic based, its performance will rely on the statistical character of the database. Naïve Bayes is considered one of the most effective and significant learning algorithms for machine learning and data mining, and has also been treated as a core technique in information retrieval. However, by applying user-specified minimum support and minimum confidence, we gain a significant improvement in accuracy, from 97.4% with the traditional Naïve Bayes approach to 98.7%, in experiments on the UCI Data Repository.

Index Terms: Short message service (SMS), Naïve Bayes classifier, Apriori algorithm, spam, ham, minimum support, minimum confidence.
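
The abstract above combines Naïve Bayes with Apriori frequent itemsets filtered by minimum support and minimum confidence. The following is a generic Apriori sketch over token sets, not the authors' hybrid system. Treating each message as a set of tokens plus a hypothetical `label=spam` or `label=ham` item is an assumption made so that mined rules read as class predictors, and the toy transactions are invented for illustration.

```python
from itertools import combinations


def apriori(transactions: list[frozenset], min_support: float) -> dict:
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def support(itemset: frozenset) -> float:
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep the size-k candidates that meet the support threshold.
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent itemsets into size-k candidates, then prune any with an
        # infrequent (k-1)-subset (the Apriori property).
        keys = list(level)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k}
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent: dict, min_confidence: float) -> list:
    """Rules lhs -> rhs with confidence = support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules


# Hypothetical token sets, each tagged with a label item.
transactions = [
    frozenset({"free", "win", "label=spam"}),
    frozenset({"free", "cash", "win", "label=spam"}),
    frozenset({"free", "call", "label=spam"}),
    frozenset({"call", "home", "label=ham"}),
    frozenset({"lunch", "home", "label=ham"}),
]
freq = apriori(transactions, min_support=0.4)
for lhs, rhs, sup, conf in association_rules(freq, min_confidence=0.9):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

Support is recomputed by scanning every transaction for each candidate, which keeps the sketch short but scales poorly; the walrus operator requires Python 3.8 or later.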
