An evaluation of Naive Bayes variants in content-based learning for spam filtering

Seewald, Alexander K.

doi:10.3233/ida-2007-11505

Cited by 34 publications

(18 citation statements)

References 5 publications


“…However, in order to have an adequate assessment of the performance of filters, it is necessary to adopt more realistic evaluation settings (e.g., the TREC corpora, Cormack, 2006, 2007; Cormack & Lynam, 2005), that better mimic the scenario faced by a filter deployed for practical operation. In particular, the argument raised by Cormack and Lynam (2007), and further reinforced by Seewald (2007), regarding the still unproven potential of more advanced machine learning algorithms to Spam filtering, can be associated to the evaluation scenarios considered. More than simply affecting the experimental results obtained when reporting the development of a new filter, this may inspire the development of customized filters, tailored to the characteristics of the problem (cf.…”

Section: Discussion (mentioning)

confidence: 97%

“…In addition, the integrated approach achieved similar results to the SVM. Seewald (2007) evaluated the performance of a simple naive Bayes implementation (SpamBayes), along with CRM114 and SpamAssassin, which also employ more sophisticated language models and hard-coded rules, respectively. For the initial experiments, seven private mailboxes were used.…”

Section: Comparative Studies (mentioning)

confidence: 99%


A review of machine learning approaches to Spam filtering

Guzella, Caminhas

2009

Expert Systems with Applications

439

197



“…They currently appear to be very popular in proprietary and open-source spam filters, including several free web-mail servers and open-source systems [25,35,45]. This is probably due to their simplicity, computational complexity and accuracy rate, which are comparable to more elaborate learning algorithms [35,38,46].…”

Section: Related Work (mentioning)

confidence: 98%

“…Further details about other techniques used for anti-spam filtering and applications that employ Bayesian classifiers are available in Bratko et al [9], Seewald [45], Koprinska et al [32], Cormack [14], Song et al [46], Marsono et al [35] and Guzella and Caminhas [25].…”

Section: Related Work (mentioning)

confidence: 99%


Spam filtering: how the dimensionality reduction affects the accuracy of Naive Bayes classifiers

Almeida

Yamakami

2010

J Internet Serv Appl


E-mail spam has become an increasingly important problem with a big economic impact in society. Fortunately, there are different approaches allowing to automatically detect and remove most of those messages, and the best-known techniques are based on Bayesian decision theory. However, such probabilistic approaches often suffer from a well-known difficulty: the high dimensionality of the feature space. Many term-selection methods have been proposed for avoiding the curse of dimensionality. Nevertheless, it is still unclear how the performance of Naive Bayes spam filters depends on the scheme applied for reducing the dimensionality of the feature space. In this paper, we study the performance of many term-selection techniques with several different models of Naive Bayes spam filters. Our experiments were diligently designed to ensure statistically sound results. Moreover, we perform an analysis concerning the measurements usually employed to evaluate the quality of spam filters. Finally, we also investigate the benefits of using the Matthews correlation coefficient as a measure of performance.
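The abstract above names the Matthews correlation coefficient (MCC) without defining it. As a minimal sketch, not taken from the paper, the standard definition can be computed from the four confusion-matrix counts. The function name, the convention of returning 0.0 when the denominator is zero, and the example counts are all assumptions made for illustration.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp: spam correctly flagged, tn: ham correctly passed,
    fp: ham wrongly flagged,    fn: spam wrongly passed.
    Returns 0.0 when the denominator is zero (an assumed convention
    for the case where a whole row or column of the matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced test set (10 spam, 90 ham)
    # scored by a filter that labels every message as ham.
    tp, tn, fp, fn = 0, 90, 0, 10
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    print(f"accuracy = {accuracy:.2f}")
    print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.2f}")
```

In this made-up case the accuracy looks high because ham dominates the test set, while the MCC stays at zero because the filter never catches any spam. That sensitivity to class imbalance is one commonly cited reason for considering MCC alongside accuracy.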
