2005
DOI: 10.1007/s10462-005-9006-6

An Assessment of Case-Based Reasoning for Spam Filtering

Abstract: Because of the changing nature of spam, a spam filtering system that uses machine learning will need to be dynamic. This suggests that a case-based (memory-based) approach may work well. Case-Based Reasoning (CBR) is a lazy approach to machine learning where induction is delayed to run time. This means that the case base can be updated continuously and new training data is immediately available to the induction process. In this paper we present a detailed description of such a system called ECUE and …
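To make the lazy-learning idea concrete, here is a minimal Python sketch of a case base in which "training" only stores a case and all induction happens at classification time. It assumes emails are already reduced to sets of binary features and uses plain majority-vote k-nearest-neighbour retrieval with a shared-feature count as similarity. The class name `CaseBase`, the similarity measure and the default `k` are illustrative assumptions, not ECUE's actual design.

```python
from collections import Counter


class CaseBase:
    """Hypothetical lazy (memory-based) learner: adding a case just stores it."""

    def __init__(self, k=3):
        self.k = k
        self.cases = []  # list of (feature_set, label) pairs

    def add(self, features, label):
        # No model is rebuilt, so the new case is usable by the very next query.
        self.cases.append((frozenset(features), label))

    def classify(self, features):
        # Induction happens here, at run time.
        if not self.cases:
            raise ValueError("case base is empty")
        query = frozenset(features)
        # Rank stored cases by the number of features they share with the query.
        ranked = sorted(self.cases, key=lambda c: len(query & c[0]), reverse=True)
        # Majority vote among the k most similar cases.
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


cb = CaseBase(k=1)
cb.add({"viagra", "free"}, "spam")
cb.add({"meeting", "agenda"}, "ham")
print(cb.classify({"free", "lunch"}))  # spam: best match is the spam case
cb.add({"free", "lunch", "meeting"}, "ham")  # new feedback, no retraining
print(cb.classify({"free", "lunch"}))  # ham: the new case is used at once
```

The second call changes its answer purely because a case was appended, which is the property the abstract relies on for tracking spam that changes over time.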

Cited by 57 publications (45 citation statements)
References 9 publications (12 reference statements)
“…Weighted nearest neighbor weights each of the k nearest examples by its similarity to x and compares the sum of the weights to a threshold. Several authors consider the use of clustering and kNN methods for spam filtering, but none report strong performance [6,47,143,144,197].…”
Section: Nearest Neighbor Methods (mentioning)
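One reading of the weighted nearest-neighbour rule quoted above, sketched in Python: each of the k nearest cases contributes its similarity to the query x, the weights of the spam-labelled neighbours are summed, and the email is flagged when that sum exceeds a threshold. Jaccard similarity on binary feature sets, the function name `weighted_knn_is_spam`, and the default `k` and `threshold` are hypothetical choices for illustration; the quoted statement does not fix them.

```python
def jaccard(a, b):
    """Similarity of two binary feature sets (an assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def weighted_knn_is_spam(x, cases, k=3, threshold=0.5):
    """Flag x as spam if the summed similarity of spam neighbours exceeds threshold."""
    x = frozenset(x)
    # Score every stored case by its similarity to x, most similar first.
    scored = sorted(
        ((jaccard(x, frozenset(f)), label) for f, label in cases),
        key=lambda pair: pair[0],
        reverse=True,
    )
    neighbours = scored[:k]
    # Each spam neighbour votes with a weight equal to its similarity.
    spam_weight = sum(sim for sim, label in neighbours if label == "spam")
    return spam_weight > threshold


cases = [
    ({"free", "money", "now"}, "spam"),
    ({"free", "offer"}, "spam"),
    ({"project", "meeting"}, "ham"),
]
print(weighted_knn_is_spam({"free", "money"}, cases))     # True (weight 2/3 + 1/3)
print(weighted_knn_is_spam({"project", "update"}, cases))  # False (weight 0)
```

Raising `threshold` makes the filter more conservative, which trades missed spam for fewer false positives.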
“…We do not use numeric-valued features (e.g. occurrence frequencies) because we found that they resulted in only minor improvements in overall accuracy, no significant decrease in false positives, and much increased classification and case base editing times [7].…”
Section: The Feature-based Distance Measure (mentioning)
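The difference between the binary representation described here and the numeric occurrence-frequency alternative can be shown in a few lines. This is a hypothetical sketch assuming a fixed vocabulary and pre-tokenized email text; the function names are illustrative only.

```python
from collections import Counter


def binary_features(tokens, vocabulary):
    # 1 if the term occurs anywhere in the email, 0 otherwise; counts are discarded.
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


def frequency_features(tokens, vocabulary):
    # Numeric alternative: how many times each term occurs.
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


tokens = ["free", "free", "money", "now"]
vocabulary = ["free", "money", "meeting"]
print(binary_features(tokens, vocabulary))     # [1, 1, 0]
print(frequency_features(tokens, vocabulary))  # [2, 1, 0]
```

Binary vectors can also be stored as sets of present features, so comparing two cases reduces to cheap set operations rather than arithmetic over counts.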
“…We found it better, especially from the point of view of false positives, not to use feature weighting on the binary representation [7]. We compute features from some of the header fields and the body of the emails, with no stop-word removal or stemming.…”
Section: The Feature-based Distance Measure (mentioning)
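A sketch of the setup this statement describes: binary features drawn from some header fields and the body, with no stop-word removal or stemming, compared using a distance in which every feature counts equally (no feature weighting). The choice of `Subject` and `From` as the header fields, the tokenizer regex, lowercasing, the field-name prefixes and the set-difference distance are all assumptions; the statement does not say which fields or tokenizer ECUE uses.

```python
import email
import re

TOKEN_RE = re.compile(r"[a-z0-9']+")  # assumed tokenizer


def extract_features(raw_message, header_fields=("Subject", "From")):
    """Binary feature set from selected headers and the body.

    No stop-word removal and no stemming: every token is kept (lowercased).
    """
    msg = email.message_from_string(raw_message)
    features = set()
    for field in header_fields:
        value = msg.get(field, "")
        # Prefix with the field name so header and body tokens stay distinct.
        features.update(f"{field.lower()}:{tok}" for tok in TOKEN_RE.findall(value.lower()))
    body = msg.get_payload()
    if isinstance(body, str):  # multipart payloads are lists; skipped in this sketch
        features.update(f"body:{tok}" for tok in TOKEN_RE.findall(body.lower()))
    return features


def unweighted_distance(a, b):
    """Hamming distance on binary vectors: each differing feature counts 1."""
    return len(a ^ b)


raw = "From: alice@example.com\nSubject: Free money now\n\nClaiming your free money today\n"
features = extract_features(raw)
print("body:your" in features)      # True: stop word kept
print("body:claiming" in features)  # True: not stemmed to "claim"
print(unweighted_distance({"a", "b"}, {"b", "c"}))  # 2
```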