Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007) 2007
DOI: 10.1109/fskd.2007.207

Boosting the Performance of Web Spam Detection with Ensemble Under-Sampling Classification

Cited by 41 publications (21 citation statements)
References 8 publications
“…Considering the web spam domain from a machine learning perspective, we analysed the behaviour of two well-known state-of-the-art algorithms for web spam classification (i.e., SVM and C5.0) comparing their performance first as individual classifiers and then as hybridized classifiers (using regular expressions) in our WSF2 framework. These classifiers were selected because of their effectiveness and relative efficiency as evidenced in previous research works [17,23,38,39]. Although regular expressions used as an individual technique achieve poor results in spam filtering, their proper combination with other machine learning approaches improves the accuracy of definitive antispam classification.…”
Section: Case Study (mentioning)
confidence: 99%
“…In fact, Geng and colleagues [23] introduced the first proposal using both content- and link-based features to detect web spam pages. In the same line, the work of Becchetti and colleagues [24] combined link- and content-based features using C4.5 to detect web spam.…”
Section: Related Work On Web Spam Filtering (mentioning)
confidence: 99%
“…can be used as classification algorithms on imbalance datasets. Geng et al proposed a novel ensemble classifier based on under-sampling technique and C4.5 decision tree classifier for web spam detection and achieved good results [12].…”
Section: B Solving Class Imbalance Problem On Web Spam Dataset (mentioning)
confidence: 99%
“…examples in N-N', is neglected, and the neglect leads to the main deficiency of under-sampling algorithm. An ensemble strategy can be used to overcome the deficiency and keep the efficiency of under-sampling [12]-[14]. Different from [12]-[14], we set the number of N' approximately equal to the number of S, and divide all of the N samples into several N' samples subset randomly.…”
Section: A Ensemble Based On Under-sampling (mentioning)
confidence: 99%
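
The partitioning step described in the statement above can be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: the function names `balanced_subsets` and `majority_vote`, the `seed` parameter, and the use of simple majority voting to combine the per-subset classifiers are all assumptions made for the example. It takes N as the majority (non-spam) class and S as the minority (spam) class, which is how the passage reads but is not stated outright in the excerpt.

```python
import random


def balanced_subsets(majority, minority, seed=0):
    """Split the majority class N into disjoint random subsets N'
    with |N'| approximately |S|, and pair each with all of S.

    Hypothetical helper: name and signature are illustrative only.
    """
    if not minority:
        raise ValueError("minority class must not be empty")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    # Choose the subset count so each subset has roughly len(minority) items.
    k = max(1, round(len(shuffled) / len(minority)))
    # Striding deals the shuffled items round-robin, so every majority
    # sample lands in exactly one subset and sizes differ by at most one.
    chunks = [shuffled[i::k] for i in range(k)]
    return [(chunk, list(minority)) for chunk in chunks]


def majority_vote(predictions):
    """Combine 0/1 predictions from the per-subset classifiers.

    Assumed combination rule; ties go to the positive (spam) class.
    """
    return int(2 * sum(predictions) >= len(predictions))
```

Each returned pair would serve as one balanced training set for a base classifier, so no majority sample is discarded across the ensemble as a whole, which is the deficiency of plain under-sampling that the statement points to.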