2012
DOI: 10.5120/7924-0993

Web Spam Detection by Learning from Small Labeled Samples

Abstract: Web spamming tries to deceive search engines into ranking some pages higher than they deserve. Many methods have been proposed to combat web spamming and to detect spam pages. One basic method is classification, i.e., learning a classification model from previously labeled training data and using this model to classify web pages as spam or nonspam. A drawback of this method is that manually labeling a large number of web pages to generate the training data can be biased, inaccurate, labor-intensive and …

Cited by 15 publications (2 citation statements)
References 14 publications

“…To automatically determine the class labels (trust/distrust) of unlabelled response tweets, we adopted an expectation-maximization (EM) based semisupervised classifier (Nigam et al, 2006; Karimpour et al, 2012). EM is an iterative algorithm to maximize a posteriori estimation in datasets with both labeled and unlabeled data (Nigam et al, 2000).…”
Section: Methods (mentioning)
confidence: 99%
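
The statement above describes EM-based semi-supervised classification only in outline. The following is a minimal sketch of that general idea, assuming a multinomial naive Bayes base classifier with Laplace smoothing; it is not the implementation used in the citing paper or in Karimpour et al. The function names (`train_nb`, `posterior`, `em_naive_bayes`) and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def train_nb(docs, weights, vocab, n_classes=2, alpha=1.0):
    """Fit naive Bayes from soft (fractional) class weights per document."""
    prior = [alpha] * n_classes
    counts = [Counter() for _ in range(n_classes)]
    for tokens, w in zip(docs, weights):
        for c in range(n_classes):
            prior[c] += w[c]
            for t in tokens:
                counts[c][t] += w[c]
    total = sum(prior)
    log_prior = [math.log(p / total) for p in prior]
    log_like = []
    for c in range(n_classes):
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_like.append({t: math.log((counts[c][t] + alpha) / denom) for t in vocab})
    return log_prior, log_like

def posterior(tokens, model, n_classes=2):
    """Return P(class | tokens); tokens outside the vocabulary are ignored."""
    log_prior, log_like = model
    scores = [
        log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
        for c in range(n_classes)
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def em_naive_bayes(labeled, unlabeled, n_iter=10):
    """labeled: list of (tokens, label in {0, 1}); unlabeled: list of tokens."""
    vocab = sorted({t for d, _ in labeled for t in d} | {t for d in unlabeled for t in d})
    l_docs = [d for d, _ in labeled]
    l_w = [[1.0 if y == c else 0.0 for c in range(2)] for _, y in labeled]
    model = train_nb(l_docs, l_w, vocab)  # initial model from labeled data only
    for _ in range(n_iter):
        u_w = [posterior(d, model) for d in unlabeled]          # E-step: soft labels
        model = train_nb(l_docs + unlabeled, l_w + u_w, vocab)  # M-step: refit
    return model

# Toy data, invented for illustration (1 = spam, 0 = nonspam).
labeled = [(["cheap", "pills", "buy"], 1), (["research", "paper", "university"], 0)]
unlabeled = [
    ["buy", "cheap", "casino"],
    ["university", "lecture", "paper"],
    ["casino", "pills"],
    ["lecture", "research"],
]
model = em_naive_bayes(labeled, unlabeled)
spam_prob = posterior(["casino"], model)[1]
```

Here "casino" never appears in a labeled document; the unlabeled pages that pair it with known spam words are what pull it toward the spam class, which is the benefit of using unlabeled data when labeled samples are scarce.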
“…Out of 627 product reviews, 10 reviews are found to be abusive, and hence, removed, and 48 are found to be spam PILAKA ANUSHA [9] Hadoop 1 represents the description of the classifiers, datasets and results used from the research work. Based on research work most of them used the SVM and naive bayes classifier, for different types of datasets.…”
Section: Hotel Dataset (mentioning)
confidence: 99%