Proceedings of the 22nd International Conference on World Wide Web 2013
DOI: 10.1145/2487788.2488140

Automatically generated spam detection based on sentence-level topic information

Abstract: Spammers use a wide range of content generation techniques with low quality pages known as content spam to achieve their goals. We argue that content spam must be tackled using a wide range of content quality features. In this paper, we propose novel sentence-level diversity features based on the probabilistic topic model. We combine them with other content features to build a content spam classifier. Our experiments show that our method outperforms the conventional methods.

Citations: cited by 10 publications (4 citation statements)
References: 15 publications (13 reference statements)
“…survey research on the method of topic link detection based on improved information bottleneck theory [10], in this paper, a method of representing text is proposed, which can divide text into several sections of sub-topic features based on the regular pattern of semantic distribution and improve information bottleneck theory, then, the text represented by the attributes is utilized to do topic link detection, the experimental results have shown that this method has a fast convergent rate, and can improve the performance of topic link detection system. Suhara, Yoshihiko and others survey research on the method of information detection based on sentence-level topic [11], in this paper, the text sentence-level diversity features based on the probabilistic topic model is proposed, an information content classifier is also constructed combining features proposed, the experimental results show that this method outperforms the conventional methods. Pang, JB and others survey research on the method of unsupervised web topic detection using a ranked clustering-like pattern across similarity cascades [12], in this paper, a method using a clustering-like pattern across similarity cascades is investigated from the perspective of similarity diffusion, a topic-restricted similarity diffusion process is also proposed to identify real topic from a large number of candidates efficiently, the experimental results demonstrate that this approach outperforms the state-of-the-art methods on several public data sets, those works are related to author's research direction of network topic detection and application.…”
Section: Related Work (mentioning)
confidence: 93%
“…The rational for using topic modeling is that spams have more unusual topic distributions than non-spam messages. For example, Suhara et al (2013) developed a sentence level LDA to assign topics to sentences for web spam detection. Biro et al, (2009) defined a threshold based on the outputs of LDA to distinguish between spam and non-spam.…”
Section: Related Work (mentioning)
confidence: 99%
“…In [53] the authors extracted features based on sentence-level topic information. They first created LDA [11] with a ham corpus and apply it to the unseen documents to infer the topic distribution of the sentences.…”
Section: Natural Language Processing Approach (mentioning)
confidence: 99%
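
The citation statements above describe inferring a topic distribution for each sentence and deriving sentence-level diversity features from those distributions. The sketch below illustrates one way such features could be computed. It is not the paper's actual feature set: the function names, the choice of entropy and Jensen-Shannon divergence, and the example distributions are all assumptions made for illustration, and the per-sentence topic distributions are assumed to have been inferred already by a topic model such as LDA.

```python
import math
from itertools import combinations


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (in nats) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing; v_i > 0 whenever u_i > 0.
        return sum(a * math.log(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def diversity_features(sentence_topics):
    """Hypothetical diversity features for one document.

    sentence_topics: list of per-sentence topic distributions, each a list
    of probabilities over the same K topics (assumed precomputed).
    """
    n = len(sentence_topics)
    if n == 0:
        raise ValueError("document has no sentences")

    # How focused each individual sentence is, on average.
    mean_sentence_entropy = sum(entropy(p) for p in sentence_topics) / n

    # How much the topic mix shifts from one sentence to another.
    pairs = list(combinations(sentence_topics, 2))
    mean_pairwise_js = (
        sum(js_divergence(p, q) for p, q in pairs) / len(pairs) if pairs else 0.0
    )

    # Topic spread of the document as a whole (average of sentence mixes).
    doc_mix = [sum(col) / n for col in zip(*sentence_topics)]

    return {
        "mean_sentence_entropy": mean_sentence_entropy,
        "mean_pairwise_js": mean_pairwise_js,
        "document_entropy": entropy(doc_mix),
    }


# Made-up example inputs: one document whose sentences share a topic mix,
# and one whose sentences each jump to a different topic.
coherent = [[0.9, 0.05, 0.05], [0.9, 0.05, 0.05], [0.9, 0.05, 0.05]]
scattered = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

coherent_features = diversity_features(coherent)
scattered_features = diversity_features(scattered)
```

Under these assumptions, the resulting feature dictionary could be concatenated with other content features and passed to any standard classifier. A document stitched together from unrelated sentences would tend to show a higher pairwise divergence than a coherent one.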