2016
DOI: 10.1371/journal.pone.0164383

Analysis of Web Spam for Non-English Content: Toward More Effective Language-Based Classifiers

Abstract: Web spammers aim to obtain higher ranks for their web pages by including spam contents that deceive search engines in order to include their pages in search results even when they are not related to the search terms. Search engines continue to develop new web spam detection mechanisms, but spammers also aim to improve their tools to evade detection. In this study, we first explore the effect of the page language on spam detection features and we demonstrate how the best set of detection features varies accordi…

Citations: cited by 8 publications (5 citation statements)
References: 31 publications
“…However, as the number of new spam types increases every year, this field should be up to date to eliminate the negative impacts of these new types of spam. By considering the features of a web page language, spam detection tools would be highly effective at capturing spam content [25].…”
Section: Discussion (mentioning)
confidence: 99%
“…Web spamming can be very dangerous because it spreads malware, which can affect users' privacy by obtaining sensitive information. Web spam detection using ANLP tools are used to determine information and features included in the content of web pages to ensure that only web pages that present useful content are retrieved [18], [25]- [28]. This allows threats posed by suspicious web pages to be mitigated [3].…”
Section: Web Spam Detection (mentioning)
confidence: 99%
“…When URLs are exploited for purposes other than accessing legitimate resources on the Internet, they pose a threat to data integrity, confidentiality, and availability. The different kinds of malicious URLs are discussed below [9].…”
Section: B. URL Attack Techniques (mentioning)
confidence: 99%
“…Prior work addressed web security vulnerabilities by either researching the ways to improve dynamic testing techniques [41], proposing or evaluating static analysis testing mechanisms [42][43][44][45], or using these testing approaches to evaluate the security of web-based systems [46][47][48][49]. The majority of research studies that evaluated the performance of web vulnerability scanners suggested that they have limited crawling capabilities and high false positive rates.…”
Section: Related Work (mentioning)
confidence: 99%