2020
DOI: 10.1155/2020/6662166

Detecting Web Spam Based on Novel Features from Web Page Source Code

Abstract: Search engines are critical in people's daily lives because they determine the quality of the information people obtain through searching. Fierce competition for search-engine rankings benefits neither users nor search engines. Existing research mainly studies the content and links of websites; however, none of these techniques focuses on semantic analysis of links and anchor text for detection. In this paper, we propose a web spam detection method by extracting novel feature sets from the homepage source c…
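The abstract is cut off before it lists the proposed features, so what follows is only a minimal sketch, assuming Python's standard `html.parser`, of how link and anchor-text statistics might be pulled from a page's source code. The class `LinkAnchorParser`, the function `link_features`, and all four feature names are hypothetical illustrations rather than the paper's feature set, and simple counts stand in for the semantic analysis the authors describe.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkAnchorParser(HTMLParser):
    """Collects (href, anchor_text) pairs for every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []   # list of (href, normalized anchor text)
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value); value may be None
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


def link_features(html, site_host):
    """Hypothetical link/anchor features; NOT the paper's actual feature set."""
    parser = LinkAnchorParser()
    parser.feed(html)
    parser.close()
    links = parser.links
    n = len(links)
    # A link is external if it names a host other than the site's own host.
    external = sum(
        1 for href, _ in links if urlparse(href).netloc not in ("", site_host)
    )
    anchor_words = [len(text.split()) for _, text in links]
    empty = sum(1 for w in anchor_words if w == 0)
    return {
        "num_links": n,
        "external_ratio": external / n if n else 0.0,
        "empty_anchor_ratio": empty / n if n else 0.0,
        "mean_anchor_words": sum(anchor_words) / n if n else 0.0,
    }


page = ('<a href="/about">About us</a>'
        '<a href="http://cheap-pills.example/buy">buy cheap pills now</a>'
        '<a href="http://spam.example/"></a>')
print(link_features(page, "mysite.example"))
```

A real detector would feed vectors like these into a trained classifier; the point here is only that link and anchor signals can be read straight from the source code without rendering the page.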

Cited by 9 publications (6 citation statements).
References 24 publications.

“…A program can modify the code segment and data segment of its process before and during execution. Some malicious code exploits exactly this: it encrypts key code in files and then decrypts that code at run time [39]. Such an executable can encrypt its key code with many different keys, generating many different malicious code files that are nonetheless, in essence, the same malicious code.…”
Section: Methods (citation type: mentioning, confidence: 99%)
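The quoted passage describes encrypting the same key code under many keys. The toy sketch below, assuming a repeating-key XOR as a stand-in for a real cipher and a harmless byte string as the "key code", shows why this defeats hash-based signatures: every variant has a different SHA-256 digest, yet each decrypts to the identical payload. `xor_bytes`, the payload, and the keys are all hypothetical.

```python
import hashlib


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy repeating-key XOR (hypothetical cipher); encrypt and decrypt are the same op."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Harmless stand-in for the protected "key code".
payload = b"print('same underlying logic')"
keys = [b"k1", b"another-key", b"\x13\x37"]

# What would be stored on disk: one encrypted variant per key.
variants = [xor_bytes(payload, k) for k in keys]
digests = {hashlib.sha256(v).hexdigest() for v in variants}

# What would run after decryption at run time.
recovered = {xor_bytes(v, k) for v, k in zip(variants, keys)}

print(len(digests))             # → 3 (three distinct file signatures)
print(recovered == {payload})   # → True (one underlying program)
```

A detector that compares only file hashes sees three unrelated files here; one that inspects the decrypted code or its run-time behavior sees a single program.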
“…As a result, the weight of more important clauses in the text will increase. There are many calculation methods for the weights of keywords, such as Boolean weights, weights based on the concept of the heir, weights of TF-IDF type [8, 9], etc. The idea of the keyword extraction algorithm based on statistical features is to use the statistical information of the words in the document to extract the keywords of the document.…”
Section: WPC Methods Based on DL (citation type: mentioning, confidence: 99%)
“…There are many calculation methods for the weights of keywords, such as Boolean weights, weights based on the concept of the heir, weights of TF-IDF type [8, 9], etc.…”
Section: WPC Methods Based on DL (citation type: mentioning, confidence: 99%)
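Both statements mention TF-IDF-type weights for statistical keyword extraction. A minimal sketch of one common variant follows, assuming whitespace tokenization, term frequency normalized by document length, and an unsmoothed inverse document frequency log(N / df). The function names and sample documents are hypothetical, and the cited works [8, 9] may use a different weighting.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = (count/len) * log(N/df)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokens
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


def top_keywords(docs, k=2):
    """Pick the k highest-weighted terms of each document (ties broken alphabetically)."""
    return [
        [term for term, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for w in tfidf(docs)
    ]


docs = ["cheap pills cheap pills buy now", "buy books online now", "online news today"]
print(top_keywords(docs))
```

Terms that occur in every document get weight 0, so words shared by the whole collection never become keywords; the alphabetical tie-break only keeps the output deterministic.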
“…Kumi et al. [53] proposed a malicious URL detection method that uses a classification based on associations (CBA) algorithm. They collected their dataset by crawling Alexa's top 500 sites [22], OpenPhish [36], VxVault [54], and URLhaus [55] and used 11 lexical and content-based features. Their model achieved an accuracy of 95.83%.…”
Section: Lexical and Content-Based Features Studies (citation type: mentioning, confidence: 99%)
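The statement does not list Kumi et al.'s 11 features, so the sketch below only illustrates the general idea of lexical URL features, using Python's standard `urllib.parse` and `ipaddress` modules. The function name and every feature in it are hypothetical choices, and the CBA classifier itself, which mines class association rules from such features, is not implemented here.

```python
import ipaddress
from urllib.parse import urlparse


def lexical_url_features(url):
    """A few common lexical URL features (hypothetical; not Kumi et al.'s 11 features)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raw IP instead of a domain name
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(c.isdigit() for c in url),
        # Text before '@' in the authority is userinfo, a classic way to disguise the real host.
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


for u in ["http://192.168.0.1/login-update/verify.php?id=42", "https://www.example.com/"]:
    print(u, lexical_url_features(u))
```

Because these features need only the URL string, they can be computed before any page is fetched; content-based features, which the statement also mentions, would require downloading the page.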