Detecting Web Spam Based on Novel Features from Web Page Source Code

Liu, Jiayong; Su, Yu; Lv, Shun; Huang, Cheng

doi:10.1155/2020/6662166

Cited by 9 publications

(6 citation statements)

References 24 publications

“…A program can modify the code segment and data segment of the process before and during running. Some malicious codes just use this point to encrypt key codes in files, and then decrypt the codes after running [39]. This executable file can encrypt key code with many keys, then many different malicious code files are generated, but in essence these malicious code files are all the same malicious code.…”

Section: Methods (mentioning)

confidence: 99%

A study on detection and defence of malicious code under network security over biomedical devices

Liu, Neware, Bhatt, et al. 2022

The Journal of Engineering

With the proliferation of massive varieties and unknown malicious viruses, it is difficult to achieve effective and timely defence by anti‐virus technology with virus signature matching as the core. Hence, the research of malicious code detection and defence is investigated. Firstly, it introduces the defence technology and detection technology. The simulation results show that the default infection probability of the simulation program is 15.1%. The default detection probability is 60.1%, the default number of nodes is 1000, the default time step is 1000 ms, and the default initial infected node is only node 0, that is, only the malicious code on node 0 is activated. After start‐up, the whole P2P starts to run, and the number of infected nodes increases. Finally, all nodes are infected. The infection speed becomes faster and faster. When the number of infected nodes reaches about half, the infection speed begins to slow down.
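The spread pattern described in this abstract (slow start, accelerating growth, slowdown once about half the nodes are infected) is what a simple susceptible-infected model produces. The sketch below is only an illustration under stated assumptions, not the authors' simulator: the random-peer contact rule and the function name `simulate_infection` are hypothetical, and the 60.1% detection probability is left out because the abstract does not say how detection affects the spread. The defaults (1000 nodes, infection probability 15.1%, 1000 steps, node 0 infected first) are taken from the abstract.

```python
import random


def simulate_infection(n_nodes=1000, p_infect=0.151, steps=1000, seed=0):
    """Return the number of infected nodes after each time step.

    Assumption: at every step each infected node contacts one peer chosen
    uniformly at random and infects it with probability p_infect.
    """
    rng = random.Random(seed)
    infected = [False] * n_nodes
    infected[0] = True  # only node 0 starts with active malicious code
    history = [1]
    for _ in range(steps):
        newly_infected = []
        for node in range(n_nodes):
            if not infected[node]:
                continue
            peer = rng.randrange(n_nodes)
            if not infected[peer] and rng.random() < p_infect:
                newly_infected.append(peer)
        # Apply infections only after the step so new nodes act next step.
        for peer in newly_infected:
            infected[peer] = True
        history.append(sum(infected))
        if history[-1] == n_nodes:
            break  # every node is infected, nothing left to simulate
    return history


history = simulate_infection()
# Per-step growth: small at first, largest mid-run, small again near the end.
growth = [b - a for a, b in zip(history, history[1:])]
print(len(history) - 1, "steps until", history[-1], "nodes are infected")
```

Under this contact rule the per-step growth is roughly proportional to the product of the infected and uninfected fractions, which is why it tends to peak when about half of the nodes are infected.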

“…As a result, the weight of more important clauses in the text will increase. There are many calculation methods for the weights of keywords, such as Boolean weights, weights based on the concept of the heir, weights of TFIDF type [8, 9], etc. The idea of the keyword extraction algorithm based on statistical features is to use the statistical information of the words in the document to extract the keywords of the document.…”

Section: WPC Methods Based on DL (mentioning)

confidence: 99%

Web Page Classification Algorithm Based on Deep Learning

2022

Computational Intelligence and Neuroscience

Transmit and process information to establish a learning mechanism and realize the processing of image data and sound data. However, the current research on Web page classification algorithm (WPCA) based on deep learning (DL) is not in-depth. Therefore, the main research of this article is the research of WPCA based on DL. This article first uses the keyword weight calculation method to reduce the impact of a small number of high-frequency words in the web page document on the weight calculation and reduces the value of the low-frequency word weights so that the WPCA is more accurate in the calculation process; second, the use of Chinese web pages: the classification method calculates the similarity between the text to be classified and all the class templates and then determines the category of all texts according to the similarity and certain classification rules; finally, in order to improve the learning rate of DL, consider using adaptive parameters. The optimization algorithm automatically adjusts the size of the learning rate, making the research of WPCA based on DL more efficient. After comparing the DL-based WPCA with the traditional algorithm, the data shows that in terms of time expenditure, the DL WPCA is 354 s, the traditional algorithm is 2436 s; in terms of memory overhead, the DL WPCA is 6.35 s, the traditional algorithm is 186.25 s. The experimental results show that WPCA based on DL are faster and more efficient than traditional algorithms and consume less system memory.
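The TFIDF-type keyword weighting mentioned in the citation statement for this paper can be illustrated with the textbook formula. The sketch below is a minimal example, assuming term frequency normalized by document length and an unsmoothed logarithmic inverse document frequency; it is not the modified weighting scheme this paper proposes, and the function name `tf_idf` and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: weight = (count / document length) * log(N / df), where N is
    the number of documents and df the number of documents containing the term.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized on whitespace.
docs = [
    "web spam detection from page source".split(),
    "web page classification with deep learning".split(),
    "malicious url detection with machine learning".split(),
]
weights = tf_idf(docs)
# Highest-weighted terms of the first document serve as its keywords.
keywords = sorted(weights[0], key=weights[0].get, reverse=True)[:3]
print(keywords)
```

A term that occurs in every document gets weight zero under this formula, while a term confined to one document is weighted most heavily, which is the statistical intuition behind using such weights for keyword extraction.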

“…Kumi et al. [53] proposed a malicious URL detection method that uses a classification based on association (CBA) algorithm. They collected their dataset by crawling Alexa's top 500 sites [22], OpenPhish [36], VxVault [54], and URLhaus [55] and used 11 lexical and content-based features. Their model achieved an accuracy of 95.83%.…”

Section: Lexical and Content-based Features Studies (mentioning)

confidence: 99%

Detecting Malicious URLs Using Machine Learning Techniques: Review and Research Directions

et al. 2022

In recent years, the digital world has advanced significantly, particularly on the Internet, which is critical given that many of our activities are now conducted online. As a result of attackers' inventive techniques, the risk of a cyberattack is rising rapidly. One of the most critical attacks is the malicious URL intended to extract unsolicited information by mainly tricking inexperienced end users, resulting in compromising the user's system and causing losses of billions of dollars each year. As a result, securing websites is becoming more critical. In this paper, we provide an extensive literature review highlighting the main techniques used to detect malicious URLs that are based on machine learning models, taking into consideration the limitations in the literature, detection technologies, feature types, and the datasets used. Moreover, due to the lack of studies related to malicious Arabic website detection, we highlight the directions of studies in this context. Finally, as a result of the analysis that we conducted on the selected studies, we present challenges that might degrade the quality of malicious URL detectors, along with possible solutions.
