HTMLPhish: Enabling Phishing Web Page Detection by Applying Deep Learning Techniques on HTML Analysis

Opara, C Chinenye; Wei, Bo; Chen, Yingke

doi:10.1109/ijcnn48605.2020.9207707

Cited by 47 publications (23 citation statements)

References 19 publications (23 reference statements)


“…Although both models have the HTML feature extraction process, the presented model is not using any URL feature extraction with the use of expert knowledge, which is another benefit getting over the benchmarked model. The latest approach introduced to the phishing area is the HTMLPhish (Opara et al, 2019). It achieved the detection accuracy of 97.2%, and that accuracy is also low compared to proposed model accuracy.…”

Section: Results (mentioning)

confidence: 97%

“…As a solution for this manual feature extraction, deep learning techniques were tried out to implement automated feature extraction processes in the past. HTMLPhish (Opara et al, 2019) was such an attempt that used Recurrent Neural Network (RNN) to automated feature extraction process from HTML pages. It used only HTML pages in the detection process and achieved 97.2% detection accuracy.…”

Section: Software-based Detection (mentioning)

confidence: 99%


Detecting phishing attacks using a combined model of LSTM and CNN

Ariyadasa. 2020. Int. J. Adv. Appl. Sci.


Phishing, a social engineering crime which has been existing for more than two decades, has gained significant research attention to find better solutions to face against the very dynamic strategies of phishing. The financial sector is the primary target of phishing, and there are many different approaches to combat phishing attacks. Software-based detection approaches are more prominent in phishing detection; however, still, there is no robust solution that can stable for a long period. The primary purpose of this paper is to propose a novel solution to detect phishing attacks using a combined model of LSTM and CNN deep networks with the use of both URLs and HTML pages. The URLs are learned using an LSTM network with 1D convolutional, and another 1D convolutional network is used to learn the HTML features. These two networks were trained separately and combined through a sigmoid layer by dropping the last layer of each model to have the proposed model. The proposed model reached 98.34% in terms of accuracy, and that is above the previously recorded highest accuracy of 97.3% among the detection models used both URL and HTML features in the explored literature. The solution requires feature extraction only with HTML pages, and URLs were directly fed with a minimum pre-processing. Although the proposed solution uses extracted HTML features, those do not depend on third-party services. Therefore, an efficient real-time application can be implemented using the proposed model to detect phishing attacks to safeguard Internet users.


“…Opara et al [10] proposed the use of characters embedding and string embedding techniques to represent features of each HTML, then this representation is used as input to a Convolutional Neural Network (CNN) in order to model semantic dependencies. They collect their own data from Alexa and Phishtank, reporting two sets of data, the first one with 23000 legitimate websites and 2300 phishing websites used for training, and the second one with 24000 legitimate websites and 2400 phishing websites used for testing, these datasets are not available.…”

Section: Automatic Features (mentioning)

confidence: 99%

“…The URL and HTML strings are tokenized using a character corpus that includes punctuation marks, then, this tokenized data is processed into a character embedding matrix. They use the datasets presented in their previous work, [10], reporting an accuracy of 98.00% and an F1 score of 98.00%.…”

Section: Automatic Features (mentioning)

confidence: 99%

State of the Art: Content-based and Hybrid Phishing Detection

Castaño, Fidalgo, Alegre, et al. 2021. Preprint.


Phishing attacks have evolved and increased over time and, for this reason, the task of distinguishing between a legitimate site and a phishing site is more and more difficult, fooling even the most expert users. The main proposals focused on addressing this problem can be divided into four approaches: List-based, URL based, content-based, and hybrid. In this state of the art, the most recent techniques using web content-based and hybrid approaches for Phishing Detection are reviewed and compared.


“…Many studies focus on the detection of desktop malicious webpages [7-10]. These existing solutions can effectively detect the malicious web pages on the desktop devices.…”

Section: Introduction (mentioning)

confidence: 99%

MMWD: An efficient mobile malicious webpage detection framework based on deep learning and edge cloud

Liu, Zhu, et al. 2021. Concurrency and Computation.


In recent years, with the rapid development of mobile social networks and services, the research of mobile malicious webpage detection has become a hot topic. Most of the existing malicious webpage detection systems are deployed on desktop systems and servers. Due to the limitation of network transmission delay and computing resources, these existing solutions fail to provide the real‐time and lightweight properties for mobile webpage detection. In this paper, we propose an advanced mobile malicious webpage detection framework based on deep learning and edge cloud. Inspired by the idea of edge computing, a multidevice load optimization approach is first introduced to improve detection efficiency. Second, an automatic extraction approach based on deep learning model features is presented to enhance detection accuracy. Furthermore, detection systems can be flexibly deployed on edge nodes and servers, thus providing the properties of resource optimization deployment and real‐time detection. Finally, comparative analysis and performance evaluation are presented to show the detection efficiency and accuracy of the proposed framework.
