Abstract: Developing and launching phishing attacks now requires little technical skill and little cost. This has led to an ever-growing number of phishing attacks on the World Wide Web. Consequently, proactive techniques to fight phishing attacks have become extremely necessary. In this paper, we propose HTMLPhish, a deep learning based data-driven end-to-end automatic phishing web page classification approach. Specifically, HTMLPhish receives the content of the HTML document of a web page and empl…
“…Although both models have the HTML feature extraction process, the presented model is not using any URL feature extraction with the use of expert knowledge, which is another benefit getting over the benchmarked model. The latest approach introduced to the phishing area is the HTMLPhish (Opara et al, 2019). It achieved the detection accuracy of 97.2%, and that accuracy is also low compared to proposed model accuracy.…”
Section: Results (mentioning)
confidence: 97%
“…As a solution for this manual feature extraction, deep learning techniques were tried out to implement automated feature extraction processes in the past. HTMLPhish (Opara et al, 2019) was such an attempt that used Recurrent Neural Network (RNN) to automated feature extraction process from HTML pages. It used only HTML pages in the detection process and achieved 97.2% detection accuracy.…”
Phishing, a social engineering crime that has existed for more than two decades, has attracted significant research attention aimed at countering its highly dynamic strategies. The financial sector is the primary target of phishing, and there are many different approaches to combat phishing attacks. Software-based detection approaches are the most prominent; however, there is still no robust solution that remains stable over a long period. The primary purpose of this paper is to propose a novel solution to detect phishing attacks using a combined model of LSTM and CNN deep networks that uses both URLs and HTML pages. The URLs are learned using an LSTM network with 1D convolution, and another 1D convolutional network is used to learn the HTML features. These two networks were trained separately and then combined through a sigmoid layer, after dropping the last layer of each model, to form the proposed model. The proposed model reached 98.34% accuracy, which is above the previously recorded highest accuracy of 97.3% among detection models using both URL and HTML features in the explored literature. The solution requires feature extraction only for HTML pages; URLs are fed in directly with minimal pre-processing. Although the proposed solution uses extracted HTML features, those do not depend on third-party services. Therefore, an efficient real-time application can be implemented using the proposed model to detect phishing attacks and safeguard Internet users.
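The fusion step described in this abstract (two separately trained branches whose last layers are dropped, then joined through a sigmoid layer) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the feature values, and the weights are all hypothetical, and a real system would learn the weights and take the feature vectors from the trained LSTM/CNN branches.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(url_features, html_features, weights, bias=0.0):
    """Concatenate the two penultimate-layer feature vectors and
    pass their weighted sum through a single sigmoid unit."""
    combined = list(url_features) + list(html_features)
    if len(combined) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    score = sum(w * f for w, f in zip(weights, combined)) + bias
    return sigmoid(score)


# Hypothetical activations taken from each branch after its last layer is dropped.
url_feats = [0.2, 0.9]
html_feats = [0.7, 0.1, 0.4]
# Illustrative weights only; in practice these would be learned.
weights = [1.5, -0.5, 2.0, 0.3, -1.0]

p_phish = fuse(url_feats, html_feats, weights)
```

Here `p_phish` is read as the estimated probability that the page is phishing; thresholding it (for example at 0.5) would give the final label.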
“…Opara et al [10] proposed the use of characters embedding and string embedding techniques to represent features of each HTML, then this representation is used as input to a Convolutional Neural Network (CNN) in order to model semantic dependencies. They collect their own data from Alexa and Phishtank, reporting two sets of data, the first one with 23000 legitimate websites and 2300 phishing websites used for training, and the second one with 24000 legitimate websites and 2400 phishing websites used for testing, these datasets are not available.…”
Section: Automatic Features (mentioning)
confidence: 99%
“…The URL and HTML strings are tokenized using a character corpus that includes punctuation marks, then, this tokenized data is processed into a character embedding matrix. They use the datasets presented in their previous work, [10], reporting an accuracy of 98.00% and an F1 score of 98.00%.…”
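The character-level pipeline mentioned in this statement (tokenize against a character corpus that includes punctuation, then look up a character embedding matrix) can be illustrated with a short standard-library sketch. Everything here is an assumption made for illustration: the corpus contents, the padding and unknown-character ids, the embedding size, and the random embedding values are not taken from the cited work, where the embeddings would be learned.

```python
import random
import string

# Assumed character corpus: letters, digits and punctuation marks.
CORPUS = string.ascii_letters + string.digits + string.punctuation
PAD_ID, UNK_ID = 0, 1  # reserved ids for padding and unseen characters
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(CORPUS)}
VOCAB_SIZE = len(CORPUS) + 2


def encode(text, max_len):
    """Map each character to its id, truncating or padding to max_len."""
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in text[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def make_embedding_table(dim, seed=0):
    """Build a random VOCAB_SIZE x dim table (a stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(VOCAB_SIZE)]


def embed(ids, table):
    """Look up one embedding row per character id."""
    return [table[i] for i in ids]


table = make_embedding_table(dim=4)
ids = encode("<a href=x>", max_len=12)
matrix = embed(ids, table)  # 12 rows of 4 values each
```

The resulting `matrix` has one row per character position and is the kind of input a convolutional or recurrent layer would consume.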
Phishing attacks have evolved and increased over time, and for this reason the task of distinguishing between a legitimate site and a phishing site is increasingly difficult, fooling even the most expert users. The main proposals for addressing this problem can be divided into four approaches: list-based, URL-based, content-based, and hybrid. In this state of the art, the most recent techniques using web content-based and hybrid approaches for phishing detection are reviewed and compared.
“…Many studies focus on the detection of desktop malicious webpages [7‐10]. These existing solutions can effectively detect the malicious web pages on the desktop devices.…”
In recent years, with the rapid development of mobile social networks and services, the research of mobile malicious webpage detection has become a hot topic. Most of the existing malicious webpage detection systems are deployed on desktop systems and servers. Due to the limitation of network transmission delay and computing resources, these existing solutions fail to provide the real‐time and lightweight properties for mobile webpage detection. In this paper, we propose an advanced mobile malicious webpage detection framework based on deep learning and edge cloud. Inspired by the idea of edge computing, a multidevice load optimization approach is first introduced to improve detection efficiency. Second, an automatic extraction approach based on deep learning model features is presented to enhance detection accuracy. Furthermore, detection systems can be flexibly deployed on edge nodes and servers, thus providing the properties of resource optimization deployment and real‐time detection. Finally, comparative analysis and performance evaluation are presented to show the detection efficiency and accuracy of the proposed framework.