Research on phishing webpage detection technology based on CNN-BiLSTM algorithm

Zhang, Qiao; Bu, Youjun; Chen, Bo; Zhang, Surong; Lǚ, Xiangyu

doi:10.1088/1742-6596/1738/1/012131

Cited by 10 publications (7 citation statements)

References 1 publication

“…In hybrid models combining two different feature sets, a CNN-based model can be used instead of the RNN-based model used for character embedding features. However, a CNN-based model has high memory requirements and could not expose long-distance dependent features [74].…”

Section: Methods; mentioning

confidence: 99%

A hybrid DNN–LSTM model for detecting phishing URLs

Özcan, Çatal, Dönmez, et al. (2021). Neural Comput & Applic

Phishing is an attack that aims to imitate the official websites of corporations such as banks, e-commerce sites, financial institutions, and governmental institutions. Phishing websites aim to obtain users’ important information such as personal identification, social security numbers, passwords, e-mail addresses, credit card details, and other account information. Several anti-phishing techniques have been developed so far to cope with the increasing number of phishing attacks. Machine learning, and particularly deep learning, algorithms are nowadays the most crucial techniques for detecting and preventing phishing attacks because of their strong learning ability on massive datasets and their state-of-the-art results in many classification problems. Previously, two types of feature extraction techniques [i.e., character embedding-based and manual natural language processing (NLP) feature extraction] were used in isolation. However, researchers did not consolidate these features, and therefore the performance was not remarkable. Unlike previous works, our study presents an approach that utilizes both feature extraction techniques. We discuss how to combine these feature extraction techniques to make full use of the available data. This paper proposes hybrid deep learning models based on long short-term memory and deep neural network algorithms for detecting phishing uniform resource locators and evaluates the performance of the models on phishing datasets. The proposed hybrid deep learning models utilize both character embedding and NLP features, thereby simultaneously exploiting deep connections between characters and revealing NLP-based high-level connections. Experimental results showed that the proposed models achieve superior performance to the other phishing detection models in terms of accuracy.
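The hybrid design above feeds two representations of the same URL into two branches: a character sequence for the LSTM branch and a vector of hand-crafted NLP features for the DNN branch. The sketch below shows only how those two inputs might be prepared from a raw URL. It is a minimal illustration under stated assumptions: the printable-ASCII vocabulary, `max_len`, and the four word-level features are hypothetical choices, not the exact feature set of the cited paper.

```python
import re
import string

# Hypothetical character vocabulary: index 0 is padding, printable ASCII
# characters get 1..100, and any other character maps to one "unknown" index.
CHAR_VOCAB = {ch: i + 1 for i, ch in enumerate(string.printable)}
UNKNOWN_CHAR = len(CHAR_VOCAB) + 1


def char_indices(url: str, max_len: int = 200) -> list[int]:
    """Character-embedding input: one integer per character, truncated/padded to max_len."""
    ids = [CHAR_VOCAB.get(ch, UNKNOWN_CHAR) for ch in url[:max_len]]
    return ids + [0] * (max_len - len(ids))


def nlp_features(url: str) -> list[float]:
    """Hypothetical hand-crafted word-level features (illustrative, not the paper's set)."""
    words = [w for w in re.split(r"[^A-Za-z0-9]+", url) if w]
    lengths = [len(w) for w in words] or [0]
    digits = sum(ch.isdigit() for ch in url)
    return [
        float(len(words)),                  # number of word tokens
        sum(lengths) / len(lengths),        # average word length
        float(max(lengths)),                # longest word length
        digits / len(url) if url else 0.0,  # share of digit characters
    ]


def hybrid_inputs(url: str, max_len: int = 200) -> tuple[list[int], list[float]]:
    """Characters for the recurrent branch, feature vector for the dense branch."""
    return char_indices(url, max_len), nlp_features(url)


if __name__ == "__main__":
    chars, feats = hybrid_inputs("http://a.bc/login", max_len=30)
    print(len(chars), feats)  # 30 [4.0, 3.0, 5.0, 0.0]
```

In a full model the integer sequence would pass through an embedding layer before the LSTM, while the feature vector would go straight into dense layers; the two branch outputs are then merged for the final classification.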

Section: Methods; mentioning

confidence: 99%

“…Using character embedding with CNN-based models had the following limitations: (1) CNN-based models had high memory needs (2) CNN-based models could not find long-distance dependent features. Novel hybrid architecture that uses RNN-based models instead of CNN-based models can cope with this challenge [ 74 ].…”

Section: Introduction; mentioning

confidence: 99%
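The long-distance limitation mentioned in these statements can be made concrete with the standard receptive-field calculation for stacked 1-D convolution and pooling layers: each output position only "sees" a fixed window of input characters. The sketch below illustrates that general argument; it is not code from either paper, and the layer configurations are hypothetical. It assumes plain (non-dilated) windows described by `(kernel_size, stride)` pairs.

```python
def receptive_field(layers: list[tuple[int, int]]) -> int:
    """Number of input positions that can influence one output position.

    layers: (kernel_size, stride) for each convolution or pooling layer,
    listed from input to output. Assumes no dilation.
    """
    field, jump = 1, 1  # jump = input distance between neighbouring outputs
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field


def can_relate(distance: int, layers: list[tuple[int, int]]) -> bool:
    """True if two characters `distance` positions apart can share one output window."""
    return receptive_field(layers) > distance


# Two stride-1 convolutions with kernel size 3 cover only 5 consecutive characters,
# so characters 40 positions apart never meet in the same output.
two_convs = [(3, 1), (3, 1)]
print(receptive_field(two_convs))  # 5
print(can_relate(40, two_convs))   # False
```

A recurrent layer such as an LSTM or BiLSTM has no fixed window of this kind: its hidden state is updated at every step, so in principle it can carry information across the whole URL, which is the motivation given for preferring RNN-based models here.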

A hybrid DNN–LSTM model for detecting phishing URLs

Özcan, Çatal, Dönmez, et al. (2021). Neural Comput & Applic

“…Similarly, the authors of [24,28,29,32] described the optimization process, but only on certain parameters, for example, the number of convolutional layers, number of kernels, and kernel size. Additionally, in terms of performance metrics, it was observed that accuracy, precision, recall, and F1-score were the most common measures [7,24,28,30–32,34,35,37,38]. Other evaluation metrics were training time, detection time, GPU memory requirement, etc.…”

Section: Convolutional Neural Network (CNN); mentioning

confidence: 99%
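The four most common measures named in this statement are all derived from the binary confusion matrix (true/false positives and negatives). A minimal sketch, assuming label `1` denotes phishing (the positive class) and `0` denotes legitimate; the function name is hypothetical:

```python
def classification_metrics(
    y_true: list[int], y_pred: list[int], positive: int = 1
) -> dict[str, float]:
    """Accuracy, precision, recall, and F1-score for one positive class (assumed: 1 = phishing)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # phishing caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # legitimate flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # phishing missed
    tn = len(pairs) - tp - fp - fn                                # legitimate passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # TP = 2, FP = 1, FN = 1, TN = 2, so every metric is 2/3 (about 0.667).
    print(classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]))
```

Reporting precision and recall alongside accuracy matters for phishing detection because the two error types have different costs: a false negative lets a phishing page through, while a false positive blocks a legitimate site.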

Phishing Webpage Classification via Deep Learning-Based Algorithms: An Empirical Study

Selamat, Krejcar, et al. (2021). Applied Sciences

Phishing detection with high-performance accuracy and low computational complexity has always been a topic of great interest. New technologies have been developed to improve the phishing detection rate and reduce computational constraints in recent years. However, one solution is insufficient to address all problems caused by attackers in cyberspace. Therefore, the primary objective of this paper is to analyze the performance of various deep learning algorithms in detecting phishing activities. This analysis will help organizations or individuals select and adopt the proper solution according to their technological needs and specific applications’ requirements to fight against phishing attacks. In this regard, an empirical study was conducted using four different deep learning algorithms, including deep neural network (DNN), convolutional neural network (CNN), Long Short-Term Memory (LSTM), and gated recurrent unit (GRU). To analyze the behaviors of these deep learning architectures, extensive experiments were carried out to examine the impact of parameter tuning on the performance accuracy of the deep learning models. In addition, various performance metrics were measured to evaluate the effectiveness and feasibility of DL models in detecting phishing activities. The results obtained from the experiments showed that no single DL algorithm achieved the best measures across all performance metrics. The empirical findings from this paper also manifest several issues and suggest future research directions related to deep learning in the phishing detection domain.

“…The third way that has proven to be the most effective is to use machine learning and deep learning techniques that learns about the characteristic features of previous malicious links and can make accurate distinctions in the future based on previous predictions made [2]. Current mainstream machine learning methods of phishing website detection extract statistical features from the URL or extract relevant features of the webpage, such as the layout, Domain information or HTML & JavaScript and then classify these features but machine learning algorithms do not analyze the sequence or the positions of words in a URL and also 63% of phishing websites have a lifespan of only 2 hours after which they change either expire or change their domain name [3]. In order to use the machine learning techniques that focuses on the statistical features of URL and also to exploit the orientation and sequence learning capability of deep learning, we propose a CNN-LSTM model along with Random Forest, they belong to the field of deep learning whereas Random Forest classifier belongs to the field of machine learning.…”

Section: Introduction; mentioning

confidence: 99%
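The "statistical features from the URL" that such machine learning methods classify are typically simple counts and flags read off the address bar. The sketch below computes a handful of them with Python's standard `urllib.parse` and `ipaddress` modules. The seven features and the function name are a hypothetical selection for illustration, not the feature set of the cited work; each returned dictionary would become one row of tabular input for a classifier such as a Random Forest.

```python
import ipaddress
from urllib.parse import urlsplit


def address_bar_features(url: str) -> dict[str, int]:
    """Hypothetical address-bar features computed from a single URL string."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # lower-cased, without user info or port
    try:
        ipaddress.ip_address(host)  # raises ValueError for ordinary domain names
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0
    return {
        "url_length": len(url),
        "host_is_ip": host_is_ip,                  # raw IP instead of a domain name
        "has_at_symbol": int("@" in url),          # '@' can hide the real host
        "hyphens_in_host": host.count("-"),        # e.g. "my-bank-login.example"
        "dots_in_host": host.count("."),           # rough proxy for subdomain depth
        "uses_https": int(parts.scheme == "https"),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
    }


if __name__ == "__main__":
    print(address_bar_features("http://user@192.168.0.1/secure/login"))
    print(address_bar_features("https://my-bank.example.com/"))
```

Features like these ignore the order of characters and words in the URL, which is exactly the gap the quoted work fills by adding a sequence-learning CNN-LSTM alongside the Random Forest.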

“…Their method first performed word segmentation processing on URL based on sensitive word segmentation, then converted it into a feature vector matrix that automatically extracts its local features through CNN and acquired its bidirectional long-distance dependent features through BiLSTM. Their model classified the phishing and legitimate URLs with accuracy of 98.84% [3].…”

Section: Introduction; mentioning

confidence: 99%
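The first two steps described in this statement, sensitive-word segmentation of the URL and conversion into a numeric matrix for the CNN and BiLSTM, can be sketched as follows. This is a minimal illustration under assumptions: the sensitive-word list, the `<pad>`/`<unk>` tokens, `max_tokens`, and all function names are hypothetical, and the sketch stops at the integer index matrix. An embedding layer would then map each index to a vector to form the feature vector matrix.

```python
import re

# Hypothetical sensitive-word list; the cited work's actual list is not given here.
SENSITIVE_WORDS = ["account", "bank", "confirm", "login", "paypal",
                   "secure", "signin", "update", "verify"]
# Longest words first, so a longer sensitive word wins when alternatives overlap.
_SENSITIVE_RE = re.compile(
    "(" + "|".join(sorted(map(re.escape, SENSITIVE_WORDS), key=len, reverse=True)) + ")"
)


def segment_url(url: str) -> list[str]:
    """Split on non-alphanumerics, then split sensitive words out of each fragment."""
    tokens: list[str] = []
    for piece in re.split(r"[^a-z0-9]+", url.lower()):
        # The capturing group makes re.split keep matched sensitive words as items.
        tokens.extend(part for part in _SENSITIVE_RE.split(piece) if part)
    return tokens


def build_vocab(token_lists: list[list[str]]) -> dict[str, int]:
    """Assign each new token the next free index; 0 = padding, 1 = unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_index_matrix(token_lists: list[list[str]], vocab: dict[str, int],
                    max_tokens: int) -> list[list[int]]:
    """One fixed-length row of token indices per URL (truncated or zero-padded)."""
    rows = []
    for tokens in token_lists:
        ids = [vocab.get(t, vocab["<unk>"]) for t in tokens[:max_tokens]]
        rows.append(ids + [vocab["<pad>"]] * (max_tokens - len(ids)))
    return rows


if __name__ == "__main__":
    tokens = segment_url("http://secure-paypallogin.example.com/verifyaccount")
    print(tokens)  # ['http', 'secure', 'paypal', 'login', 'example', 'com', 'verify', 'account']
    vocab = build_vocab([tokens])
    print(to_index_matrix([tokens], vocab, max_tokens=10))  # [[2, 3, 4, 5, 6, 7, 8, 9, 0, 0]]
```

Splitting "paypallogin" into "paypal" and "login" is the point of segmenting on sensitive words: concatenated brand and action words are common in phishing URLs and would otherwise collapse into one rare token.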

Phishing URL Detection Using CNN-LSTM and Random Forest Classifier

Nepal, Gurung, Nepal (2022). Preprint

This paper presents the classification of phishing URLs versus legitimate URLs using machine learning and deep learning techniques. Phishing is defined as the act of stealing private information by pretending to be a legitimate entity. A machine learning model, a Random Forest classifier, is trained on features extracted from the Address Bar, Domain, and HTML and JavaScript of the URL. Separately, a CNN-LSTM hybrid model was trained to learn the character sequence features of the given URL and perform the classification. The dataset used was public data downloaded from Kaggle. It contained 11,430 URLs: 5,715 legitimate URLs and 5,715 phishing URLs. The previously trained model was then used to classify the URL in the current address bar as legitimate or phishing. Thus, the proposed paper focuses on the study and development of models for detecting phishing sites, so that properties of various URLs can be learned through feature extraction and classified as accurately as possible.
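In a CNN-LSTM hybrid that learns character sequence features, the convolutional part slides small filters over the sequence of character embeddings, and the pooled feature sequence is then passed to the LSTM. The dependency-free sketch below shows only that convolution and pooling step on toy numbers. The embedding size, filter weights, pooling size, and function names are hypothetical; a real model would learn the weights with a deep learning framework rather than fix them by hand.

```python
def conv1d(seq: list[list[float]], kernels: list[list[list[float]]],
           bias: list[float]) -> list[list[float]]:
    """Valid (unpadded), stride-1 1-D convolution followed by ReLU.

    seq: L positions, each a D-dimensional character embedding.
    kernels: F filters, each K x D.
    Returns L - K + 1 positions, each with F feature values.
    """
    k = len(kernels[0])
    out = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        row = []
        for kernel, b in zip(kernels, bias):
            total = b + sum(w * x for w_row, x_row in zip(kernel, window)
                            for w, x in zip(w_row, x_row))
            row.append(max(0.0, total))  # ReLU
        out.append(row)
    return out


def max_pool1d(seq: list[list[float]], size: int) -> list[list[float]]:
    """Non-overlapping max pooling along the sequence axis, per feature channel."""
    return [[max(col) for col in zip(*seq[i:i + size])]
            for i in range(0, len(seq) - size + 1, size)]


if __name__ == "__main__":
    embedded = [[1.0], [2.0], [3.0], [4.0]]       # four characters, 1-D toy embeddings
    kernels = [[[1.0], [1.0]], [[1.0], [-1.0]]]   # two filters of width 2
    features = conv1d(embedded, kernels, [0.0, 0.0])
    print(features)                 # [[3.0, 0.0], [5.0, 0.0], [7.0, 0.0]]
    print(max_pool1d(features, 2))  # [[5.0, 0.0]]
```

Each row of the pooled output summarizes a short local window of characters; feeding these rows in order to an LSTM lets the recurrent layer model how such local patterns follow one another along the URL.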
