2021
DOI: 10.1088/1742-6596/1738/1/012131

Research on phishing webpage detection technology based on CNN-BiLSTM algorithm

Abstract: The rapid development of the Internet has also brought opportunities for some illegal elements. Network attackers steal sensitive information from victims through phishing webpages to obtain economic benefits. Currently, the commonly used detection methods for phishing webpages, based on blacklist detection and webpage content feature detection, have the problems of being unable to detect newly emerging phishing webpages or requiring manual extraction of webpage features. Therefore, researchers have used Convo…

citations
Cited by 10 publications
(7 citation statements)
references
References 1 publication
“…In hybrid models combining two different feature sets, a CNN-based model can be used instead of the RNN-based model used for character embedding features. However, a CNN-based model has high memory requirements and could not expose long-distance dependent features [74].…”
Section: Methods (mentioning)
confidence: 99%
“…Using character embedding with CNN-based models had the following limitations: (1) CNN-based models had high memory needs (2) CNN-based models could not find long-distance dependent features. Novel hybrid architecture that uses RNN-based models instead of CNN-based models can cope with this challenge [74].…”
Section: Introduction (mentioning)
confidence: 99%
“…Similarly, the authors of [24,28,29,32] described the optimization process, but only on certain parameters, for example, the number of convolutional layers, number of kernels, and kernel size. Additionally, in terms of performance metrics, it was observed that accuracy, precision, recall, and F1-score were the most common measures [7,24,28,30–32,34,35,37,38]. Other evaluation metrics were training time, detection time, GPU memory requirement, etc.…”
Section: Convolutional Neural Network (CNN) (mentioning)
confidence: 99%
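
The four measures named in the statement above (accuracy, precision, recall, F1-score) can be computed from a confusion matrix. The following is a minimal sketch, not code from any of the cited papers; the function name `binary_metrics` and the label convention (1 = phishing, 0 = legitimate) are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumed convention (not from the source): 1 = phishing, 0 = legitimate.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

On phishing datasets with few positive examples, accuracy alone can look high even when many phishing pages are missed, which is presumably why precision, recall, and F1 are usually reported alongside it.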
“…The third way that has proven to be the most effective is to use machine learning and deep learning techniques that learns about the characteristic features of previous malicious links and can make accurate distinctions in the future based on previous predictions made [2]. Current mainstream machine learning methods of phishing website detection extract statistical features from the URL or extract relevant features of the webpage, such as the layout, Domain information or HTML & JavaScript and then classify these features but machine learning algorithms do not analyze the sequence or the positions of words in a URL and also 63% of phishing websites have a lifespan of only 2 hours after which they change either expire or change their domain name [3]. In order to use the machine learning techniques that focuses on the statistical features of URL and also to exploit the orientation and sequence learning capability of deep learning, we propose a CNN-LSTM model along with Random Forest, they belong to the field of deep learning whereas Random Forest classifier belongs to the field of machine learning.…”
Section: Introduction (mentioning)
confidence: 99%
“…Their method first performed word segmentation processing on URL based on sensitive word segmentation, then converted it into a feature vector matrix that automatically extracts its local features through CNN and acquired its bidirectional long-distance dependent features through BiLSTM. Their model classified the phishing and legitimate URLs with accuracy of 98.84% [3].…”
Section: Introduction (mentioning)
confidence: 99%
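
The preprocessing described in the statement above (segmenting a URL into words, then turning it into a numeric input for the CNN and BiLSTM layers) might look roughly like the sketch below. This is not the paper's implementation: the sensitive-word list, the delimiter rule, the `<pad>`/`<unk>` ids, and the function names `segment_url`, `build_vocab`, and `encode` are all assumptions chosen for illustration.

```python
import re

# Hypothetical sensitive-word list; the paper's actual list is not given here.
SENSITIVE_WORDS = ("login", "secure", "account", "verify")
_SENSITIVE_RE = re.compile("(" + "|".join(SENSITIVE_WORDS) + ")")


def segment_url(url):
    """Split a URL into lowercase tokens.

    Step 1: split on any non-alphanumeric delimiter (assumed rule).
    Step 2: split each token further around the sensitive words, so that
    'securelogin' becomes ['secure', 'login'].
    """
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]
    segmented = []
    for tok in tokens:
        # A capturing group makes re.split keep the matched sensitive words.
        segmented.extend(p for p in _SENSITIVE_RE.split(tok) if p)
    return segmented


def build_vocab(urls):
    """Map each token to an integer id; 0 is padding, 1 is unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for url in urls:
        for tok in segment_url(url):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(url, vocab, max_len=12):
    """Turn a URL into a fixed-length list of token ids."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in segment_url(url)]
    ids = ids[:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


if __name__ == "__main__":
    # Example URL invented for illustration.
    url = "http://paypal-securelogin.example.com/verify?id=1"
    vocab = build_vocab([url])
    print(segment_url(url))
    print(encode(url, vocab))
```

In a full model, each id sequence would typically pass through an embedding layer to produce the feature matrix, followed by convolutional layers for local features and a bidirectional LSTM for long-distance dependencies; those layers are omitted here to keep the sketch dependency-free.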