Phishing websites detection via CNN and multi-head self-attention on imbalanced datasets

Xiao, Xi; Xiao, Wentao; Zhang, Dianyan; Zhang, Bin; Hu, Guangwu; Li, Qing; Xia, Shu-Tao

doi:10.1016/j.cose.2021.102372

Cited by 47 publications

(15 citation statements)

References 18 publications


“…Ultimately, the authors of [13] attempted to boost that detection accuracy rate through a blended approach of DNN and features weighting algorithms like genetic algorithm (GA) to classify phish websites by their most exploiting features. While researchers of [14], applied a multi-headed and self-attentional CNN on an imbalanced dataset throughout a generative adversarial network (GAN) with a large number of URL features. However, their work fell short of fixing the length of examined URL strings among other URL features.…”

Section: Literature Review (mentioning)

confidence: 99%

Deep learning in phishing mitigation: a uniform resource locator-based predictive model

Salah, Zuhair, 2023, IJECE


To mitigate the evolution of phish websites, various phishing prediction schemes are being optimized eventually. However, the optimized methods produce gratuitous performance overhead due to the limited exploration of advanced phishing cues. Thus, a phishing uniform resource locator-based predictive model is enhanced by this work to defeat this deficiency using deep learning algorithms. This model’s architecture encompasses pre-processing of the effective feature space that is made up of 60 mutual uniform resource locator (URL) phishing features, and a dual deep learning-based model of convolution neural network with bi-directional long short-term memory (CNN-BiLSTM). The proposed predictive model is trained and tested on a dataset of 14,000 phish URLs and 28,074 legitimate URLs. Experimentally, the performance outputs are remarked with a 0.01% false positive rate (FPR) and 99.27% testing accuracy.


“…The results gave a 97 % accuracy for proposed model. Different from the other researchers, the authors in [19] produced a phishing URL to balance the dataset with the GAN. They created a dataset that contained 68,030 legitimate URLs and 12,003 phishing URLs from PhishTank.…”

Section: Related Work (mentioning)

confidence: 99%

A Hybrid Phishing Detection System Using Deep Learning-based URL and Content Analysis

Korkmaz, Kocyigit, Şahingöz et al., 2022, ELEKTRON ELEKTROTECH


Phishing attacks are one of the most preferred types of attacks for cybercriminals, who can easily contact a large number of victims through the use of social networks, particularly through email messages. To protect end users, most of the security mechanisms control Uniform Resource Locator (URL) addresses because of their simplicity of implementation and execution speed. However, due to sophisticated attackers, this mechanism can miss some phishing attacks and has a relatively high false positive rate. In this research, a hybrid technique is proposed that uses not only URL features, but also content-based features as the second level of detection mechanism, thus improving the accuracy of the detection system while also minimizing the number of false positives. Additionally, most phishing detection algorithms use datasets that contain easily differentiated data pieces, either phishing or legitimate. However, in order to implement a more secure protection mechanism, we aimed to collect a larger and high-risk dataset. The proposed approaches were tested on this High-Risk URL and Content-Based Phishing Detection Dataset that only contains suspicious websites from PhishTank. According to experimental studies, an accuracy rate of 98.37 percent was achieved on a more realistic dataset for phishing detection.
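To put the dataset sizes quoted in the citation statement for this entry (68,030 legitimate and 12,003 phishing URLs) in perspective, the short sketch below computes the class imbalance they imply. The inverse-frequency class weights are an illustrative assumption, one common reweighting heuristic, and are not the GAN-based balancing that the statement attributes to the cited work.

```python
# Class counts as quoted in the citation statement for this entry.
counts = {"legitimate": 68_030, "phishing": 12_003}

total = sum(counts.values())
imbalance_ratio = counts["legitimate"] / counts["phishing"]
phishing_share = counts["phishing"] / total

# Inverse-frequency ("balanced") weights: total / (n_classes * class_count).
# Assumed here purely for illustration; the cited work instead generates
# additional phishing URLs with a GAN to balance the classes.
class_weights = {label: total / (len(counts) * n) for label, n in counts.items()}

print(f"total URLs: {total}")
print(f"legitimate : phishing ratio: {imbalance_ratio:.2f} : 1")
print(f"phishing share of dataset: {phishing_share:.1%}")
for label, weight in class_weights.items():
    print(f"weight[{label}] = {weight:.3f}")
```

With these weights each class contributes the same total weight to a loss function, which is the usual motivation for this heuristic when no synthetic samples are generated.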


“…However, current learning-based methods tend to model the entire request message as streaming data [3][4][5][6][7][8][9][28][29][30][31][32][33][34], causing the individual presence of a sensitive path or malicious payload to be regarded as the prevailing decision-making factor. As existing methods neglect the implicit processing syntax and scenario-related characteristics, they cannot estimate the attack feasibility when conducting detection on captured malicious requests, which might incur further massive numbers of false alerts, especially during real-world deployment.…”

Section: HTTP Request Structure (mentioning)

confidence: 99%

DualAC2NN: Revisiting and Alleviating Alert Fatigue from the Detection Perspective

2022


The exponential expansion of Internet interconnectivity has led to a dramatic increase in cyber-attack alerts, which contain a considerable proportion of false positives. The overwhelming number of false positives cause tremendous resource consumption and delay responses to the really severe incidents, namely, alert fatigue. To cope with the challenge from alert fatigue, we focus on enhancing the capability of detectors to reduce the generation of false alerts from the detection perspective. The core idea of our work is to train a machine-learning-based detector to grasp the empirical intelligence of security analysts to estimate the feasibility of an incoming HTTP request to cause substantial threats, and integrate the estimation into the detection stage to reduce false alarms. To this end, we innovatively introduce the concept of attack feasibility to characterize the composition rationality of an inbound HTTP request as a feasible attack under static scrutinization. First, we adopt a fast request-reorganization algorithm to transform an HTTP request into the form of interface:payload pair for further alignment of structural components which can reveal the processing logic of the target program. Then, we build a dual-channel attention-based circulant convolution neural network (DualAC2NN) to integrate the attack feasibility estimation into the alert decision, by comprehensively considering the interface sensitivity, payload maliciousness, and their bipartite compatibility. Experiments on a real-world dataset show that the proposed method significantly reduces invalid alerts by around 86.37% and over 61.64% compared to a rule-based commercial WAF and several state-of-the-art methods, along with retaining a detection rate at 97.89% and a lower time overhead, which indicates that our approach can effectively mitigate alert fatigue from the detection perspective.
