2020
DOI: 10.1007/978-3-030-60796-8_26

Phishing Attacks and Websites Classification Using Machine Learning and Multiple Datasets (A Comparative Analysis)

Abstract: Phishing attacks are the most common type of cyber-attack used to obtain sensitive information and have been affecting individuals as well as organizations across the globe. Various techniques have been proposed to identify phishing attacks, in particular the deployment of machine intelligence in recent years. However, the deployed algorithms and discriminating factors are very diverse in existing works. In this study, we present a comprehensive analysis of various machine learning algorithms to evaluate their…

Cited by 21 publications (11 citation statements)
References 23 publications (17 reference statements)
“…The evaluation metrics were accuracy, precision, and recall. The results show that the proposed optimized stacking ensemble method outperformed the other recent and related works [7,10] in using the accuracy and recall performance measures for Dataset 1, and outperformed [35] in using the accuracy, precision and recall measures for Dataset 2.…”
Section: Statistical Analysis and Comparison With Previous Studies
Citation type: mentioning
confidence: 76%
“…Refs. [8,17] utilize Tan's dataset [15] with a similar proportion of training and testing data, at 70% and 30%.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…This study utilized two public datasets: Tan's [15] for the first dataset and Hannousse and Yahiouche [16] for the second dataset. We use this dataset because some studies on similar topics use it, such as Chiew et al [8] in 2019, Khan et al [17] in 2020, Dangwal and Moldova [18] in 2021, Al-Sareem et al [19] in 2021, and Haynes et al [20] in 2021. The first dataset selected phishing websites with the PhishTank and OpenPhish URLs and legitimate websites with the Alexa and General Archives URLs. For building the dataset, Tan gathered webpages in two different sessions between January and May and the other session in May and June for two years.…”
Section: Data
Citation type: mentioning
confidence: 99%
“…Phishing, and its variations such as vishing and smishing, are fraud attacks in which the target is contacted by the conman via some means of communication, such as email, voice phone calls, or SMS. The attackers represent themselves as a reputable entity in order to gain trust or induce some sort of reaction in the target to make them give up sensitive information, such as passwords or banking information about themselves [45]. Phishing is a common way to gain access to a target system, and protecting against it can be so resource-heavy that it affects day-to-day proceedings.…”
Section: Reconnaissance
Citation type: mentioning
confidence: 99%
“…Attackers can benefit from defensive research as well, because ML for detection avoidance is one of the key features of a good and successful phishing campaign. Some efforts, such as that of Khan et al [45], experiment with a variety of algorithms and datasets, exposing strengths and weaknesses in them. Thus, this information can be used to further evolve and improve the phishing campaigns as well as the detection algorithms.…”
Section: Reconnaissance
Citation type: mentioning
confidence: 99%