PhishBlock: A hybrid anti-phishing tool

Fahmy, Hossam Mahmoud Ahmad; Ghoneim, Salma A.

doi:10.1109/ccca.2011.6031523

Cited by 10 publications

(6 citation statements)

References 7 publications


“…• Data collection: we have collected a large dataset of 10,000 benign and malicious URLs from various sources, such as Phishtank [14], Kdnuggets [15], and [16]. According to the related studies [16][17][18][19][20][21], conventional learning methods were originally designed for balanced data sets. They optimize objective functions that usually lead to the highest overall accuracy (the proportion of correct predictions out of all predictions made). Many studies [16][17][18][19][20][21] have shown that a balanced data set gives better overall classification performance for several base classifiers than an imbalanced data set. Therefore, for this study, the dataset is divided equally into 5000 benign and 5000 malicious URLs.…”

Section: Methods (mentioning)

confidence: 99%
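The excerpt's reasoning (accuracy-driven learners favor balanced data) can be illustrated with a minimal Python sketch. It assumes URLs arrive as (url, label) pairs with the hypothetical labels "benign" and "malicious"; `balance_by_undersampling` and `accuracy` are illustrative names, and random undersampling is only one way to reach an equal split such as 5000 + 5000, not necessarily the method the citing authors used.

```python
import random
from typing import Dict, List, Tuple

# A labelled URL: (url, label), e.g. ("http://x.example", "benign").
Sample = Tuple[str, str]


def balance_by_undersampling(samples: List[Sample], per_class: int, seed: int = 0) -> List[Sample]:
    """Randomly draw `per_class` samples from every label (e.g. 5000 benign + 5000 malicious)."""
    rng = random.Random(seed)  # fixed seed -> reproducible subset
    by_label: Dict[str, List[Sample]] = {}
    for url, label in samples:
        by_label.setdefault(label, []).append((url, label))
    balanced: List[Sample] = []
    for label, group in sorted(by_label.items()):
        if len(group) < per_class:
            raise ValueError(f"only {len(group)} samples for label {label!r}")
        balanced.extend(rng.sample(group, per_class))  # sample without replacement
    rng.shuffle(balanced)
    return balanced


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Overall accuracy: correct predictions divided by all predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

On an imbalanced set of 8 benign and 2 malicious URLs, a classifier that always answers "benign" already reaches an accuracy of 0.8 while missing every malicious link, which is why the excerpt favors a balanced split.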

“…• Extract link features: There are dozens of URL properties that can be used to distinguish benign from malicious URLs. Based on previous studies in this field [16][17][18][19][20][21], we have chosen the URL properties most influential in predicting URL classes; they were selected by applying several feature selection techniques. We developed a tool to extract 28 link-based attributes of those URLs according to the recommended detection features [16][17][18][19][20][21].…”

Section: Methods (mentioning)

confidence: 99%
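The 28 link-based attributes are not listed in the excerpt. As a hedged illustration of what lexical link features look like, the sketch below computes a handful of commonly used ones (length, digit and hyphen counts, an `@` sign, a raw IP address as host) with Python's standard `urllib.parse` and `ipaddress` modules. The feature names and the selection are assumptions, not the citing authors' attribute set.

```python
import ipaddress
from urllib.parse import urlsplit


def _host_is_ip(host: str) -> bool:
    """True if the host is a literal IPv4/IPv6 address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def extract_link_features(url: str) -> dict:
    """Compute a small, illustrative (hypothetical) set of lexical URL features."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # lower-cased host without port or user info
    return {
        "url_length": len(url),
        "host_length": len(host),
        "dots_in_host": host.count("."),
        "hyphens": url.count("-"),
        "digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "host_is_ip": _host_is_ip(host),
        "uses_https": parts.scheme == "https",
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "has_query": bool(parts.query),
    }
```

Each URL becomes one fixed-length feature dictionary, which can then be turned into a row of the training table.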


URL Links Malicious Classification Towards Autonomous Threat Detection Systems

Al-Smadi; Alsmadi; Wahsheh

2021

Proceedings of International Conference on Emerging Technologies and Intelligent Systems


Cyber threat behaviors can take different forms, approaches, and goals. For threat detection systems, it is essential to monitor URLs known for previous malicious attempts. It is also vital to study attack behaviors with the ultimate goal of designing autonomous threat detection systems. Toward that goal, we collected a large dataset of URL links and manually annotated them as benign or malicious. Several features related to lexicon and structure are collected for those links. Two classification algorithms were employed to extract the best features for predicting whether a link is malicious or benign. We applied three preprocessing techniques: handling missing values, dealing with outliers, and handling categorical variables. For the two evaluated classifiers, bias in the results is avoided by using proper data splitting methods. The quality of the classifiers is evaluated by their accuracy. The results indicated that Random Forest consistently reported better accuracy than the Decision Tree classifier, predicting whether a URL link is malicious or not with the highest accuracy, 98%.
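The abstract does not specify its "proper data splitting methods". One common choice is a stratified split, sketched below in plain Python under that assumption: each class is shuffled and divided separately, so the benign/malicious ratio is the same in the training and test parts. `stratified_split`, the 20% test fraction, and the fixed seed are illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    items: Sequence[str],
    labels: Sequence[str],
    test_fraction: float = 0.2,
    seed: int = 42,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split (item, label) pairs so each class keeps the same share in train and test."""
    if len(items) != len(labels):
        raise ValueError("items and labels must have the same length")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed -> reproducible split
    by_label: Dict[str, List[str]] = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)

    train: List[Tuple[str, str]] = []
    test: List[Tuple[str, str]] = []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)  # per-class test size
        test.extend((item, label) for item in group[:n_test])
        train.extend((item, label) for item in group[n_test:])

    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```

With 10 benign and 10 malicious URLs and `test_fraction=0.2`, each class contributes 2 test items, giving 16 training and 4 test samples. Preprocessing steps such as outlier thresholds are normally fitted on the training part only, so that the test part does not leak into them.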


“…In this method, a list of previously known URLs that have been confirmed as malicious is stored and maintained in a database. The database is often compiled by several toolbars such as PhishBook [8] and PhishTank [15]. The method is very fast since it only queries a database; however, because new technology allows attackers to host malicious domains for only a couple of hours, this method is no longer as effective [19].…”

Section: Related Work (mentioning)

confidence: 99%
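A minimal sketch of the blacklist lookup described in the excerpt above, assuming an in-memory Python set stands in for the database and a simple normalization rule (lower-case host; scheme, port, user info, and fragment ignored). `Blacklist` and `normalize_url` are hypothetical names. The set lookup is constant time on average, which matches the excerpt's point about speed, and it also shows the weakness: a URL that was never added is simply reported as not listed.

```python
from typing import Iterable, Set
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Map trivial URL variants to one key: lower-case host + path + query.

    Scheme, port, user info and fragment are ignored (an assumption of this sketch).
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").rstrip(".")  # .hostname is already lower-cased
    key = host + (parts.path or "/")
    if parts.query:
        key += "?" + parts.query
    return key


class Blacklist:
    """In-memory stand-in for a database of confirmed phishing URLs."""

    def __init__(self, urls: Iterable[str] = ()) -> None:
        self._keys: Set[str] = set()
        for url in urls:
            self.add(url)

    def add(self, url: str) -> None:
        self._keys.add(normalize_url(url))

    def is_listed(self, url: str) -> bool:
        # Average O(1) set lookup: fast, but only catches URLs already reported.
        return normalize_url(url) in self._keys
```

A freshly registered phishing domain that has not yet been reported passes `is_listed` as clean, which is the short-lived-domain problem the excerpt points to.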

Towards the Detection of Malicious URL and Domain Names Using Machine Learning

Ghalati; Ghalaty; Barata

2020

IFIP Advances in Information and Communication Technology


Malicious Uniform Resource Locators (URLs) are an important problem in web search and mining. Malicious URLs host unsolicited content (spam, phishing, drive-by downloads, etc.) and try to lure uneducated users into clicking on such links or downloading malware, which will result in critical data exfiltration. Traditional techniques for detecting such URLs have relied on blacklists and rule-based methods. The main disadvantage of these methods is that they are not resistant to 0-day attacks: there will be at least one victim for each URL before the URL is blacklisted. Other techniques use a sandbox to test URLs before they are clicked in the production or main environment. Such methods have two main drawbacks: the cost of sandboxing and the non-real-time response caused by the approval process in the test environment. In this paper, we propose a method that exploits semantic features of both domains and URLs. The method is adaptive, meaning that the model can change dynamically based on new feedback received about 0-day attacks. We extract features from each section of a URL separately. We then apply three machine learning methods to three different data sets. We provide an analysis of features to find the most efficient value of N for applying N-grams to domain names. The results show that Random Forest has the highest accuracy, over 96%, while also providing more interpretability and performance benefits.
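The abstract mentions choosing a value of N for applying N-grams to domain names. A hedged sketch of overlapping character N-gram extraction in Python follows; `domain_ngrams` and `ngram_vocabulary` are hypothetical helpers, and the paper's exact tokenization (padding, TLD handling, etc.) is not stated in the abstract.

```python
from collections import Counter
from typing import Iterable, Set


def domain_ngrams(domain: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of a domain name."""
    if n < 1:
        raise ValueError("n must be at least 1")
    d = domain.lower().rstrip(".")  # case-insensitive; drop a trailing root dot
    return Counter(d[i:i + n] for i in range(len(d) - n + 1))


def ngram_vocabulary(domains: Iterable[str], n: int) -> Set[str]:
    """All distinct n-grams across the domains: the feature space for this n."""
    vocab: Set[str] = set()
    for d in domains:
        vocab.update(domain_ngrams(d, n))
    return vocab
```

For example, `domain_ngrams("abcab", 2)` counts `ab` twice and `bc` and `ca` once each. A larger N yields more specific but sparser features, which is the trade-off behind picking N.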


Replacing Human Input in Spam Email Detection Using Deep Learning

Nicho; Majdani; McDermott

2022

Lecture Notes in Computer Science


The Covid-19 pandemic has been a driving force for a substantial increase in online activity and transactions across the globe. As a consequence, cyber-attacks, particularly those leveraging email as the preferred attack vector, have also increased exponentially since Q1 2020. Despite this, email remains a popular communication tool. In an effort to reduce the amount of spam entering a user's inbox, many email providers started to incorporate spam filters into their products. However, many commercial spam filters rely on a human to train the filter, leaving a margin of risk if sufficient training has not occurred. Knowing this, hackers employ more targeted and nuanced obfuscation methods to bypass built-in spam filters. In response to this continued problem, there is a growing body of research on the use of machine learning techniques for spam filtering. In many cases, detection results have shown great promise, but these approaches often still rely on human input to classify training datasets. In this study, we specifically explore the use of deep learning as a method of reducing the human input required for spam detection. First, we evaluate the efficacy of popular freeware spam detection methods, tools, and techniques. Next, we narrow down machine learning techniques to select the appropriate method for our dataset. The accuracy of this method was then compared with that of the freeware spam detection tools. Our results showed that our deep learning model, based on simple word embedding and global max pooling (SWEM-max), had higher accuracy (98.41%) than both Thunderbird (95%) and Mailwasher (92%), which are based on Bayesian spam filtering. Finally, we consider whether this improvement is enough to justify removing human input from spam email detection.
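SWEM-max represents a text as the element-wise maximum of its word embeddings. The Python sketch below shows only that pooling step plus a logistic output, assuming a tiny hand-written embedding table and weights; in the study the embeddings and classifier would be learned from data, and `swem_max` and `spam_probability` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence


def swem_max(tokens: Sequence[str], embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """SWEM-max: element-wise maximum over the embeddings of the known tokens."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim  # no known words: fall back to a zero vector
    # zip(*vectors) groups the values of each embedding dimension together.
    return [max(values) for values in zip(*vectors)]


def spam_probability(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic output layer applied to the pooled text vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical 2-dimensional embeddings, for illustration only.
toy_embeddings = {"free": [0.9, -0.2], "money": [0.7, 0.5], "meeting": [-0.3, 0.1]}
pooled = swem_max(["free", "money", "now"], toy_embeddings, dim=2)
```

Here `pooled` is `[0.9, 0.5]`: each dimension keeps its strongest activation across the words, and out-of-vocabulary tokens such as "now" are skipped. The pooling has no parameters of its own, which is what makes SWEM a "simple" model.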

