Phishing is an increasingly sophisticated method of stealing personal user information through sites that pretend to be legitimate. In this paper, we take the following steps to identify phishing URLs. First, we carefully select lexical features of the URLs that are resistant to the obfuscation techniques used by attackers. Second, we evaluate the classification accuracy when using only lexical features, both automatically selected and hand-selected, versus when using additional features. We show that lexical features are sufficient for all practical purposes. Third, we thoroughly compare several classification algorithms, and we propose using an online method (AROW) that is able to overcome noisy training data. Based on the insights gained from our analysis, we propose PhishDef, a phishing detection system that uses only URL names and combines the above three elements. PhishDef is highly accurate (compared with state-of-the-art approaches on real datasets), lightweight (and thus appropriate for online and client-side deployment), proactive (based on online classification rather than blacklists), and resilient to inaccuracies in the training data (thus enabling the use of large, noisy training sets).
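To make the combination of lexical URL features and AROW (Adaptive Regularization of Weights) concrete, here is a minimal sketch. It assumes a hypothetical bag-of-tokens feature set (host tokens, path tokens, a length bucket, and an IP-address-host flag) rather than PhishDef's actual features, and it uses the common diagonal-covariance form of AROW so that it scales to sparse, high-dimensional inputs. The function and class names (`lexical_features`, `DiagonalAROW`) and the toy URLs are illustrative only.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict[str, float]:
    """Hypothetical lexical features: host/path tokens, a length bucket, an IP-host flag.
    Illustrative only; not PhishDef's exact feature set."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    feats = {"bias": 1.0, f"len_bucket:{min(len(url) // 20, 10)}": 1.0}
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host):
        feats["has_ip_host"] = 1.0
    for tok in re.split(r"[^a-z0-9]+", host):
        if tok:
            feats["host:" + tok] = 1.0
    for tok in re.split(r"[^a-z0-9]+", (parts.path + "?" + parts.query).lower()):
        if tok:
            feats["path:" + tok] = 1.0
    return feats


class DiagonalAROW:
    """AROW with a diagonal covariance over sparse, string-keyed features."""

    def __init__(self, r: float = 1.0) -> None:
        self.r = r                          # larger r => more conservative updates
        self.mu: dict[str, float] = {}      # mean weight per feature
        self.sigma: dict[str, float] = {}   # per-feature variance, implicitly 1.0 at start

    def margin(self, x: dict[str, float]) -> float:
        return sum(self.mu.get(f, 0.0) * v for f, v in x.items())

    def predict(self, x: dict[str, float]) -> int:
        return 1 if self.margin(x) >= 0 else -1

    def update(self, x: dict[str, float], y: int) -> None:
        """y = +1 for phishing, -1 for legitimate."""
        m = self.margin(x)
        if y * m >= 1.0:
            return  # hinge loss is zero: no update
        confidence = sum(self.sigma.get(f, 1.0) * v * v for f, v in x.items())
        beta = 1.0 / (confidence + self.r)
        alpha = (1.0 - y * m) * beta
        for f, v in x.items():
            s = self.sigma.get(f, 1.0)
            self.mu[f] = self.mu.get(f, 0.0) + alpha * y * s * v
            self.sigma[f] = s - beta * (s * v) ** 2  # shrink variance of observed features


# Hypothetical toy training data: +1 = phishing, -1 = legitimate.
train = [
    ("http://paypal.com.account-verify.ru/signin", 1),
    ("http://203.0.113.7/bank/login.php", 1),
    ("https://www.python.org/downloads/", -1),
    ("https://en.wikipedia.org/wiki/Phishing", -1),
]
model = DiagonalAROW(r=1.0)
for _ in range(10):
    for url, label in train:
        model.update(lexical_features(url), label)
```

The per-feature variance is what makes AROW tolerant of noisy labels in this sketch: features that have been seen often become "confident" and move less on each subsequent update, so a few mislabeled URLs cannot swing their weights far.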
We consider broadcasting for real-time applications, with loss recovery via instantly decodable network coding. Past work focused on minimizing the completion delay, which is not the right objective for real-time applications that have strict deadlines. In this work, we are interested in finding a code that is instantly decodable by the maximum number of users. First, we prove that this problem is NP-hard in the general case. We then consider the practical probabilistic scenario, where users have i.i.d. loss probabilities and the number of packets is linear or polynomial in the number of users. In this scenario, we provide an algorithm, polynomial-time in the number of users, that finds the optimal coded packet. The proposed algorithm is evaluated using both simulations and real network traces of a real-time Android application. Both sets of results show that the proposed coding scheme significantly outperforms the state-of-the-art baselines: an optimal repetition code and a COPE-like greedy scheme.
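The objective above can be illustrated on a toy instance. Assuming every user wants every broadcast packet, an XOR-coded packet is instantly decodable for a user when exactly one packet in the combination is missing at that user (the others are already held and can be XORed out). The sketch below counts such users, compares a repetition baseline with an exhaustive search over all XOR combinations, and is only usable for tiny instances: it runs in exponential time and is not the authors' polynomial-time algorithm. The function names and the loss pattern are hypothetical.

```python
from itertools import combinations


def decodable_users(coded: frozenset[int], lost: list[set[int]]) -> int:
    """Count users for whom the XOR of the packets in `coded` is instantly
    decodable, i.e. exactly one of those packets is missing at the user."""
    return sum(1 for missing in lost if len(coded & missing) == 1)


def best_repetition(n_packets: int, lost: list[set[int]]) -> tuple[int, int]:
    """Baseline: resend the single packet lost by the most users."""
    return max(((p, sum(p in m for m in lost)) for p in range(n_packets)),
               key=lambda t: t[1])


def best_coded_packet_bruteforce(n_packets: int, lost: list[set[int]]):
    """Exhaustive search over all non-empty XOR combinations (O(2^n) subsets)."""
    best, best_count = frozenset(), 0
    for k in range(1, n_packets + 1):
        for combo in combinations(range(n_packets), k):
            coded = frozenset(combo)
            count = decodable_users(coded, lost)
            if count > best_count:
                best, best_count = coded, count
    return best, best_count


# Toy instance (hypothetical): 3 packets, 4 users; lost[i] = packets user i is missing.
lost = [{0}, {1}, {0, 1}, {2}]
print(best_repetition(3, lost))               # (0, 2)
print(best_coded_packet_bruteforce(3, lost))  # (frozenset({0, 2}), 3)
```

In this instance, coding packets 0 and 2 together serves three users in one transmission, whereas the best uncoded retransmission serves only two. That gap between coding and repetition is the kind of gain the abstract reports at scale.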
Over the past century, forensic investigators have universally accepted fingerprinting, based on pictorial comparison, as a reliable identification method. One of the most traditional detection methods uses ninhydrin, a chemical that reacts with the amino acids in fingerprint residue to produce the blue-purple color known as Ruhemann's purple. It has recently been demonstrated that the amino acid content of fingerprints can be used to distinguish male from female fingerprints. Here, we present a modified approach to the traditional ninhydrin method, which combines ninhydrin with an optimized extraction protocol and with the concept of determining gender from fingerprints. In doing so, we are able to focus on the biochemical material of the fingerprint rather than exclusively on its physical image.
A widely used defense against malicious traffic on the Internet is the use of blacklists: lists of prolific attack sources are compiled and shared. The goal of blacklists is to predict and block future attack sources. Existing blacklisting techniques have focused on the most prolific attack sources and, more recently, on collaborative blacklisting. In this paper, we formulate the problem of forecasting attack sources (also referred to as "predictive blacklisting") based on shared attack logs as an implicit recommendation system. We compare the performance of existing approaches against the upper bound for prediction, and we demonstrate that there is much room for improvement. Inspired by the recent Netflix competition, we propose a multilevel prediction model that is adjusted and tuned specifically for the attack forecasting problem. Our model captures and combines various factors, namely attacker-victim history (using time series) and attacker and/or victim interactions (using neighborhood models). We evaluate our combined method on one month of logs from Dshield.org and demonstrate that it significantly improves on the state of the art.
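The idea of combining a time-series component with a neighborhood component can be sketched as a small recommender. This is a minimal sketch under stated assumptions, not the paper's tuned multilevel model: it assumes an exponentially weighted moving average (EWMA) over daily attack indicators for the attacker-victim history, Jaccard similarity between victims' attacker sets for the neighborhood model, and a hypothetical linear `weight` to combine the two. All names and the toy log are illustrative.

```python
from collections import defaultdict


def ewma_history(logs, n_days, alpha=0.5):
    """Time-series component: EWMA of daily attack indicators per (attacker, victim).
    logs: iterable of (day, attacker, victim) with 0 <= day < n_days."""
    days_seen = defaultdict(set)  # (attacker, victim) -> days with at least one attack
    for day, attacker, victim in logs:
        days_seen[(attacker, victim)].add(day)
    # The most recent day gets weight alpha; older days decay by (1 - alpha) per day.
    return {key: sum(alpha * (1 - alpha) ** (n_days - 1 - d) for d in days)
            for key, days in days_seen.items()}


def jaccard(x, y):
    return len(x & y) / len(x | y) if x | y else 0.0


def neighborhood(history):
    """Neighborhood component: a victim inherits the history scores of
    attackers seen by similar victims (similarity = Jaccard of attacker sets)."""
    attackers_of = defaultdict(set)
    for attacker, victim in history:
        attackers_of[victim].add(attacker)
    scores = defaultdict(float)
    for v in attackers_of:
        for u in attackers_of:
            if u == v:
                continue
            sim = jaccard(attackers_of[v], attackers_of[u])
            for a in attackers_of[u]:
                scores[(a, v)] += sim * history[(a, u)]
    return scores


def blacklist(logs, n_days, victim, k, weight=0.5):
    """Top-k predicted attack sources for `victim` (hypothetical linear blend)."""
    hist = ewma_history(logs, n_days)
    neigh = neighborhood(hist)
    candidates = {a for a, _ in hist}
    combined = {a: hist.get((a, victim), 0.0) + weight * neigh.get((a, victim), 0.0)
                for a in candidates}
    return sorted(combined, key=lambda a: (-combined[a], a))[:k]


# Hypothetical two-day log of (day, attacker, victim) tuples.
logs = [(0, "A", "v1"), (0, "B", "v1"), (0, "A", "v2"), (0, "C", "v2"),
        (1, "B", "v1"), (1, "A", "v2"), (1, "C", "v2"), (1, "D", "v3")]
print(blacklist(logs, n_days=2, victim="v1", k=3))  # ['B', 'A', 'C']
```

Here attacker C has never hit v1, yet it enters v1's blacklist because v1 and v2 share attacker A. This is the kind of forecast that a local or "most prolific sources" blacklist cannot make.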