The evolution of mobile malware poses a serious threat to smartphone security. Today, sophisticated attackers can adapt by maximally sabotaging machine-learning classifiers via polluting training data, rendering most recent machine learning-based malware detection tools (such as Drebin, DroidAPIMiner, and MaMaDroid) ineffective. In this paper, we explore the feasibility of constructing crafted malware samples; examine how machine-learning classifiers can be misled under three different threat models; then conclude that injecting carefully crafted data into training data can significantly reduce detection accuracy. To tackle the problem, we propose KuafuDet, a two-phase learning enhancing approach that learns mobile malware by adversarial detection. KuafuDet includes an offline training phase that selects and extracts features from the training set, and an online detection phase that utilizes the classifier trained by the first phase. To further address the adversarial environment, these two phases are intertwined through a self-adaptive learning scheme, wherein an automated camouflage detector is introduced to filter the suspicious false negatives and feed them back into the training phase. We finally show that KuafuDet can significantly reduce false negatives and boost the detection accuracy by at least 15%. Experiments on more than 250,000 mobile applications demonstrate that KuafuDet is scalable and can be highly effective as a standalone system.
Abstract-String similarity join is an essential operation in data integration. The era of big data calls for scalable algorithms to support large-scale string similarity joins. In this paper, we study scalable string similarity joins using MapReduce. We propose a MapReduce-based framework, called MASSJOIN, which supports both set-based similarity functions and characterbased similarity functions. We extend the existing partition-based signature scheme to support set-based similarity functions. We utilize the signatures to generate key-value pairs. To reduce the transmission cost, we merge key-value pairs to significantly reduce the number of key-value pairs, from cubic to linear complexity, while not sacrificing the pruning power. To improve the performance, we incorporate "light-weight" filter units into the key-value pairs which can be utilized to prune large number of dissimilar pairs without significantly increasing the transmission cost. Experimental results on real-world datasets show that our method significantly outperformed state-of-the-art approaches.
Attackers often use URLs to advertise scams or propagate malware. Because the reputation of a domain can be used to identify malicious behavior, miscreants often register these domains "just in time" before an attack. This paper explores the DNS behavior of attack domains, as identified by appearance in a spam trap, shortly after the domains were registered. We explore the behavioral properties of these domains from two perspectives: (1) the DNS infrastructure associated with the domain, as is observable from the resource records; and (2) the DNS lookup patterns from networks who are looking up the domains initially. Our analysis yields many findings that may ultimately be useful for early detection of malicious domains. By monitoring the infrastructure for these malicious domains, we find that about 55% of scam domains occur in attacks at least one day after registration, suggesting the potential for early discovery of malicious domains, solely based on properties of the DNS infrastructure that resolves those domains. We also find that there are a few regions of IP address space that host name servers and other types of servers for only malicious domains. Malicious domains have resource records that are distributed more widely across IP address space, and they are more quickly looked up by a variety of different networks. We also identify a set of "tainted" ASes that are used heavily by bad domains to host resource records. The features we observe are often evident before any attack even takes place; ultimately, they might serve as the basis for a DNS-based early warning system for attacks.
Spammers register a tremendous number of domains to evade blacklisting and takedown efforts. Current techniques to detect such domains rely on crawling spam URLs or monitoring lookup traffic. Such detection techniques are only effective after the spammers have already launched their campaigns, and thus these countermeasures may only come into play after the spammer has already reaped significant benefits from the dissemination of large volumes of spam. In this paper we examine the registration process of such domains, with a particular eye towards features that might indicate that a given domain likely has a malicious purpose at registration time, before it is ever used for an attack. Our assessment includes exploring the characteristics of registrars, domain life cycles, registration bursts, and naming patterns. By investigating zone changes from the .com TLD over a 5-month period, we discover that spammers employ bulk registration, that they often re-use domains previously registered by others, and that they tend to register and host their domains over a small set of registrars. Our findings suggest steps that registries or registrars could use to frustrate the efforts of miscreants to acquire domains in bulk, ultimately reducing their agility for mounting large-scale attacks.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.