Real-Time Detection of Malware Downloads via Large-Scale URL->File->Machine Graph Mining

Rahbarinia, Babak; Balduzzi, Marco; Perdisci, Roberto

doi:10.1145/2897845.2897918

Cited by 21 publications

(12 citation statements)

References 13 publications

“…The core idea of graph-based malware detection is modeling the interactions among malware, endpoints and network servers as graphs, and leverage various machine learning models to understand the patterns and detect previous unknown malicious files or activities. For example, CAMP [60], Mastino [59], and Polonium [48] built graphs from binary activity data and detect malware. Similarly, Marmite [69], NAZCA [32], AESOP [72] and Kwon et al [38] built graphs from binary download/distribution data and detect previous unknown malware.…”

Section: Related Work (mentioning)

confidence: 99%

ANDRUSPEX : Leveraging Graph Representation Learning to Predict Harmful App Installations on Mobile Devices

Shen

Stringhini

2021

Preprint

Android's security model severely limits the capabilities of anti-malware software. Unlike commodity antimalware solutions on desktop systems, their Android counterparts run as sandboxed applications without root privileges and are limited by Android's permission system. As such, PHAs on Android are usually willingly installed by victims, as they come disguised as useful applications with hidden malicious functionality, and are encountered on mobile app stores as suggestions based on the apps that a user previously installed. Users with similar interests and app installation history are likely to be exposed and to decide to install the same PHA. This observation gives us the opportunity to develop predictive approaches that can warn the user about which PHAs they will encounter and potentially be tempted to install in the near future. These approaches could then be used to complement commodity anti-malware solutions, which are focused on post-fact detection, closing the window of opportunity that existing solutions suffer from. In this paper we develop ANDRUSPEX, a system based on graph representation learning, allowing us to learn latent relationships between user devices and PHAs and leverage them for prediction. We test ANDRUSPEX on a real world dataset of PHA installations collected by a security company, and show that our approach achieves very high prediction results (up to 0.994 TPR at 0.0001 FPR), while at the same time outperforming alternative baseline methods. We also demonstrate that ANDRUSPEX is robust and its runtime performance is acceptable for a real world deployment.

“…5 shows the basic idea of the procedure described above, which underscores the file co-occurring as well as the host-file relationship. We use (14) to update the labels of each data object until convergence. Convergence means that the predicted labels of the data will not change in several successive iterations.…”

Section: B Homophilic Host-file Relationship Based On File Co-occurrence (mentioning)

confidence: 99%

“…We will do so by transforming (1 − α) I − α d W −1 into a standard symmetrical matrix. Rewrite (14) as…”

Section: B Homophilic Host-file Relationship Based On File Co-occurrence (mentioning)

confidence: 99%

Malware Detection via Extended Label Propagation Through Graph Inference

2019

IEEE Access

In this paper, we model the malware detection problem as a graph inference problem, and develop a novel belief propagation approach within a semi-supervised learning scheme that fully makes use of files' and hosts' connections to detect malware. Specifically, with network download data, we build a large graph that depicts files' co-occurrence and files-hosts relationship. Different from the classical methods that heuristically define edge weights only in the file co-occurrence graph, we develop a new method to integrate homophilic host-file relationship on top of file co-occurrences. Then, by using the linear neighborhood model, we first perform propagations in the subgraph of files to achieve their stabilization, then extend the propagation to the complete file-host graph. To facilitate this propagation procedure, we develop a set of algorithmic tools that extract information for the linear neighborhood model from the link structure of download events. Also, we theoretically show that, under some mild conditions, our propagation method could reveal the actual labels of unlabeled nodes in the complete graph. Finally, we perform a set of experiments that demonstrate the effectiveness of our new method in a variety of contexts on a real-world dataset.
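The propagation-until-convergence idea described in this abstract and in the citation statements above can be illustrated with a minimal sketch. It is not the paper's equation (14), which is not reproduced on this page. It assumes the generic label-spreading update F ← αSF + (1 − α)Y over a row-normalised weight matrix S, and it stops once the predicted labels stay unchanged for several successive iterations. The function name `propagate_labels`, the parameters `alpha` and `patience`, and the toy graph are all hypothetical.

```python
def propagate_labels(weights, seeds, alpha=0.8, patience=3, max_iter=1000):
    """Generic label spreading on a weighted graph (illustrative only).

    weights: dict node -> {neighbour: edge weight}
    seeds:   dict node -> +1.0 (known malicious) or -1.0 (known benign)
    Returns (labels, scores); a label is 1, -1, or 0 (undecided).
    """
    nodes = sorted(weights)
    scores = {n: seeds.get(n, 0.0) for n in nodes}
    prev_labels = None
    stable = 0  # number of successive iterations with unchanged labels

    for _ in range(max_iter):
        new_scores = {}
        for n in nodes:
            nbrs = weights[n]
            total = sum(nbrs.values())
            # Weighted average of neighbour scores (row-normalised weights).
            avg = sum(w * scores[m] for m, w in nbrs.items()) / total if total else 0.0
            # Blend neighbour evidence with the node's own seed label.
            new_scores[n] = alpha * avg + (1 - alpha) * seeds.get(n, 0.0)
        scores = new_scores

        labels = {n: (1 if s > 0 else -1 if s < 0 else 0) for n, s in scores.items()}
        stable = stable + 1 if labels == prev_labels else 0
        prev_labels = labels
        if stable >= patience:
            break

    return prev_labels, scores


# Toy graph: two labelled files, two unlabelled files, one weak cross link.
toy_weights = {
    "mal1": {"unk1": 1.0},
    "unk1": {"mal1": 1.0, "unk2": 0.1},
    "unk2": {"ben1": 1.0, "unk1": 0.1},
    "ben1": {"unk2": 1.0},
}
toy_seeds = {"mal1": 1.0, "ben1": -1.0}
labels, scores = propagate_labels(toy_weights, toy_seeds)
```

In this toy run each unlabelled node takes the label of the seed it is strongly connected to, because the cross link between the two clusters carries little weight.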

“…Zhang and Shen [22] employ a statistical learning based approach to reduce false positives on IDSs. Rahbarinia et al [23] use graph mining techniques for analyzing download events for detecting malware download. These works are concentrated on improving an IDS, and present interesting discussions that could complement SADF in a possible future work integrating IDS into its architecture.…”

Section: Related Work (mentioning)

confidence: 99%

A new approach to deploy a self-adaptive distributed firewall

Júnior

Silva

Pinheiro

et al. 2018

J Internet Serv Appl

Distributed firewall systems emerged with the proposal of protecting individual hosts against attacks originating from inside the network. In these systems, firewall rules are centrally created, then distributed and enforced on all servers that compose the firewall, restricting which services will be available. However, this approach lacks protection against software vulnerabilities that can make network services vulnerable to attacks, since firewalls usually do not scan application protocols. In this sense, from the discovery of any vulnerability until the publication and application of patches there is an exposure window that should be reduced. In this context, this article presents Self-Adaptive Distributed Firewall (SADF). Our approach is based on monitoring hosts and using a vulnerability assessment system to detect vulnerable services, integrated with components capable of deciding and applying firewall rules on affected hosts. In this way, SADF can respond to vulnerabilities discovered in these hosts, helping to mitigate the risk of exploiting the vulnerability. Our system was evaluated in the context of a simulated network environment, where the results achieved demonstrate its viability.
