Recent progress in machine learning has led to promising results in behavioral malware detection. Behavioral modeling identifies malicious processes via features derived by their runtime behavior. Behavioral features hold great promise as they are intrinsically related to the functioning of each malware, and are therefore considered difficult to evade. Indeed, while a significant amount of results exists on evasion of static malware features, evasion of dynamic features has seen limited work. This paper examines the robustness of behavioral ransomware detectors to evasion and proposes multiple novel techniques to evade them. Ransomware behavior differs significantly from that of benign processes, making it an ideal best case for behavioral detectors, and a difficult candidate for evasion. We identify and propose a set of novel attacks that distribute the overall malware workload across a small set of independent, cooperating processes in order to avoid the generation of significant behavioral features. Our most effective attack decreases the accuracy of a state-of-the-art classifier from 98.6 to 0% using only 18 cooperating processes. Furthermore, we show our attacks to be effective against commercial ransomware detectors in a black-box setting. Finally, we evaluate a detector designed to identify our most effective attack, as well as discuss potential directions to mitigate our most advanced attack.
Several cybersecurity domains, such as ransomware detection, forensics and data analysis, require methods to reliably identify encrypted data fragments. Typically, current approaches employ statistics derived from byte-level distribution, such as entropy estimation, to identify encrypted fragments. However, modern content types use compression techniques which alter data distribution pushing it closer to the uniform distribution. The result is that current approaches exhibit unreliable encryption detection performance when compressed data appear in the dataset. Furthermore, proposed approaches are typically evaluated over few data types and fragment sizes, making it hard to assess their practical applicability. This paper compares existing statistical tests on a large, standardized dataset and shows that current approaches consistently fail to distinguish encrypted and compressed data on both small and large fragment sizes. We address these shortcomings and design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data. We evaluate EnCoD on a dataset of 16 different file types and fragment sizes ranging from 512B to 8KB. Our results highlight that EnCoD outperforms current approaches by a wide margin, with accuracy ranging from $$\sim 82\%$$ ∼ 82 % for 512B fragments up to $$\sim 92\%$$ ∼ 92 % for 8KB data fragments. Moreover, EnCoD can pinpoint the exact format of a given data fragment, rather than performing only binary classification like previous approaches.
Reliable identification of encrypted file fragments is a requirement for several security applications, including ransomware detection, digital forensics, and traffic analysis. A popular approach consists of estimating high entropy as a proxy for randomness. However, many modern content types (e.g. office documents, media files, etc.) are highly compressed for storage and transmission efficiency. Compression algorithms also output high-entropy data, thus reducing the accuracy of entropy-based encryption detectors.Over the years, a variety of approaches have been proposed to distinguish encrypted file fragments from high-entropy compressed fragments. However, these approaches are typically only evaluated over a few, selected data types and fragment sizes, which makes a fair assessment of their practical applicability impossible. This paper aims to close this gap by comparing existing statistical tests on a large, standardized dataset. Our results show that current approaches cannot reliably tell apart encryption and compression, even for large fragment sizes. To address this issue, we design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data, starting with fragments as small as 512 bytes. We evaluate EnCoD against current approaches over a large dataset of different data types, showing that it outperforms current state-of-the-art for most considered fragment sizes and data types.
Training high-quality deep learning models is a challenging task due to computational and technical requirements. A growing number of individuals, institutions, and companies increasingly rely on pre-trained, third-party models made available in public repositories. These models are often used directly or integrated in product pipelines with no particular precautions, since they are effectively just data in tensor form and considered safe.In this paper, we raise awareness of a new machine learning supply chain threat targeting neural networks. We introduce MaleficNet 2.0, a novel technique to embed self-extracting, selfexecuting malware in neural networks. MaleficNet 2.0 uses spread-spectrum channel coding combined with error correction techniques to inject malicious payloads in the parameters of deep neural networks. MaleficNet 2.0 injection technique is stealthy, does not degrade the performance of the model, and is robust against removal techniques. We design our approach to work both in traditional and distributed learning settings such as Federated Learning, and demonstrate that it is effective even when a reduced number of bits is used for the model parameters. Finally, we implement a proof-of-concept self-extracting neural network malware using MaleficNet 2.0, demonstrating the practicality of the attack against a widely adopted machine learning framework.Our aim with this work is to raise awareness against these new, dangerous attacks both in the research community and industry, and we hope to encourage further research in mitigation techniques against such threats.
Ransomware is a rapidly evolving type of malware designed to encrypt user files on a device, making them inaccessible in order to exact a ransom. Ransomware attacks resulted in billions of dollars in damages in recent years and are expected to cause hundreds of billions more in the next decade. With current state-of-the-art process-based detectors being heavily susceptible to evasion attacks, no comprehensive solution to this problem is available today. This paper presents Minerva, a new approach to ransomware detection. Unlike current methods focused on identifying ransomware based on process-level behavioral modeling, Minerva detects ransomware by building behavioral profiles of files based on all the operations they receive in a time window. Minerva addresses some of the critical challenges associated with process-based approaches, specifically their vulnerability to complex evasion attacks. Our evaluation of Minerva demonstrates its effectiveness in detecting ransomware attacks, including those that are able to bypass existing defenses. Our results show that Minerva identifies ransomware activity with an average accuracy of 99.45% and an average recall of 99.66%, with 99.97% of ransomware detected within 1 second.
Recent advances in generative machine learning models rekindled research interest in the area of password guessing. Data-driven password guessing approaches based on GANs, language models and deep latent variable models show impressive generalization performance and offer compelling properties for the task of password guessing.In this paper, we propose a flow-based generative model approach to password guessing. Flow-based models allow for precise log-likelihood computation and optimization, which enables exact latent variable inference. Additionally, flowbased models provide meaningful latent space representation, which enables operations such as exploration of specific subspaces of the latent space and interpolation. We demonstrate the applicability of generative flows to the context of password guessing, departing from previous applications of flow networks which are mainly limited to the continuous space of image generation. We show that the above-mentioned properties allow flow-based models to outperform deep latent variable model approaches and remain competitive with state-of-the-art GANs in the password guessing task, while using a training set that is orders of magnitudes smaller than that of previous art. Furthermore, a qualitative analysis of the generated samples shows that flow-based networks are able to accurately model the original passwords distribution, with even non-matched samples closely resembling human-like passwords.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.