Windows malware detectors based on machine learning are vulnerable to adversarial examples, even if the attacker is only given black-box query access to the model. The main drawback of these attacks is that: (i) they are query-inefficient, as they rely on iteratively applying random transformations to the input malware; and (i i) they may also require executing the adversarial malware in a sandbox at each iteration of the optimization process, to ensure that its intrusive functionality is preserved. In this paper, we overcome these issues by presenting a novel family of black-box attacks that are both query-efficient and functionality-preserving, as they rely on the injection of benign content (which will never be executed) either at the end of the malicious file, or within some newly-created sections. Our attacks are formalized as a constrained minimization problem which also enables optimizing the trade-off between the probability of evading detection and the size of the injected payload. We empirically investigate this trade-off on two popular static Windows malware detectors, and show that our black-box attacks can bypass them with only few queries and small payloads, even when they only return the predicted labels. We also evaluate whether our attacks transfer to other commercial antivirus solutions, and surprisingly find that they can evade, on average, more than 12 commercial antivirus engines. We conclude by discussing the limitations of our approach, and its possible future extensions to target malware classifiers based on dynamic analysis.
In this paper we present Jam, an extension of the Java language supporting mixins, that is, parametric heir classes. A mixin declaration in Jam is similar to a Java heir class declaration, except that it does not extend a fixed parent class, but simply specifies the set of fields and methods a generic parent should provide. In this way, the same mixin can be instantiated on many parent classes, producing different heirs, thus avoiding code duplication and largely improving modularity and reuse. Moreover, as happens for classes and interfaces, mixin names are reference types, and all the classes obtained by instantiating the same mixin are considered subtypes of the corresponding type, and hence can be handled in a uniform way through the common interface. This possibility allows a programming style where different ingredients are "mixed" together in defining a class; this paradigm is somewhat similar to that based on multiple inheritance, but avoids its complication.The language has been designed with the main objective in mind to obtain, rather than a new theoretical language, a working and smooth extension of Java. That means, on the design side, that we have faced the challenging problem of integrating the Java overall principles and complex type system with this new notion; on the implementation side, it means that we have developed a Jam-to-Java translator which makes Jam sources executable on every Java Virtual Machine.
We propose a novel approach based on coinductive logic to specify type systems of programming languages. The approach consists in encoding programs in Horn formulas which are interpreted w.r.t. their coinductive Herbrand model. We illustrate the approach by first specifying a standard type system for a small object-oriented language similar to Featherweight Java. Then we define an idealized type system for a variant of the language where type annotations can be omitted. The type system involves infinite terms and proof trees not representable in a finite way, thus providing a theoretical limit to type inference of object-oriented programs, since only sound approximations of the system can be implemented. Approximation is naturally captured by the notions of subtyping and subsumption; indeed, rather than increasing the expressive power of the system, as it usually happens, here subtyping is needed for approximating infinite non regular types and proof trees with regular ones.
Recent work has shown that adversarial Windows malware samples—referred to as adversarial EXE mples in this article—can bypass machine learning-based detection relying on static code analysis by perturbing relatively few input bytes. To preserve malicious functionality, previous attacks either add bytes to existing non-functional areas of the file, potentially limiting their effectiveness, or require running computationally demanding validation steps to discard malware variants that do not correctly execute in sandbox environments. In this work, we overcome these limitations by developing a unifying framework that does not only encompass and generalize previous attacks against machine-learning models, but also includes three novel attacks based on practical, functionality-preserving manipulations to the Windows Portable Executable file format. These attacks, named Full DOS , Extend , and Shift , inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section. Our experimental results show that these attacks outperform existing ones in both white-box and black-box scenarios, achieving a better tradeoff in terms of evasion rate and size of the injected payload, while also enabling evasion of models that have been shown to be robust to previous attacks. To facilitate reproducibility of our findings, we open source our framework and all the corresponding attack implementations as part of the secml-malware Python library. We conclude this work by discussing the limitations of current machine learning-based malware detectors, along with potential mitigation strategies based on embedding domain knowledge coming from subject-matter experts directly into the learning process.
Recent work has shown that deep-learning algorithms for malware detection are also susceptible to adversarial examples, i.e., carefullycrafted perturbations to input malware that enable misleading classification. Although this has questioned their suitability for this task, it is not yet clear why such algorithms are easily fooled also in this particular application domain. In this work, we take a first step to tackle this issue by leveraging explainable machine-learning algorithms developed to interpret the black-box decisions of deep neural networks. In particular, we use an explainable technique known as feature attribution to identify the most influential input features contributing to each decision, and adapt it to provide meaningful explanations to the classification of malware binaries. In this case, we find that a recently-proposed convolutional neural network does not learn any meaningful characteristic for malware detection from the data and text sections of executable files, but rather tends to learn to discriminate between benign and malware samples based on the characteristics found in the file header. Based on this finding, we propose a novel attack algorithm that generates adversarial malware binaries by only changing few tens of bytes in the file header. With respect to the other state-of-the-art attack algorithms, our attack does not require injecting any padding bytes at the end of the file, and it is much more efficient, as it requires manipulating much fewer bytes.
In recent work we have proposed a novel approach to define idealized type systems for object-oriented languages, based on abstract compilation of programs into Horn formulas which are interpreted w.r.t. the coinductive (that is, the greatest) Herbrand model. In this paper we investigate how this approach can be applied also in the presence of imperative features. This is made possible by considering a natural translation of Static Single Assignment intermediate form programs into Horn formulas, where ϕ functions correspond to union types.
Abstract. We present FJig, a simple calculus where basic building blocks are classes in the style of Featherweight Java, declaring elds, methods and one constructor. However, inheritance has been generalized to the much more exible notion originally proposed in Bracha's Jigsaw framework. That is, classes play also the role of modules, that can be composed by a rich set of operators, all of which can be expressed by a minimal core. We keep the nominal approach of Java-like languages, that is, types are class names. However, a class is not necessarily a structural subtype of any class used in its dening expression. The calculus allows the encoding of a large variety of dierent mechanisms for software composition in class-based languages, including standard inheritance, mixin classes, traits and hiding. Hence, FJig can be used as a unifying framework for analyzing existing mechanisms and proposing new extensions.We provide two dierent semantics of an FJig program: attening and direct semantics. The dierence is analogous to that between two intuitive models to understand inheritance: the former where inherited methods are copied into heir classes, and the latter where member lookup is performed by ascending the inheritance chain. Here we address equivalence of these two views for a more sophisticated composition mechanism.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.