Machine learning-based systems for malware detection operate in a hostile environment: adversaries will also target the learning system itself and use evasion attacks to bypass malware detection. In this paper, we outline our learning-based system PEberus, which won first place in the defender challenge of the Microsoft Evasion Competition, withstanding a variety of attacks from independent attackers. Our system combines multiple, diverse defenses: we address the semantic gap, use various classification models, and apply a stateful defense. The competition gave us the unique opportunity to examine evasion attacks in a realistic scenario. It also shows that existing machine learning methods can be hardened against attacks by thoroughly analyzing the attack surface and implementing concepts from adversarial learning. Our defense can serve as an additional baseline for future research on secure learning.
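To illustrate what "stateful" means here, the following is a minimal, generic sketch of a stateful defense. It is not PEberus's actual implementation. It assumes each sample has already been mapped to a numeric feature vector, and the names `StatefulDetector`, `classifier`, and `radius` are hypothetical. The detector remembers samples it has already flagged as malicious and also flags any later query that lies within `radius` (Euclidean distance) of one of them. This targets attackers who repeatedly submit slightly modified variants of the same malware until one slips past the classifier.

```python
import math
from typing import Callable, List, Sequence


class StatefulDetector:
    """Hypothetical stateful wrapper around a stateless malware classifier.

    Not PEberus's code: a generic sketch assuming samples are numeric feature vectors.
    """

    def __init__(self, classifier: Callable[[Sequence[float]], bool], radius: float):
        self.classifier = classifier  # returns True if the sample looks malicious
        self.radius = radius          # assumed similarity threshold in feature space
        self.history: List[List[float]] = []  # feature vectors of flagged samples

    def _near_known_malware(self, x: Sequence[float]) -> bool:
        # A query counts as a variant if it is close to any previously flagged sample.
        return any(math.dist(x, h) <= self.radius for h in self.history)

    def predict(self, x: Sequence[float]) -> bool:
        if self.classifier(x) or self._near_known_malware(x):
            # Store every flagged sample, including near-duplicates, so that a
            # chain of small modifications keeps being caught.
            self.history.append(list(x))
            return True
        return False


# Toy usage with an assumed one-feature threshold classifier.
det = StatefulDetector(lambda x: x[0] > 0.5, radius=0.1)
det.predict([0.55, 0.0])  # flagged by the classifier and remembered
det.predict([0.50, 0.0])  # classifier misses it, but it is close to a known sample
det.predict([0.20, 0.0])  # neither rule fires
```

In this sketch, the wrapped `classifier` could itself be an ensemble of diverse models. Storing near-duplicates in the history is a design choice: it lets the defense follow gradual evasion attempts at the cost of unbounded memory growth.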
Having content in an archive is of limited value if it cannot be read and used. As a case study in extricating information from obsolete media and making it readable again through deep learning techniques, we examine the Cauzin Softstrip: one of the first two-dimensional bar codes, released in 1985 by Cauzin Systems, which could be used to encode all manner of digital data. Softstrips occupy a curious middle ground, as they were both physical and digital. The bar codes were printed on paper, and in that sense they are archivally no different from any other printed material. Softstrips can be found in old computer magazines, computer books, and booklets of software that Cauzin produced. However, managing the digital nature of these physical artifacts falls within the scope of digital curation. To make the information on them readable and useful, the digital information must be extracted, which originally would have been done with a physical Cauzin Softstrip reader. Obtaining a working Softstrip reader is already extremely difficult and will most likely be impossible in the coming years. To extract the encoded data, we created a digital Softstrip reader, making Softstrip data accessible without a physical reader. Our decoding strategy decodes over 91% of the 1229 Softstrips in our corpus; this rises to 99% if we consider only Softstrip images produced under controlled conditions. Furthermore, we later acquired another set of 117 Softstrips and were able to decode nearly 95% of them with no adjustments to the decoder. These results underscore that technology like deep learning is readily accessible to non-experts: we obtained them using a convolutional neural network, even though neither of the authors is an expert in the area.