Recent research has repeatedly shown that machine learning techniques can be applied to whole files or file fragments to classify them for analysis. We build upon these techniques to show that, for samples of unlabeled compiled object code, the same type of analysis can classify important aspects of the code, such as its target architecture and endianness. We show that simple byte-value histograms retain enough information about the opcodes within a sample to classify the target architecture with high accuracy, and we then discuss heuristic-based features that exploit information within the operands to determine endianness. We introduce a dataset of over 16,000 code samples from 20 architectures and show experimentally that, using our features, classifiers can achieve very high accuracy with relatively small sample sizes.

Motivation

Digital forensics remains largely a manual process requiring detailed and time-consuming analysis by experts in the field. In particular, the analysis of computer executables, whether for forensic analysis, reverse engineering, or malware detection, remains time-consuming because the level of expertise needed to understand compiled object code is quite high. Additionally, the explosion of device types (cell phones, complex routers, smart sensors, the Internet of Things (IoT)) means that experts no longer deal with just one computing architecture. Instead, they see a myriad of executable code (firmware, mobile apps, etc.) traversing their networks and showing up in forensic and malware samples. Even generic desktop workstations contain object code for architectures other than the main CPU.
These can include GPU-enabled programs, firmware for network cards and other devices which contain embedded CPUs (Blanco and Eissler [2012], Delugré [2010]), management co-processors (Miller [2011]), and USB drivers for devices that contain their own processors for services like data compression or encryption. The object code for these devices is often stored in files with non-standard headers or embedded inside driver object files. Analysts are seeking tools to jump-start the analysis process by automatically labeling unknown samples.
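The two feature types summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name its classifiers, so a nearest-centroid classifier stands in for them here.

The endianness heuristic assumes that small signed immediates such as +1 and -2 are common in operands. Under that assumption, the byte pairs 00 01 and ff fe hint at big-endian encoding, while 01 00 and fe ff hint at little-endian encoding. All function names are hypothetical:

- `byte_histogram`
- `train_centroids`
- `classify_architecture`
- `endianness_features`
- `guess_endianness`

```python
import math
from collections import Counter


# Hypothetical helper: normalized 256-bin byte-value histogram of a code sample.
def byte_histogram(sample: bytes) -> list[float]:
    counts = [0] * 256
    for b in sample:  # iterating over bytes yields ints in 0..255
        counts[b] += 1
    total = len(sample) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]


# Hypothetical stand-in classifier: one mean histogram (centroid) per architecture.
def train_centroids(training: dict[str, list[bytes]]) -> dict[str, list[float]]:
    centroids = {}
    for arch, samples in training.items():
        hists = [byte_histogram(s) for s in samples]
        centroids[arch] = [sum(h[i] for h in hists) / len(hists) for i in range(256)]
    return centroids


# Label a sample with the architecture whose centroid is nearest (Euclidean distance).
def classify_architecture(sample: bytes, centroids: dict[str, list[float]]) -> str:
    h = byte_histogram(sample)

    def distance(c: list[float]) -> float:
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(h, c)))

    return min(centroids, key=lambda arch: distance(centroids[arch]))


# Assumed operand patterns: the 16-bit encodings of +1 (0x0001) and -2 (0xfffe)
# as they appear in big-endian versus little-endian byte order.
ENDIAN_PATTERNS = {
    "big": (b"\x00\x01", b"\xff\xfe"),
    "little": (b"\x01\x00", b"\xfe\xff"),
}


# Count every (possibly overlapping) occurrence of each two-byte pattern.
def endianness_features(sample: bytes) -> dict[str, int]:
    pairs = Counter(sample[i:i + 2] for i in range(len(sample) - 1))
    return {order: sum(pairs[p] for p in pats) for order, pats in ENDIAN_PATTERNS.items()}


# Pick the byte order whose patterns occur more often; ties are left undecided.
def guess_endianness(sample: bytes) -> str:
    f = endianness_features(sample)
    if f["big"] == f["little"]:
        return "unknown"
    return "big" if f["big"] > f["little"] else "little"


if __name__ == "__main__":
    centroids = train_centroids({"arch_a": [b"\x00" * 10], "arch_b": [b"\xff" * 10]})
    print(classify_architecture(b"\x00" * 5 + b"\xff", centroids))  # arch_a
    print(guess_endianness(b"\x00\x00\x00\x01"))  # big
```

The two features are complementary:

- **Histogram.** It discards byte order and position entirely. It can reflect opcode frequencies but says nothing about endianness.
- **Pair counts.** They are order-sensitive. On real code, a run such as 00 01 00 matches both orders, so this heuristic only gives a statistical hint when accumulated over many bytes.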
Software integrity measurement and attestation (M&A) are critical technologies for evaluating the trustworthiness of software platforms. To best support these technologies, next-generation systems must provide a centralized service for securely selecting, collecting, and evaluating integrity measurements. Centralization of M&A avoids duplication, minimizes security risks to the system, and ensures correct administration of integrity policies and systems. This paper details the desirable features and properties of such a system, and introduces Maat, a prototype implementation of an M&A service that meets these properties. Maat is a platform service that provides a centralized policy-driven framework for determining which measurement tools and protocols to use to meet the needs of a given integrity evaluation. Maat simplifies the task of integrating integrity measurements into a range of larger trust decisions such as authentication, network access control, or delegated computations.
Recurrent neural networks (RNNs) are powerful constructs capable of modeling complex systems, up to and including Turing Machines. However, learning such complex models from finite training sets can be difficult. In this paper we empirically show that RNNs can learn models of computer peripheral devices through input and output state observation. This enables automated development of functional software-only models of hardware devices. Such models are applicable to any number of tasks, including device validation, driver development, code de-obfuscation, and reverse engineering. We show that the same RNN structure successfully models six different devices from simple test circuits up to a 16550 UART serial port, and verify that these models are capable of producing equivalent output to real hardware.