John Clemens scite author profile

John Clemens

4 Publications

101 Citation Statements Received

53 Citation Statements Given


Affiliations

University of Maryland, Baltimore County, Johns Hopkins University Applied Physics Laboratory

Publications


Automatic classification of object code using machine learning

Clemens

2015

Digital Investigation


Recent research has repeatedly shown that machine learning techniques can be applied to either whole files or file fragments to classify them for analysis. We build upon these techniques to show that for samples of unlabeled compiled computer object code, one can apply the same type of analysis to classify important aspects of the code, such as its target architecture and endianness. We show that using simple byte-value histograms we retain enough information about the opcodes within a sample to classify the target architecture with high accuracy, and then discuss heuristic-based features that exploit information within the operands to determine endianness. We introduce a dataset with over 16,000 code samples from 20 architectures and experimentally show that by using our features, classifiers can achieve very high accuracy with relatively small sample sizes.

Motivation

Digital forensics remains largely a manual process requiring detailed and time-consuming analysis by experts within the field. In particular, the analysis of computer executables, whether for forensic analysis, reverse engineering, or malware detection, remains a time-consuming task because the level of expertise needed to understand compiled object code is quite high. Additionally, the explosion of different types of devices (cell phones, complex routers, smart sensors, the internet of things (IoT)) means that experts are no longer dealing with just one computing architecture, but instead are seeing a myriad of executable code (firmware, mobile apps, etc.) traversing their networks and showing up in forensic and malware samples. Even generic desktop workstations contain object code for architectures other than the main CPU. These can include GPU-enabled programs, firmware for network cards and other devices which contain embedded CPUs (Blanco and Eissler [2012], Delugré [2010]), management co-processors (Miller [2011]), and USB drivers for devices that contain their own processors for services like data compression or encryption. The object code for these devices is often stored in files with non-standard headers or embedded inside driver object files. Analysts are seeking tools to jump-start the analysis process by automatically labeling unknown samples.
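The byte-value histogram feature mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names (`byte_histogram`, `count_byte_pair`, `extract_features`) are hypothetical, and the two byte-pair counts are only an assumed example of what an operand-based endianness hint might look like, since the abstract does not specify the heuristics used.

```python
from collections import Counter
from typing import List


def byte_histogram(sample: bytes) -> List[float]:
    """Return a 256-bin histogram of byte values, normalized to sum to 1."""
    if not sample:
        return [0.0] * 256
    counts = Counter(sample)  # iterating over bytes yields ints 0..255
    total = len(sample)
    return [counts.get(value, 0) / total for value in range(256)]


def count_byte_pair(sample: bytes, pair: bytes) -> int:
    """Count occurrences (overlapping allowed) of a two-byte pattern."""
    return sum(1 for i in range(len(sample) - 1) if sample[i:i + 2] == pair)


def extract_features(sample: bytes) -> List[float]:
    """Build a feature vector: 256 histogram bins plus two pair counts.

    The pair counts are an assumption made for illustration: the 16-bit
    value 1 is stored as 00 01 in big-endian order and 01 00 in
    little-endian order, so their relative frequency may hint at byte order.
    """
    features = byte_histogram(sample)
    features.append(float(count_byte_pair(sample, b"\x00\x01")))
    features.append(float(count_byte_pair(sample, b"\x01\x00")))
    return features
```

A feature vector built this way could be passed to any standard classifier; which classifier and which sample sizes work well is an empirical question that the paper addresses and this sketch does not.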


Runtime State Verification on Resource-Constrained Platforms

Clemens

Pal

Sherrell

2018


A Platform Service for Remote Integrity Measurement and Attestation

Pendergrass

Helble

Clemens

et al. 2018


Software integrity measurement and attestation (M&A) are critical technologies for evaluating the trustworthiness of software platforms. To best support these technologies, next generation systems must provide a centralized service for securely selecting, collecting, and evaluating integrity measurements. Centralization of M&A avoids duplication, minimizes security risks to the system, and ensures correct administration of integrity policies and systems. This paper details the desirable features and properties of such a system, and introduces Maat, a prototype implementation of an M&A service that meets these properties. Maat is a platform service that provides a centralized policy-driven framework for determining which measurement tools and protocols to use to meet the needs of a given integrity evaluation. Maat simplifies the task of integrating integrity measurements into a range of larger trust decisions such as authentication, network access control, or delegated computations.


Learning Device Models with Recurrent Neural Networks

Clemens

2018


Recurrent neural networks (RNNs) are powerful constructs capable of modeling complex systems, up to and including Turing Machines. However, learning such complex models from finite training sets can be difficult. In this paper we empirically show that RNNs can learn models of computer peripheral devices through input and output state observation. This enables automated development of functional software-only models of hardware devices. Such models are applicable to any number of tasks, including device validation, driver development, code de-obfuscation, and reverse engineering. We show that the same RNN structure successfully models six different devices from simple test circuits up to a 16550 UART serial port, and verify that these models are capable of producing equivalent output to real hardware.

