Philippe Charland scite author profile

Reverse engineering is a manually intensive but necessary technique for understanding the inner workings of new malware, finding vulnerabilities in existing systems, and detecting patent infringements in released software. An assembly clone search engine facilitates the work of reverse engineers by identifying those duplicated or known parts. However, it is challenging to design a robust clone search engine, since there exist various compiler optimization options and code obfuscation techniques that make logically similar assembly functions appear to be very different. A practical clone search engine relies on a robust vector representation of assembly code. However, the existing clone search approaches, which rely on a manual feature engineering process to form a feature vector for an assembly function, fail to consider the relationships between features and identify those unique patterns that can statistically distinguish assembly functions. To address this problem, we propose to jointly learn the lexical semantic relationships and the vector representation of assembly functions based on assembly code. We have developed an assembly code representation learning model Asm2Vec. It only needs assembly code as input and does not require any prior knowledge such as the correct mapping between assembly functions. It can find and incorporate rich semantic relationships among tokens appearing in assembly code. We conduct extensive experiments and benchmark the learning model with state-of-the-art static and dynamic clone search approaches. We show that the learned representation is more robust and significantly outperforms existing methods against changes introduced by obfuscation and optimizations.
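As a rough illustration of how a clone search over vector representations might work, the sketch below ranks repository functions by cosine similarity to a query vector. It assumes the vectors have already been produced by some representation model. The function names, the three-dimensional vectors, and the choice of cosine similarity are all assumptions made for this example and are not taken from the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_clones(query, repository, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine(query, vec)) for name, vec in repository.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical, hand-made vectors standing in for learned function embeddings.
repository = {
    "memcpy_O0": [1.0, 0.0, 0.0],
    "memcpy_O3": [0.9, 0.1, 0.0],
    "strlen_O0": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]
ranking = rank_clones(query, repository)
```

In this toy setup the two `memcpy` variants score close to 1 while the unrelated function scores near 0, which is the behaviour a robust representation would aim for across optimization levels.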


A survey and evaluation of tool features for understanding reverse‐engineered sequence diagrams

Bennett, Myers, Storey, et al. 2008. J. Softw. Maint. Evol.: Res. Pract.


Sequence diagrams can be valuable aids to software understanding. However, they can be extremely large and hard to understand in spite of using modern tool support. Consequently, providing the right set of tool features is important if the tools are to help rather than hinder the user. This paper surveys research and commercial sequence diagram tools to determine the features they provide to support program understanding. Although there has been significant effort in developing these tools, many of them have not been evaluated using human subjects. To begin to address this gap, a preliminary study was performed with a specially designed sequence diagram tool that implements the features found during the survey. On the basis of an analysis of the study results, we discuss the features that were found to be useful and relate these to the tasks performed. The paper concludes by proposing how future tools can be improved to better support the exploration of large sequence diagrams. Copyright © 2008 Crown in the right of Canada. Published by John Wiley & Sons, Ltd.


Internet-scale Real-time Code Clone Search Via Multi-level Indexing

Keivanloo, Rilling, Charland. 2011.


Finding lines of code similar to a code fragment across large knowledge bases in fractions of a second is a new branch of code clone research also known as real-time code clone search. Among the requirements real-time code clone search has to meet are scalability, short response time, scalable incremental corpus updates, and support for type-1, type-2, and type-3 clones. We conducted a set of empirical studies on a large open source code corpus to gain insight about its characteristics. We used these results to design and optimize a multi-level indexing approach using hash table-based and binary search to improve Internet-scale real-time code clone search response time. Finally, we performed an evaluation on an Internet-scale corpus (1.5 million Java files and 266 MLOC). Our approach maintains a response time for 99.9% of clone searches in the microseconds range, while supporting the aforementioned requirements.
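The combination of a hash table and binary search described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: the `CloneIndex` class, the whitespace-only normalization, and the line-level granularity are all hypothetical choices. The first level maps a hash of each normalized line to a sorted postings list; the second level uses binary search inside that list to confirm that consecutive lines of a query fragment occur at consecutive positions. As written it only catches formatting-level (type-1 style) clones.

```python
import bisect
import hashlib
import re
from collections import defaultdict


def normalize(line):
    """Collapse whitespace so formatting-only differences hash identically."""
    return re.sub(r"\s+", " ", line).strip()


def line_key(line):
    """Hash of the normalized line, used as the first-level (hash table) key."""
    return hashlib.sha1(normalize(line).encode("utf-8")).hexdigest()


class CloneIndex:
    def __init__(self):
        # key -> sorted list of (file_id, position among non-blank lines)
        self._postings = defaultdict(list)

    def add_file(self, file_id, source):
        """Incrementally index one file; blank lines are skipped."""
        lines = [l for l in source.splitlines() if normalize(l)]
        for pos, line in enumerate(lines):
            bisect.insort(self._postings[line_key(line)], (file_id, pos))

    def _contains(self, line, file_id, pos):
        """Second level: binary search for an exact (file_id, pos) posting."""
        postings = self._postings.get(line_key(line), [])
        i = bisect.bisect_left(postings, (file_id, pos))
        return i < len(postings) and postings[i] == (file_id, pos)

    def search(self, fragment):
        """Return (file_id, start_pos) pairs where the fragment occurs contiguously."""
        lines = [l for l in fragment.splitlines() if normalize(l)]
        if not lines:
            return []
        hits = []
        for file_id, pos in self._postings.get(line_key(lines[0]), []):
            if all(self._contains(l, file_id, pos + k)
                   for k, l in enumerate(lines)):
                hits.append((file_id, pos))
        return hits


index = CloneIndex()
index.add_file(1, "int a = 0;\nfor (;;) {\n  a++;\n}\n")
index.add_file(2, "int b = 1;\nfor (;;) {\n    a++;\n}\n")
```

A call such as `index.search("for (;;) {\n a++;")` looks up the first line in the hash table and then verifies the following line by binary search in its postings list, so lookup cost depends on the postings of the query lines rather than on the size of the whole corpus.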


BinClone: Detecting Code Clones in Malware

Farhadi, Fung, Charland, et al. 2014.


A Context-Driven Software Comprehension Process Model

Meng, Rilling, Zhang, et al. 2006.


Comprehension is an essential part of software evolution. Only software that is well understood can evolve in a controlled manner. In this paper, we present a formal process model to support the comprehension of software systems by using Ontology and Description Logic. This formal representation supports the use of reasoning services across different knowledge resources and therefore, enables us to provide users with guidance during the comprehension process that is context sensitive to their particular comprehension task. As part of the process model, we also adopt a new interactive story metaphor, to represent the interactions between users and the comprehension process.

