Reverse engineering is a manually intensive but necessary technique for understanding the inner workings of new malware, finding vulnerabilities in existing systems, and detecting patent infringements in released software. An assembly clone search engine facilitates the work of reverse engineers by identifying duplicated or known parts. However, it is challenging to design a robust clone search engine, since various compiler optimization options and code obfuscation techniques make logically similar assembly functions appear very different. A practical clone search engine relies on a robust vector representation of assembly code. However, existing clone search approaches, which rely on a manual feature engineering process to form a feature vector for an assembly function, fail to consider the relationships between features and to identify the unique patterns that can statistically distinguish assembly functions. To address this problem, we propose to jointly learn the lexical semantic relationships and the vector representation of assembly functions based on assembly code. We have developed an assembly code representation learning model, Asm2Vec. It needs only assembly code as input and does not require any prior knowledge such as the correct mapping between assembly functions. It can find and incorporate rich semantic relationships among tokens appearing in assembly code. We conduct extensive experiments and benchmark the learning model against state-of-the-art static and dynamic clone search approaches. We show that the learned representation is more robust and significantly outperforms existing methods against changes introduced by obfuscation and optimizations.
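To make the idea of searching over vector representations concrete, the following is a minimal sketch of how a clone search could rank repository functions once each function already has a vector. It does not implement Asm2Vec or its training; the use of cosine similarity, the function names, and the toy two-dimensional vectors are all assumptions made for illustration and are not stated in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query_vec, repository, top_k=3):
    """Return the top_k (name, score) pairs, most similar first.

    `repository` maps a hypothetical function name to its vector;
    how those vectors are learned is outside this sketch.
    """
    scored = [(cosine(query_vec, vec), name) for name, vec in repository.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k]]


# Toy vectors: two variants of the same function and one unrelated function.
repo = {
    "f_O0": [1.0, 0.0],
    "f_O3": [0.9, 0.1],
    "g": [0.0, 1.0],
}
results = search([1.0, 0.0], repo, top_k=2)
```

In this toy setup the two variants of `f` rank above the unrelated function `g`, which is the behaviour a robust representation is meant to preserve across optimization levels.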
Sequence diagrams can be valuable aids to software understanding. However, they can be extremely large and hard to understand even with modern tool support. Consequently, providing the right set of tool features is important if the tools are to help rather than hinder the user. This paper surveys research and commercial sequence diagram tools to determine the features they provide to support program understanding. Although there has been significant effort in developing these tools, many of them have not been evaluated using human subjects. To begin to address this gap, a preliminary study was performed with a specially designed sequence diagram tool that implements the features found during the survey. On the basis of an analysis of the study results, we discuss the features that were found to be useful and relate these to the tasks performed. The paper concludes by proposing how future tools can be improved to better support the exploration of large sequence diagrams. Copyright © 2008 Crown in the right of Canada. Published by John Wiley & Sons, Ltd.
Finding lines of code similar to a code fragment across large knowledge bases in fractions of a second is a new branch of code clone research, also known as real-time code clone search. Among the requirements real-time code clone search has to meet are scalability, short response time, scalable incremental corpus updates, and support for type-1, type-2, and type-3 clones. We conducted a set of empirical studies on a large open source code corpus to gain insight into its characteristics. We used these results to design and optimize a multi-level indexing approach that uses hash-table-based lookup and binary search to improve Internet-scale real-time code clone search response time. Finally, we performed an evaluation on an Internet-scale corpus (1.5 million Java files and 266 MLOC). Our approach maintains a response time in the microseconds range for 99.9% of clone searches, while supporting the aforementioned requirements.
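The combination of a hash table with binary search can be sketched as a small two-level index. This is an illustrative sketch only, not the index described in the abstract: the class name `CloneIndex`, the whitespace-only normalization (which covers formatting differences but not type-2 or type-3 clones), and the choice of MD5 as the line hash are all assumptions.

```python
import bisect
import hashlib
from collections import defaultdict


def normalize(line):
    """Collapse whitespace so formatting-only differences hash identically."""
    return " ".join(line.split())


def line_hash(line):
    """Hash of the normalized line (MD5 is an arbitrary choice for this sketch)."""
    return hashlib.md5(normalize(line).encode("utf-8")).hexdigest()


class CloneIndex:
    """Hypothetical two-level index: hash table first, sorted postings second."""

    def __init__(self):
        # Level 1: hash table from line hash to a postings list.
        # Level 2: each postings list stays sorted by (file_id, line_no).
        self.postings = defaultdict(list)

    def add_file(self, file_id, source):
        """Incrementally add one file; insort keeps each postings list sorted."""
        for line_no, line in enumerate(source.splitlines(), start=1):
            if normalize(line):  # skip blank lines
                bisect.insort(self.postings[line_hash(line)], (file_id, line_no))

    def lookup(self, line, file_id=None):
        """Return all (file_id, line_no) hits, optionally restricted to one file."""
        hits = self.postings.get(line_hash(line), [])
        if file_id is None:
            return list(hits)
        # Binary search for the slice of postings belonging to file_id.
        lo = bisect.bisect_left(hits, (file_id, 0))
        hi = bisect.bisect_left(hits, (file_id + 1, 0))
        return hits[lo:hi]


index = CloneIndex()
index.add_file(1, "int a = 0;\nreturn a;")
index.add_file(2, "int   a = 0;\n")
all_hits = index.lookup("int a = 0;")
hits_in_file_2 = index.lookup("int a = 0;", file_id=2)
```

The hash lookup narrows the search to lines with identical normalized text in expected constant time, and the binary search then narrows those postings to a single file in logarithmic time, which is one plausible way the two techniques could divide the work.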
Comprehension is an essential part of software evolution. Only software that is well understood can evolve in a controlled manner. In this paper, we present a formal process model that supports the comprehension of software systems by using an ontology and description logic. This formal representation supports the use of reasoning services across different knowledge resources and therefore enables us to provide users with guidance during the comprehension process that is context-sensitive to their particular comprehension task. As part of the process model, we also adopt a new interactive story metaphor to represent the interactions between users and the comprehension process.