Proceedings of the Eighteenth International Symposium on Software Testing and Analysis 2009
DOI: 10.1145/1572272.1572287

Detecting code clones in binary executables

Abstract: Large software projects contain significant code duplication, mainly due to copying and pasting code. Many techniques have been developed to identify duplicated code to enable applications such as refactoring, detecting bugs, and protecting intellectual property. Because source code is often unavailable, especially for third-party software, finding duplicated code in binaries becomes particularly important. However, existing techniques operate primarily on source code, and no effective tool exists for binaries…

Cited by 135 publications (79 citation statements)
References 24 publications
“…Semantics is the only characteristic guaranteed to be preserved across the obfuscation. Static Analysis Based Plagiarism Detection: The existing static analysis techniques except for the birthmark-based techniques are closely related to the clone detection [1,3,18,19,16,12,15,14,31]. While possessing common interests with the clone detection, the plagiarism detection is different in that (1) we must deal with code obfuscation techniques which are often employed with a malicious intention; (2) source code analysis of the suspicious program is not possible in most cases.…”
Section: State Of the Art (mentioning)
confidence: 99%
“…The existing birthmark-based schemes are vulnerable to either obfuscation techniques mentioned in [23] or some well-known obfuscation such as statement reordering and junk instruction insertion. Moreover, all existing techniques except for [23,31] need to access source code. Dynamic Analysis Based Plagiarism Detection: Myles and Collberg [24] proposed a whole program path (WPP) based dynamic birthmark.…”
Section: State Of the Art (mentioning)
confidence: 99%
“…Saebjornsen et al [25] developed a clone detection technique that performs inexact matching of binary code using an instruction-based representation similar to our idiom features. While the mechanics of clone detection and provenance recovery are similar, the goals are orthogonal: clone detection seeks to find code instances with similar functionality and as the authors note is hampered by compiler-introduce variations; provenance recovery tools must ignore patterns due to program functionality and focus on the compiler or other toolchain components.…”
Section: Related Work (mentioning)
confidence: 99%
“…Existing source code clone detection techniques include String-based [5], Tree-based [6,18], Token-based [19,31,29] and PDG-based [21,14,22]. Saebjørnsen et al [30] proposed a tree-based clone detection in binary code. Most clone detection techniques do not take code obfuscation into consideration.…”
Section: Clone Detection (mentioning)
confidence: 99%