Proceedings 2018 Workshop on Binary Analysis Research 2018
DOI: 10.14722/bar.2018.23008

Evolving Exact Decompilation

Abstract: We introduce a novel technique for C decompilation that provides the correctness guarantees and readability properties essential for accurate and efficient binary analysis. Given a binary executable, an evolutionary search seeks a combination of source code excerpts from a "big code" database that can be recompiled to an executable that is byte-equivalent to the original binary. Byte-equivalence ensures that a successful decompilation fully reproduces the behavior, both intended and unintended, of the original…
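
The search loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `SNIPPET_DB`, `toy_compile`, `byte_distance`, and `evolve` are hypothetical names, the "compiler" is a deterministic stand-in that maps each excerpt to four bytes, the byte-mismatch fitness is an assumed choice, and the number of excerpts is assumed to be known in advance.

```python
import hashlib
import random

# Hypothetical stand-in for a "big code" database of source excerpts.
SNIPPET_DB = ["x += 1;", "x -= 1;", "x *= 2;", "y = x;", "return x;", "return y;"]

def toy_compile(snippets):
    """Stand-in for a real compiler: maps each excerpt to 4 deterministic bytes."""
    return b"".join(hashlib.sha256(s.encode()).digest()[:4] for s in snippets)

def byte_distance(a, b):
    """Differing bytes plus any length difference; 0 means byte-equivalent."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def evolve(target, length, rng, pop_size=20, max_generations=500):
    """Search for excerpts whose toy-compiled bytes equal `target`."""
    def fitness(candidate):
        return byte_distance(toy_compile(candidate), target)

    population = [[rng.choice(SNIPPET_DB) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(max_generations):
        population.sort(key=fitness)
        if fitness(population[0]) == 0:
            return population[0]
        # Keep the better half unchanged, refill with mutated copies.
        survivors = population[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(survivors)):
            child = list(rng.choice(survivors))
            child[rng.randrange(length)] = rng.choice(SNIPPET_DB)
            children.append(child)
        population = survivors + children
    best = min(population, key=fitness)
    return best if fitness(best) == 0 else None

# The "original binary" is produced from source the search never sees.
original_source = ["x *= 2;", "x += 1;", "y = x;", "return y;"]
target_binary = toy_compile(original_source)
result = evolve(target_binary, len(original_source), random.Random(0))
```

The search only ever compares compiled bytes, so a non-`None` result is accepted solely because it recompiles to the target, which mirrors the byte-equivalence criterion in the abstract. A real system would replace `toy_compile` with an actual compiler invocation and would need a far richer set of mutations.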

Citations: cited by 24 publications (26 citation statements)
References: 23 publications
“…Soundness is addressed by two recent works. Schulte et al [18] use search-based techniques to generate source-code producing byte-equivalent binaries to the original executable. This technique, when it succeeds, ensures soundness by design but it is only applied to small examples, with limited success.…”
Section: Related Work (mentioning)
confidence: 99%
“…In the end, TINA roughly generates one C statement per assembly instruction for the chunks of the Debian distribution.
    "movq (%1, %0), %%mm1 \n\t" // T
    "movq -1(%2, %0), %%mm2 \n\t" // L
    "movq (%2, %0), %%mm3 \n\t" // X
    "movq %%mm2, %%mm4 \n\t" // L
    "psubb %%mm0, %%mm2 \n\t"
    "paddb %%mm1, %%mm2 \n\t" // L + T - LT
    "movq %%mm4, %%mm5 \n\t" // L
    "pmaxub %%mm1, %%mm4 \n\t" // max(T, L)
    "pminub %%mm5, %%mm1 \n\t" // min(T, L)
    "pminub %%mm2, %%mm4 \n\t"
    "pmaxub %%mm1, %%mm4 \n\t"
    "psubb %%mm4, %%mm3 \n\t" // dst - pred
    "movq %%mm3, (%3, %0) \n\t"
    "add $8, %0 \n\t"
    "movq -1(%1, %0), %%mm0 \n\t" // LT
    "cmp %4, %0 \n\t"
    " jb 1b \n\t"
    : "+r" (i)
    : "r" (src1), "r" (src2),
      "r" (dst), "r" ((x86_reg) w));
Sec. VII-D refers to a ffmpeg function accessing index −1 of its input buffer.…”
Section: Appendix C, Additional Experiments: Size Of Produced Code (mentioning)
confidence: 99%
“…One obvious way to circumvent the problem of missing sources, is to decompile all such functions back to source code. There have been many attempts at decompiling assembly code to source code, both specifically to timing analysis and as a general tool [17]. However, the decompiled code is typically less precise than the original, due to the information that is lost during compilation [9].…”
Section: The Decompilation Workaround (mentioning)
confidence: 99%
“…However, this puts more burden on the decompiler, in terms of how it is allowed to reconstruct source control flows from binary ones. Moreover, the decompiler must be sound itself, which appears to be difficult with available tools [17].…”
Section: Comparison Of Approaches (mentioning)
confidence: 99%