Proceedings 2019 Network and Distributed System Security Symposium, 2019
DOI: 10.14722/ndss.2019.23492

Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs

Abstract: Binary code analysis allows analyzing binary code without having access to the corresponding source code. A binary, after disassembly, is expressed in an assembly language. This inspires us to approach binary analysis by leveraging ideas and techniques from Natural Language Processing (NLP), a fruitful area focused on processing text of various natural languages. We notice that binary code analysis and NLP share many analogical topics, such as semantics extraction, classification, and code/text comparison. Thi…

Cited by 158 publications (208 citation statements)
References 57 publications

“…A few works target cross-architecture binary code analysis [21], [62], [8], [67]. Some exploit the statistical aspects of code, rather than its semantics.…”
Section: Related Work (mentioning)
confidence: 99%
“…For example, Genius [21] and Gemini [62] use some manually selected statistical features (e.g., the number of constants) to represent basic blocks, but they ignore the meaning of instructions and the dependency between them, resulting in significant loss of semantic information. INNEREYE-BB [67] uses LSTM to encode each basic block into an embedding, but it needs to train a separate instruction embedding model for each architecture. Instead, we build a uniform cross-architecture instruction embedding model that tolerates the syntactic differences across architectures.…”
Section: Related Work (mentioning)
confidence: 99%