2018
DOI: 10.48550/arxiv.1812.09652
Preprint
A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis

Cited by 8 publications (13 citation statements). References 27 publications.
“…The study of learning-based BCSD has been inspired by recent developments in natural language processing (NLP) [39,45,56], which uses real-valued vectors called embeddings to encode the semantic information of words and sentences. Building upon these techniques, previous studies [14,15,24,40,43,44,51,59,61-63] applied deep learning methods to binary similarity detection. Shared by many of these studies is the idea of embedding binary functions into numerical vectors, and then using vector distance to approximate the similarity between different binary functions.…”
Section: Learning-based BCSD Approaches
confidence: 99%
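The core idea described above, comparing functions by the distance between their embedding vectors, can be sketched in a few lines. This is an illustrative example, not any cited paper's implementation; the embedding values below are made up for demonstration:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings of the same function compiled for x86 and ARM.
emb_x86 = [0.12, -0.45, 0.33, 0.80]
emb_arm = [0.10, -0.40, 0.35, 0.78]

# A score close to 1.0 suggests the two binaries implement similar functions.
print(cosine_similarity(emb_x86, emb_arm))
```

In practice the embeddings come from a trained model and the comparison is run over large function corpora, often with approximate nearest-neighbor search rather than pairwise scoring.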
“…𝛼Diff [40], for example, learns binary function embeddings directly from the sequence of raw bytes using a convolutional neural network (CNN) [38]. INNEREYE [63] and RLZ2019 [51] regard instructions as words and basic blocks as sentences, and use word2vec [45] and LSTM [29] to learn basic block embeddings. SAFE [43] uses a similar approach to learn the embeddings of binary functions, while Gemini [59], VulSeeker [24], GraphEmb [44] and OrderMatters [62] use GNNs to build a graph embedding model that learns from the attributed control-flow graphs (ACFGs) of binary functions.…”
Section: Learning-based BCSD Approaches
confidence: 99%
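The "instructions as words, basic blocks as sentences" framing above amounts to tokenizing disassembly and feeding it to a word2vec-style model. A minimal sketch of the preprocessing side, assuming a simple operand-abstraction scheme (the exact normalization rules vary across the cited papers, and this example is not taken from any of them):

```python
def normalize(ins):
    """Normalize one instruction into a 'word': keep the opcode, abstract
    operands into coarse classes (a common preprocessing step)."""
    parts = ins.replace(",", " ").split()
    opcode = parts[0]
    ops = ["REG" if p.startswith(("e", "r"))
           else "IMM" if p.lstrip("-").isdigit()
           else "MEM"
           for p in parts[1:]]
    return "_".join([opcode] + ops)

def skipgram_pairs(tokens, window=2):
    """(center, context) training pairs as fed to a skip-gram word2vec model."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

# A basic block plays the role of a "sentence".
block = ["mov eax, 4", "add eax, ebx", "push eax", "ret"]
tokens = [normalize(i) for i in block]
print(tokens)            # e.g. ['mov_REG_IMM', 'add_REG_REG', 'push_REG', 'ret']
print(skipgram_pairs(tokens))
```

The resulting instruction embeddings are then aggregated (e.g. by an LSTM over the token sequence) into a basic-block or function embedding.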
“…Asm2Vec [10] leverages a modified version of the PV-DM model to address obfuscation and optimization issues in clone search. Zuo et al [73] and Redmond et al [50] apply NLP techniques to the binary similarity problem when the same source file is compiled for different architectures. SAFE [37] proposes a combination of a skip-gram and an RNN self-attention model to learn function embeddings from binary files for similarity detection.…”
Section: Binary Analysis With Embedding
confidence: 99%
“…Natural Language Processing (NLP) techniques are applied to automate challenging tasks in natural language and text processing [48]. NLP techniques have later been applied to security as well, such as in network traffic [49] and vulnerability analysis [50]. Such applications leverage word [39] or paragraph [30] embedding techniques to learn vector representations of the text.…”
Section: Introduction
confidence: 99%