2020
DOI: 10.1109/access.2020.3004813

A Lightweight Cross-Version Binary Code Similarity Detection Based on Similarity and Correlation Coefficient Features

Abstract: The technique of binary code similarity detection (BCSD) has been applied in many fields, such as malware detection, plagiarism detection and vulnerability search, etc. Existing solutions for the BCSD problem usually compare specific features between binaries based on the control flow graphs of functions from binaries or compute the embedding vector of binary functions and solve the problem based on deep learning algorithms. In this paper, from another research perspective, we propose a new and lightweight met…

Cited by 10 publications (7 citation statements)
References 26 publications
“…We particularly focus on features and datasets used in those studies, which lead us to four underexplored research questions that we will discuss in §2.3; our goal is to investigating these research questions by conducting a series of rigorous experiments. Because of the space limit, we excluded papers [43], [44], [45], [46], [47], [48], [49], [50], [51], [52], [53] that were published before 2014 and those not regarding top-tier venues, or binary diffing tools [54], [55], [56] used in the industry. Additionally, we excluded papers that aimed to address a specific research problem such as malware detection, library function identification, or patch identification.…”
Section: Scope (mentioning)
confidence: 99%
“…Program properties can also be directly used as a feature. The most straightforward approach involves directly comparing the raw bytes of binaries [6], [53], [75]. However, people tend to not consider this approach because byte-level matching is not as robust compared to simple code modifications.…”
Section: Presemantic Features (mentioning)
confidence: 99%
“…Recently, Gu et al. established a Malware detection method using transformer‐based binary code embedding to calculate code similarity and detect malicious code [32].…”
Section: Related Work (mentioning)
confidence: 99%
“…While it can be extracted efficiently, the original binary bytes produced by the same source code may vary significantly after undergoing different compilation processes, resulting in poor robustness. ACCESS2020 [40] employs a direct approach to extract the raw binary bytes and transforms them into vectors and signals for similarity analysis.…”
Section: Feature Type (mentioning)
confidence: 99%
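
The raw-byte idea described in the last statement above can be illustrated with a small sketch. It is not the paper's implementation: the truncated abstract does not specify the features, so the zero-padding scheme, the fixed vector length, and the helper names `byte_vector`, `cosine_similarity`, and `pearson_correlation` are all assumptions chosen for illustration, loosely following the "similarity and correlation coefficient" wording of the title.

```python
import math


def byte_vector(data: bytes, length: int) -> list[float]:
    """Turn raw function bytes into a fixed-length numeric vector.

    Assumption: longer inputs are truncated and shorter ones are
    zero-padded; the paper may normalize lengths differently.
    """
    padded = data[:length].ljust(length, b"\x00")
    return [float(b) for b in padded]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson_correlation(u: list[float], v: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    dev_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    dev_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if dev_u == 0.0 or dev_v == 0.0:
        return 0.0
    return cov / (dev_u * dev_v)


if __name__ == "__main__":
    # Two hypothetical byte sequences standing in for the same function
    # compiled in two versions; they differ in a single byte.
    old = bytes([0x55, 0x48, 0x89, 0xE5, 0x31, 0xC0, 0x5D, 0xC3])
    new = bytes([0x55, 0x48, 0x89, 0xE5, 0x31, 0xDB, 0x5D, 0xC3])
    u, v = byte_vector(old, 16), byte_vector(new, 16)
    print("cosine:", cosine_similarity(u, v))
    print("pearson:", pearson_correlation(u, v))
```

Both scores are computed directly on byte values, which is why, as the citing papers note, small compilation differences can shift them considerably; the sketch only shows how such features could be obtained cheaply without control flow graphs or learned embeddings.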