Abstract: Automatic cross-platform identification of function clones aims to determine whether two functions are identical without access to the source code, a fundamental challenge in vulnerability search, code plagiarism detection, and malware classification. With the rapid development of deep neural networks in pro…
Keywords: attention mechanism, binary similarity, graph neural network, program analysis
INTRODUCTION
Binary code similarity detection is a key foundation in maintaining software …
Detecting whether two functions in different compiled forms are similar has a wide range of applications in software security. We present a method that leverages both semantic and structural features of functions, learned by a neural-network model on the underlying control-flow graphs (CFGs). In particular, we devise a neural function-similarity regressor (NFSR) with attention on dual CFGs. We train and evaluate NFSR on a dataset of nearly 4 million functions from over 14,900 binary files. Experiments show that NFSR outperforms the state-of-the-art models SAFE, Gemini, and GMN, especially for binary functions with large CFGs. An ablation study shows that attention on dual CFGs plays a significant role in detecting function similarities.
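The abstract does not spell out the NFSR architecture, so the following is only a minimal sketch of the general idea of attention across a pair of CFGs, not the authors' model. It assumes each CFG has already been reduced to a list of per-basic-block feature vectors, and it ignores graph edges entirely (a real graph neural network would also propagate messages along them). The function names `cross_attend`, `pool`, and `similarity` are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(nodes_a, nodes_b):
    """For each node in A, return the attention-weighted average of B's nodes."""
    dim = len(nodes_b[0])
    attended = []
    for h in nodes_a:
        weights = softmax([dot(h, k) for k in nodes_b])
        attended.append(
            [sum(w * k[d] for w, k in zip(weights, nodes_b)) for d in range(dim)]
        )
    return attended

def pool(nodes_a, nodes_b):
    """Mean-pool A's nodes, each concatenated with its cross-graph context."""
    context = cross_attend(nodes_a, nodes_b)
    merged = [h + c for h, c in zip(nodes_a, context)]  # list concatenation
    n = len(merged)
    return [sum(v[d] for v in merged) / n for d in range(len(merged[0]))]

def similarity(cfg_a, cfg_b):
    """Cosine similarity of the two attention-aware graph embeddings."""
    ea, eb = pool(cfg_a, cfg_b), pool(cfg_b, cfg_a)
    norm = math.sqrt(dot(ea, ea)) * math.sqrt(dot(eb, eb))
    return dot(ea, eb) / norm if norm else 0.0

# Toy inputs: three basic blocks with 2-dimensional features (made-up values).
cfg_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cfg_y = [[0.9, 0.1], [0.2, 0.8]]
score = similarity(cfg_x, cfg_y)
```

In this sketch each graph's embedding depends on the other graph, which is the property that distinguishes pairwise attention from embedding each function independently; a trained model would replace the raw dot products with learned projections.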