Parallelization of Machine Learning Applied to Call Graphs of Binaries for Malware Detection

Schleicher, Robert; Xu, Lifan; Killian, William; Vanderbruggen, Tristan; Forren, Teague; Howe, AC; Pearson, Zachary; Shannon, Corey; Simmons, Joshua A.; Cavazos, John

doi:10.1109/pdp.2017.41

Cited by 18 publications

(5 citation statements)

References 10 publications

“…The scale of malware detection problem is such that we have millions of samples already and thousands streaming in every day. Many classic graph mining based approaches (e.g., [48]) are NP hard and have severe scalability issues, making them impractical for malware detection in the wild [30,75].…”

Section: Challenges In ML Based Malware Detection (mentioning)

confidence: 99%

A multi-view context-aware approach to Android malware detection and malicious code localization

et al. 2017

Existing Android malware detection approaches use a variety of features such as security-sensitive APIs, system calls, control-flow structures and information flows in conjunction with Machine Learning classifiers to achieve accurate detection. Each of these feature sets provides a unique semantic perspective (or view) of apps' behaviors with inherent strengths and limitations. That is, some views are well suited to detecting certain attacks but may not be suitable for characterizing several others. Most existing malware detection approaches use only one (or a selected few) of these feature sets, which prevents them from detecting a vast majority of attacks. Addressing this limitation, we propose MKLDroid, a unified framework that systematically integrates multiple views of apps for performing comprehensive malware detection and malicious code localization. The rationale is that, while a malware app can disguise itself in some views, disguising itself in every view while maintaining malicious intent will be much harder. MKLDroid uses a graph kernel to capture structural and contextual information from apps' dependency graphs and identify malicious code patterns in each view. Subsequently, it employs Multiple Kernel Learning (MKL) to find a weighted combination of the views which yields the best detection accuracy. Besides multi-view learning, MKLDroid's unique and salient trait is its ability to locate fine-grained malicious code portions in dependency graphs (e.g., methods/classes). Malicious code localization supports several important applications, such as assisting human analysts studying malware behaviors, engineering malware signatures, and other counter-measures. Through our large-scale experiments on several datasets (incl. wild apps), we demonstrate that MKLDroid consistently outperforms three state-of-the-art techniques in terms of accuracy while maintaining comparable efficiency. In our malicious code localization experiments on a dataset of repackaged malware, MKLDroid was able to identify all the malicious classes with 94% average recall. Our work opens up two new avenues in malware research: (i) it enables the research community to elegantly look at Android malware behaviors from multiple perspectives simultaneously, and (ii) it performs precise and scalable malicious code localization.
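The abstract above describes MKL as finding a weighted combination of per-view kernels. As a minimal sketch of only the combination step, assuming the weights are already known (in MKL they are learned), the following Python snippet forms a normalized weighted sum of kernel matrices. The function name `combine_kernels`, the two example views, and all numeric values are hypothetical and are not taken from MKLDroid.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Return the normalized weighted sum of same-sized kernel matrices."""
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")

    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, w in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                # Each view contributes in proportion to its normalized weight.
                combined[i][j] += (w / total) * kernel[i][j]
    return combined


# Hypothetical 2x2 kernels for two views of the same two apps.
api_view = [[1.0, 0.2], [0.2, 1.0]]
cfg_view = [[1.0, 0.6], [0.6, 1.0]]

# Assumed fixed weights; a real MKL solver would learn these from labels.
combined = combine_kernels([api_view, cfg_view], [0.75, 0.25])
```

Because the weights are non-negative, the combined matrix remains a valid kernel whenever each input is one, so it can be passed to any kernel-based classifier.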

“…We achieved a prediction accuracy of 99.41% in the malware family classification task. Our approach outperforms other malware classifiers that involve extensive feature engineering or extract significantly more data from the executable such as non-code data [8,9,20]. Since we only use the code sections of the executable, we expect that incorporating additional data such as the .rsrc and .idata sections would help to further improve classification results.…”

Section: Results (mentioning)

confidence: 98%

“…In prior works, call graphs have been used to automatically classify malware but typically these works employ relatively simple graph similarity measures such as graph edit distance or rely on heavy feature engineering involving summary statistics to describe functions in the graph [9,20,6,5]. We build on this call graph approach by incorporating certain representation learning techniques such as autoencoding and clustering [7] to obtain an improved function representation.…”

Section: Related Work (mentioning)

confidence: 99%

Classifying Malware Using Function Representations in a Static Call Graph

Dalton, Schmidtler, Khodabakhshi

2020

Preprint

We propose a deep learning approach for identifying malware families using the function call graphs of x86 assembly instructions. Though prior work on static call graph analysis exists, very little involves the application of modern, principled feature learning techniques to the problem. In this paper, we introduce a system utilizing an executable's function call graph where function representations are obtained by way of a recurrent neural network (RNN) autoencoder which maps sequences of x86 instructions into dense, latent vectors. These function embeddings are then modeled as vertices in a graph with edges indicating call dependencies. Capturing rich, node-level representations as well as global, topological properties of an executable file greatly improves malware family detection rates and contributes to a more principled approach to the problem in a way that deliberately avoids tedious feature engineering and domain expertise. We test our approach by performing several experiments on a Microsoft malware classification data set and achieve excellent separation between malware families with a classification accuracy of 99.41%.
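The abstract above models function embeddings as vertices in a graph whose edges indicate call dependencies. The following is a minimal Python sketch of such a data structure only, not the authors' system: the class `CallGraph`, the function names, and the three-dimensional vectors are hypothetical placeholders standing in for the output of an RNN autoencoder.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Set


@dataclass
class CallGraph:
    """Directed call graph whose vertices carry a dense embedding vector."""

    embeddings: Dict[str, List[float]] = field(default_factory=dict)
    calls: Dict[str, Set[str]] = field(default_factory=dict)

    def add_function(self, name: str, embedding: Sequence[float]) -> None:
        # Store the vertex together with its (assumed precomputed) embedding.
        self.embeddings[name] = list(embedding)
        self.calls.setdefault(name, set())

    def add_call(self, caller: str, callee: str) -> None:
        # An edge caller -> callee records a call dependency.
        if caller not in self.embeddings or callee not in self.embeddings:
            raise KeyError("both functions must be added before linking them")
        self.calls[caller].add(callee)

    def callees(self, name: str) -> List[str]:
        return sorted(self.calls[name])


graph = CallGraph()
# Placeholder embeddings; in the described approach these would come from
# an autoencoder over each function's x86 instruction sequence.
graph.add_function("main", [0.1, 0.4, 0.0])
graph.add_function("decrypt", [0.9, 0.1, 0.3])
graph.add_function("send", [0.2, 0.8, 0.5])
graph.add_call("main", "decrypt")
graph.add_call("main", "send")
```

Keeping the embedding on the vertex and the call relation in a separate adjacency map lets node-level features and graph topology be consumed independently by a downstream classifier.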

“…Elect.Crime Investigation 8(1):IJECI MS.ID-02 (2024) Ali [67] Gavrilut [69] Ghafir [100] Ghiasi [87] Huda [89] Huda [116] Huda [78] Ki [118] Kim [77] Kolosnjaji [41] Le [97] Liu [19] Mangialardo [109] The statistical examination of ML revealing techniques is covered in this part of the paper. Raff [42] Searles [117] Shabtai [72] Shijo [110] Srndic [76] Stiborek [99] Veeramani [26] Wagner [95] Wang [80] Mao [119] Markel [43] Mohaisen [81] Nagano and Uda [79] Narra [3] Nauman [113] Nayaranan [92] Okane [120] Pan [88] Pfeffer [115] Pirscoveanu [65] revealing techniques. The greatest accuracy of 99 % was obtained by [117] the authors in [77], and [27] using the SVM classification methods.…”

Section: Hybrid Malware Detection (mentioning)

confidence: 99%

“…Raff [42] Searles [117] Shabtai [72] Shijo [110] Srndic [76] Stiborek [99] Veeramani [26] Wagner [95] Wang [80] Mao [119] Markel [43] Mohaisen [81] Nagano and Uda [79] Narra [3] Nauman [113] Nayaranan [92] Okane [120] Pan [88] Pfeffer [115] Pirscoveanu [65] revealing techniques. The greatest accuracy of 99 % was obtained by [117] the authors in [77], and [27] using the SVM classification methods. Fig.…”

Section: Hybrid Malware Detection (mentioning)

confidence: 99%

A Comprehensive Study for Malware Detection through Machine Learning in Executable Files

Ahmad

2024

IJECI

Two methods are frequently used to analyze malware and start specimens: static analysis and dynamic analysis. Following analysis, distinct characteristics are retrieved to distinguish malware from benign samples. The detection capacity of malware is contingent upon the effectiveness with which discriminative malware characteristics are retrieved through analysis methods. While conventional approaches and techniques were used inadvertently, machine learning algorithms are now utilized to classify malware, which can deal with the complexity and velocity of malware creation. However, even though a few research papers have been published, recent classifications of signature, behavioral and hybrid machine learning are not well introduced. Based on this demand, we provide a comprehensive analysis of malware detection using machine learning and address the different difficulties associated with building the malware classifier. Finally, future work is addressed to build an effective malware detection system by addressing different malware detection problems.
