Malbert: A novel pre-training method for malware detection

Xu, Zhi-Feng; Yang, Gaoming

doi:10.1016/j.cose.2021.102458

Cited by 14 publications

(3 citation statements)

References 25 publications


“…Specifically, it uses a BERT-based [9] model with static analysis of Android applications to perform binary and multiclass classification. Also called MalBERT but oriented to the detection of malware affecting Windows systems using BERT, MalBERT: A novel pre-training method for malware detection [31] uses dynamic analysis with two different datasets with more than 40 000 samples. Their results show 99.9% detection rate on their datasets and more than 98% under different robustness tests.…”

Section: Discussion (mentioning, confidence: 99%)
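The statement above describes MalBERT applying a BERT-based model to API call sequences obtained through dynamic analysis. Before fine-tuning, each trace has to become a fixed-length list of token ids. The following is a minimal sketch of that step. It assumes a word-level vocabulary (one token per API name) and BERT's [CLS]/[SEP]/[PAD] conventions. The helper names, the example API names, and `max_len` are hypothetical, and MalBERT's actual preprocessing may differ.

```python
from typing import Dict, List, Tuple

# BERT-style special tokens; a word-level vocabulary is assumed (one token per API name).
PAD, CLS, SEP, UNK = "[PAD]", "[CLS]", "[SEP]", "[UNK]"


def build_vocab(traces: List[List[str]]) -> Dict[str, int]:
    """Assign ids to the special tokens, then to each API name seen in the training traces."""
    vocab = {tok: i for i, tok in enumerate([PAD, CLS, SEP, UNK])}
    for trace in traces:
        for api in trace:
            vocab.setdefault(api, len(vocab))
    return vocab


def encode(trace: List[str], vocab: Dict[str, int], max_len: int) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), both of length max_len, for one API call trace."""
    if max_len < 2:
        raise ValueError("max_len must leave room for [CLS] and [SEP]")
    tokens = [CLS] + trace[: max_len - 2] + [SEP]  # truncate long traces
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens]  # unseen APIs map to [UNK]
    pad = max_len - len(ids)
    return ids + [vocab[PAD]] * pad, [1] * len(ids) + [0] * pad


vocab = build_vocab([["NtCreateFile", "NtWriteFile", "RegSetValueExW"]])
ids, mask = encode(["NtCreateFile", "LdrLoadDll"], vocab, max_len=6)
print(ids)   # [1, 4, 3, 2, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```

Production BERT pipelines usually rely on subword tokenization (such as WordPiece) instead. Treating each API name as a single token keeps the vocabulary small and the sketch short.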

Android Malware Detection Using BERT

Souani, Khanfir, Bartel et al., 2022

Lecture Notes in Computer Science


In this paper, we propose two empirical studies to (1) detect Android malware and (2) classify Android malware into families. We first (1) reproduce the results of MalBERT using BERT models trained on Android application manifests obtained from 265k applications (vs. 22k for MalBERT) from the AndroZoo dataset in order to detect malware. The results of the MalBERT paper are excellent but hard to believe, since a manifest only roughly represents an application; we therefore try to answer the following questions in this paper. Are the experiments from MalBERT reproducible? How important are permissions for malware detection? Is it possible to keep or improve the results by reducing the size of the manifests? We then (2) investigate whether BERT can be used to classify Android malware into families. The results show that BERT can successfully differentiate malware from goodware with 97% accuracy. Furthermore, BERT can classify malware families with 93% accuracy. We also demonstrate that Android permissions are not what allows BERT to classify successfully, and that BERT does not actually need them.

Introduction

Android malware consists of malicious applications that attack end users' devices, data, money, software, or third-party applications and services [5]. With the democratization of smartphones, virtually everyone now carries a device every day that can access, store, and manipulate sensitive and private data. Android, being the most widely used smartphone operating system, is a target of choice for attackers, who create malicious applications to obtain financial gains from often unsuspecting users. In fact, new malware is constantly being released [19], posing a constant threat and challenge for users, application-market maintainers, and security researchers. Consequently, much effort and many resources are spent on developing approaches that can automatically detect malware in the unending flow of new applications.
This includes detection approaches at the app store level, such as Google Play Store [2], or at the device level via antivirus software [5]. Practitioners and researchers are in a constant race against the flood of newly appearing malware, trying to detect not only previously identified malware but also new variants. For this purpose, they propose approaches that classify applications as malware or not based on suspicious components found in the applications. These approaches fall into two main categories: static and dynamic analysis techniques. Static approaches identify malware by parsing and evaluating the application's syntax without running it, while dynamic approaches extract information by instrumenting and running the application to capture any malicious or suspicious behavior during its execution. Additionally, a third, hybrid category combines static and dynamic analysis in the hope of obtaining more and better information that could be leveraged...
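The permission questions raised in this abstract can be probed with an ablation that removes permission declarations from each manifest before tokenization. The following is a minimal sketch. It assumes the manifest has already been decoded to plain-text XML, since inside an APK it is stored in a binary form. The `strip_permissions` helper and the exact set of removed tags are illustrative assumptions, not the procedure used by Souani et al.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"
# Keep the conventional "android:" prefix when the tree is serialized again.
ET.register_namespace("android", ANDROID_NS)

# Manifest elements treated as permission-related in this ablation (an assumption).
PERMISSION_TAGS = {
    "uses-permission",
    "uses-permission-sdk-23",
    "permission",
    "permission-group",
    "permission-tree",
}


def strip_permissions(manifest_xml: str) -> str:
    """Return the manifest XML with every permission-related element removed."""
    root = ET.fromstring(manifest_xml)
    # Materialize the iterator first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in PERMISSION_TAGS:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.demo">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <application android:label="Demo">
    <activity android:name=".MainActivity"/>
  </application>
</manifest>"""

print(strip_permissions(manifest))
```

To test whether permissions drive the prediction, train and evaluate the same classifier on the original manifests and on the stripped ones, then compare accuracy.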


“…By comparing the test API call sequence with the behavior sequence of malware samples, the malware samples were classified and detected by the dynamic analysis method. Xu et al [39] detected malicious Windows software through the pretraining model, extracted the application programming interface (API) sequence of malware samples by combining natural language processing (NLP) with the dynamic analysis method, and then conducted experiments on two different datasets through the fine-tuning method.…”

Section: Malware Analysis Methods Based On Dynamic Analysis (mentioning, confidence: 99%)
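The statement above mentions classifying samples by comparing a test API call sequence with the behavior sequences of known malware. The comparison metric is not given here. As a stand-in, the following sketch uses nearest-neighbour matching on the Jaccard similarity of API-call n-grams. The family labels and API traces are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Gram = Tuple[str, ...]


def ngrams(seq: Sequence[str], n: int = 2) -> Set[Gram]:
    """Set of contiguous n-grams of API calls in a trace."""
    return {tuple(seq[i : i + n]) for i in range(len(seq) - n + 1)}


def jaccard(a: Set[Gram], b: Set[Gram]) -> float:
    """Size of the intersection over size of the union; 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_family(
    test: Sequence[str], references: List[Tuple[str, Sequence[str]]], n: int = 2
) -> Tuple[str, float]:
    """Label of the most similar reference trace, with its similarity score."""
    test_grams = ngrams(test, n)
    scored = [(label, jaccard(test_grams, ngrams(seq, n))) for label, seq in references]
    return max(scored, key=lambda pair: pair[1])


references = [
    ("ransomware", ["FindFirstFileW", "ReadFile", "CryptEncrypt", "WriteFile", "DeleteFileW"]),
    ("downloader", ["InternetOpenA", "InternetReadFile", "WriteFile", "CreateProcessW"]),
]
label, score = nearest_family(["FindFirstFileW", "ReadFile", "CryptEncrypt", "WriteFile"], references)
print(label, score)  # ransomware 0.75
```

Bigrams capture call ordering, which a bag of API names would lose. A larger `n` is stricter about order but matches fewer traces.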

Image-Based Malware Classification Method with the AlexNet Convolutional Neural Network Model

Zhao, Yang et al., 2023

Security and Communication Networks


In recent years, malware has grown explosively and has become one of the most severe security threats. However, traditional machine-learning-based malware classification is easily constrained by feature engineering and struggles to handle massive amounts of malware. At the same time, dynamic analysis methods are complex to operate and costly, which makes them unsuitable for efficiently classifying large quantities of malware. Therefore, we propose a novel static malware detection method based on the AlexNet convolutional neural network (CNN). Unlike existing solutions, we convert all malware bytes into color images, propose an improved AlexNet architecture, and address unbalanced datasets with a data enhancement method. Extensive experiments are performed using the Microsoft malware dataset and the Google Code Jam (GCJ) dataset. The experimental results show that accuracy reaches 99.99% on the Microsoft malware dataset and 99.38% on the GCJ dataset. We also verify that our method better extracts the texture features of malware and improves accuracy and detection efficiency.
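The abstract states that all malware bytes are converted into color images but does not say how. The following sketch assumes a common scheme: each run of three consecutive bytes becomes one RGB pixel, pixels fill rows of a fixed width, and the last row is zero-padded. The default width of 256 is an arbitrary choice. Resizing the result to the CNN's input resolution is a separate step and is not shown.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def bytes_to_rgb(data: bytes, width: int = 256) -> List[List[Pixel]]:
    """Map a binary to a height x width grid of RGB pixels, three bytes per pixel."""
    if width <= 0:
        raise ValueError("width must be positive")
    n_pixels = math.ceil(len(data) / 3)
    height = max(1, math.ceil(n_pixels / width))
    # Zero-pad so the byte count exactly fills height * width pixels.
    padded = data + bytes(height * width * 3 - len(data))
    pixels = [tuple(padded[i : i + 3]) for i in range(0, len(padded), 3)]
    return [pixels[r * width : (r + 1) * width] for r in range(height)]


image = bytes_to_rgb(bytes(range(10)), width=2)
print(image)  # [[(0, 1, 2), (3, 4, 5)], [(6, 7, 8), (9, 0, 0)]]
```

Because the width is fixed, image height grows with file size, so files of different sizes produce images of different shapes.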


“…In recent years, with the advancement of deep learning technologies, researchers have begun employing deep neural networks for malware detection. Applications such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and large pre-trained models from natural language processing have been applied to the task of detecting malware API sequences [10, 11, 12, 13]. Although researchers have achieved excellent results using API sequence features in malware detection tasks, there are still research gaps in classifying different types of malware and detecting unknown attacks.…”

Section: Introduction (mentioning, confidence: 99%)

Channel Features and API Frequency-Based Transformer Model for Malware Identification

Qian, Cong, 2024

Sensors


Malicious software (malware), in various forms and variants, continues to pose significant threats to user information security. Researchers have shown that API call sequences are effective for identifying malware. However, evasion techniques employed by malware, such as obfuscation and complex API call sequences, challenge existing detection methods. This research addresses this issue by introducing CAFTrans, a novel transformer-based model for malware detection. We enhance the traditional transformer encoder with a one-dimensional channel attention module (1D-CAM) to improve the correlation between API call vector features, thereby enhancing feature embedding. A word frequency reinforcement module is also implemented to refine API features by preserving low-frequency API features. To capture subtle relationships between APIs and identify the features of different malware types more accurately, we leverage convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. Experimental results demonstrate the effectiveness of CAFTrans, which achieves state-of-the-art performance on the mal-api-2019 dataset with an F1 score of 0.65252 and an AUC of 0.8913. The findings suggest that CAFTrans improves accuracy in distinguishing between various types of malware and recognizes unknown samples and adversarial attacks better.
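The abstract does not detail the 1D channel attention module. The following sketch is a generic stand-in: an ECA-style (Efficient Channel Attention) gate over a sequence of API embeddings. It averages each embedding channel over the sequence, slides a small 1D kernel across neighbouring channels, and applies a sigmoid to get per-channel weights. In a trained model the kernel values would be learned, but here they are passed in directly. This should not be read as CAFTrans's exact design.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def channel_attention_1d(x: Matrix, kernel: List[float]) -> Matrix:
    """ECA-style channel attention over a (seq_len x channels) embedding matrix.

    `kernel` is an odd-length 1D kernel slid across the channel axis with zero padding.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    seq_len, channels = len(x), len(x[0])
    half = len(kernel) // 2
    # Squeeze: average each embedding channel over all API positions.
    s = [sum(row[c] for row in x) / seq_len for c in range(channels)]
    # Excite: 1D cross-correlation across neighbouring channels, then a sigmoid gate.
    gates = []
    for c in range(channels):
        z = 0.0
        for j, w in enumerate(kernel):
            k = c + j - half
            if 0 <= k < channels:
                z += w * s[k]
        gates.append(sigmoid(z))
    # Scale: reweight every position's channels by the attention weights.
    return [[row[c] * gates[c] for c in range(channels)] for row in x]


y = channel_attention_1d([[2.0, 1.0, -2.0], [2.0, -1.0, -2.0]], [0.5, 0.0, 0.5])
print(y)  # [[1.0, 0.5, -1.0], [1.0, -0.5, -1.0]]
```

Unlike a squeeze-and-excitation block, this variant has no dimensionality-reducing bottleneck. Each channel's weight depends only on its immediate neighbours, which keeps the parameter count to the kernel length.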
