“…The information extraction process is performed through either static or dy- [42,56,68,15,16,60,57,58,69,51,52,33] or emulators [70,40]. Also, program analysis tools and techniques can be useful in the feature extraction process by providing, for example, disassembly code and control- and data-flow graphs.…”
Coping with malware is becoming increasingly challenging, given its relentless growth in complexity and volume. One of the most common approaches in the literature is to use machine learning techniques to automatically learn the models and patterns behind such complexity, and to develop technologies that keep pace with malware evolution. This survey provides an overview of how machine learning has been used so far for malware analysis in Windows environments, i.e., for the analysis of Portable Executables. We systematize the surveyed papers according to their objectives (i.e., the expected output), the information about malware they specifically use (i.e., the features), and the machine learning techniques they employ (i.e., the algorithm used to process the input and produce the output). We also outline a number of issues and challenges, including those concerning the datasets used, and identify the main current trends and how they might be advanced. In particular, we introduce the novel concept of malware analysis economics, which concerns the study of existing trade-offs among key metrics, such as analysis accuracy and economic costs.
“…Different malware detection approaches in the literature have adopted different machine-learning techniques, such as random forest (RF) [5][6][7], neural network [9][10][11], decision tree [12,13], naïve Bayes [14,15], KNN and SVM [15], ARIMA [16], and reinforcement learning [17,18].…”
Section: Related Work (mentioning)
confidence: 99%
“…This work extends a previously explored approach called RBCM, which is also based on reinforcement learning [3]. The RBCM extension is called eRBCM, and merges the most beneficial features of Monte-Carlo-based real-time learning (MOCART) [4] and random forest [5][6][7] to make it more scalable for higher-order training datasets.…”
Section: Introduction (mentioning)
confidence: 99%
“…Most of the new intelligent approaches to malware detection are trained using the selective features of known malware that can represent malware in its best form. These representations are then used as training instances for a suitable machine-learning algorithm that generalizes or maps such features-based malware detection mechanisms [3][4][5][6][7][8][9][10][11][12][13]. This work extends a previously explored approach called RBCM, which is also based on reinforcement learning [3].…”
The use of innovative and sophisticated malware definitions poses a serious threat to computer-based information systems. Such malware adapts to existing security solutions and often operates undetected. Once the malware completes its malicious activity, it self-destructs and leaves no obvious signature for detection or forensic purposes. Detecting such sophisticated malware is a challenging, non-trivial task because of the new patterns the malware uses to exploit vulnerabilities. Any security solution requires an equal level of sophistication to counter such attacks. In this paper, a novel reinforcement model based on Monte-Carlo simulation, called eRBCM, is explored to develop a security solution that can detect new and sophisticated network malware definitions. The new model is trained on several kinds of malware and can generalize the malware detection functionality. The model is evaluated using a benchmark set of malware. The results indicate that eRBCM can identify a variety of malware with high accuracy.
The huge amount of data and information that must be analyzed for possible malicious intent is one of the most significant challenges that the Web faces today. Malicious software developed by attackers, also referred to as malware, is polymorphic and metamorphic in nature, meaning it can modify its code as it spreads. In addition, the diversity and volume of malware variants severely undermine the effectiveness of traditional defenses, which typically use signature-based techniques and are unable to detect previously unknown malicious executables. Malware family variants share typical patterns of behavior that indicate their origin and purpose. The behavioral patterns observed either statically or dynamically can be exploited using machine learning techniques to identify and classify unknown malware into their established families. This survey paper gives an overview of malware detection and analysis techniques and tools.
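To make the idea of classifying unknown samples into established families concrete, here is a minimal sketch using k-nearest neighbours, one of the techniques named in the citing statements above. It is not the method of any surveyed paper. The feature layout (counts of file, network, and registry API calls), the family labels, the toy data, and the function names `euclidean` and `knn_classify` are all hypothetical, chosen purely for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data (not from any real dataset): each sample is a vector
# of API-call counts (file, network, registry) paired with a family label.
TRAINING = [
    ((9.0, 1.0, 0.0), "family_a"),
    ((8.0, 2.0, 1.0), "family_a"),
    ((1.0, 9.0, 2.0), "family_b"),
    ((0.0, 8.0, 3.0), "family_b"),
]


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_classify(sample, training=TRAINING, k=3):
    """Return the majority family label among the k nearest training samples."""
    nearest = sorted(training, key=lambda item: euclidean(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # An unseen sample dominated by file-related calls.
    print(knn_classify((7.0, 2.0, 0.0)))
```

In practice the feature vectors would come from static or dynamic analysis of real samples, and the choice of distance metric and `k` would need to be validated on labelled data.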