Machine learning technology has become mainstream in a large number of domains, and applications of machine learning techniques to cybersecurity are plentiful. Examples include malware analysis (especially zero‐day malware detection), threat analysis, anomaly‐based intrusion detection of prevalent attacks on critical infrastructures, and many others. Because signature‐based methods are ineffective at detecting zero‐day attacks or even slight variants of known attacks, machine learning‐based detection is being used by researchers and in many cybersecurity products. In this review, we discuss several areas of cybersecurity where machine learning is used as a tool. We also provide a few glimpses of adversarial attacks that manipulate the training and test data of machine learning classifiers to render such tools ineffective.
This article is categorized under:
Application Areas > Science and Technology
Technologies > Machine Learning
Technologies > Classification
Application Areas > Data Mining Software Tools
Malware is a major threat to the digital world and is evolving in complexity. It can penetrate networks, steal confidential information from computers, bring down servers and cripple infrastructure. Anti-malware tools have been developed to combat these threats. Most existing anti-malware tools assume that the structure of malware does not change appreciably. However, recent second-generation malware can create variants of itself, which poses a challenge to anti-malware developers. To help combat second-generation malware while keeping false alarms low, we present a survey of malware and its detection techniques.
General Terms: Information security and malware analysis
Metamorphic malware variants that share the same malicious behavior (i.e., belong to the same family) can obfuscate themselves so that they look different from one another. This structural variation forces traditional signature-matching techniques to maintain a huge signature database. To detect malware effectively and efficiently among large numbers of executables, we need to partition these files into groups that identify their respective families. In addition, the grouping criterion should be chosen so that it can also be applied to classify unknown files encountered on a computer. This paper studies malware and benign executables in groups in order to detect unknown malware with high accuracy. We studied the sizes of malware samples generated by three popular second-generation (metamorphic) malware creation kits, namely G2, PS-MPC and NGVCK, and observed that the sizes of any two samples generated by the same kit do not differ much. We therefore grouped the executables by file size using the Optimal k-Means Clustering algorithm, and used the resulting groups to select promising features for training classifiers (Random Forest, J48, LMT, FT and NBT) to detect malware variants or unknown malware. We find that detecting malware on the basis of file size groups gives an accuracy of up to 99.11% with these classifiers.
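Because file size is a single number, the size-based grouping step can be illustrated with exact k-means in one dimension, where sorting the values lets dynamic programming find the partition with minimum within-group sum of squared deviations. The sketch below is an assumption about what "Optimal k-Means" means here: the abstract does not say whether it refers to this exact 1-D formulation or to choosing the number of groups, so `k` is simply fixed. The function names `optimal_kmeans_1d` and `assign_group`, and the example sizes, are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def _sse(prefix: List[float], prefix_sq: List[float], i: int, j: int) -> float:
    """Sum of squared deviations from the mean for sorted values x[i..j] (inclusive)."""
    n = j - i + 1
    s = prefix[j + 1] - prefix[i]
    sq = prefix_sq[j + 1] - prefix_sq[i]
    return sq - s * s / n


def optimal_kmeans_1d(values: List[float], k: int) -> Tuple[List[List[float]], float]:
    """Exact k-means on 1-D data (e.g. file sizes) via dynamic programming.

    Returns the k contiguous groups of the sorted values and their total SSE.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    x = sorted(values)
    n = len(x)
    # Prefix sums make the SSE of any contiguous run an O(1) lookup.
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for idx, v in enumerate(x):
        prefix[idx + 1] = prefix[idx] + v
        prefix_sq[idx + 1] = prefix_sq[idx] + v * v

    inf = float("inf")
    # cost[m][j]: minimal SSE when x[0..j] is split into m + 1 groups.
    cost = [[inf] * n for _ in range(k)]
    # split[m][j]: start index of the last group in that optimal split.
    split = [[0] * n for _ in range(k)]
    for j in range(n):
        cost[0][j] = _sse(prefix, prefix_sq, 0, j)
    for m in range(1, k):
        for j in range(m, n):
            for i in range(m, j + 1):  # last group is x[i..j]
                c = cost[m - 1][i - 1] + _sse(prefix, prefix_sq, i, j)
                if c < cost[m][j]:
                    cost[m][j] = c
                    split[m][j] = i

    # Walk the split table backwards to recover the groups.
    groups: List[List[float]] = []
    j = n - 1
    for m in range(k - 1, -1, -1):
        i = split[m][j] if m > 0 else 0
        groups.append(x[i:j + 1])
        j = i - 1
    groups.reverse()
    return groups, cost[k - 1][n - 1]


def assign_group(size: float, groups: List[List[float]]) -> int:
    """Place an unknown file in the group whose mean size is closest."""
    centroids = [sum(g) / len(g) for g in groups]
    return min(range(len(centroids)), key=lambda g: abs(size - centroids[g]))


if __name__ == "__main__":
    # Hypothetical executable sizes in KB; not data from the paper.
    sizes = [12, 13, 14, 40, 41, 43, 90, 95]
    groups, sse = optimal_kmeans_1d(sizes, 3)
    print(groups, round(sse, 4))
    print(assign_group(42, groups))
```

The `assign_group` helper reflects the abstract's requirement that the grouping criterion also apply to unknown files: a new executable is routed to the nearest size group, whose per-group classifier would then make the malware/benign decision. That classifier stage is not sketched here.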