The detection of metamorphic malware has generated significant research interest in recent years, particularly given its proliferation on mobile devices. Such malware is especially hard to detect with signature-based intrusion detection systems because it can change its code over time. This article describes a novel framework that generates sets of potential mutants and then uses them as training data to inform the development of improved detection methods (either in two separate phases or in an adversarial learning setting). We outline a method to implement the mutant generation step using an evolutionary algorithm, and provide preliminary results showing that the concept is viable as a first step towards instantiation of the full framework.
In this paper, we study the effect of feature selection on malware detection using machine learning techniques. We employ supervised and unsupervised machine learning algorithms, covering both classification and clustering, with and without feature selection. The algorithms are compared for effectiveness and efficiency using predictive accuracy, among other performance metrics. We observe that the best detection rate was attained by supervised learning with feature selection, where the supervised learning algorithm used was a Multilayer Perceptron (MLP). The analysis also reveals that our system can detect viruses from varying sources.
CCS Concepts: • Computing methodologies → Machine learning; Feature selection • Security and privacy → Malware and its mitigation.
In the field of metamorphic malware detection, training a detection model with malware samples that reflect potential mutants of the malware is crucial in developing a model resistant to future attacks. In this paper, we use a Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) algorithm to generate a large set of novel, malicious mutants that are diverse with respect to their behavioural and structural similarity to the original mutant. Using two classes of malware as a test-bed, we show that the MAP-Elites algorithm produces a large and diverse set of mutants that evade between 64% and 72% of the 63 detection engines tested. When compared to results obtained using repeated runs of an Evolutionary Algorithm that converges to a single solution, the MAP-Elites approach is shown to produce a significantly more diverse range of solutions, while providing equal or improved results in terms of evasiveness, depending on the dataset in question. In addition, the archive produced by MAP-Elites offers insight into the properties of samples that lead to them being undetectable by a suite of existing detection engines.
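The general MAP-Elites loop referred to in this abstract can be illustrated with a minimal sketch on a toy problem. This is not the authors' implementation: the function names (`map_elites`, `toy_fitness`, `toy_descriptor`, `toy_mutate`, `toy_random`), the grid resolution, and the toy objective are all assumptions chosen for illustration. In the malware setting, the fitness would correspond to evasiveness and the descriptors to behavioural and structural similarity, neither of which is modelled here.

```python
import random


def map_elites(fitness, descriptor, mutate, random_solution,
               bins=5, iterations=2000, n_init=50, seed=0):
    """Generic MAP-Elites: keep the best solution found in each descriptor cell.

    Descriptors are assumed to lie in [0, 1] per dimension (an assumption of
    this sketch), and each dimension is split into `bins` equal-width cells.
    """
    rng = random.Random(seed)
    archive = {}  # cell index tuple -> (fitness, solution)

    def try_insert(solution):
        # Discretise the descriptor into a grid cell.
        cell = tuple(min(int(d * bins), bins - 1) for d in descriptor(solution))
        score = fitness(solution)
        # A newcomer replaces the incumbent only if it is strictly fitter.
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, solution)

    # Seed the archive with random solutions.
    for _ in range(n_init):
        try_insert(random_solution(rng))

    # Repeatedly pick a random elite, mutate it, and try to place the child.
    for _ in range(iterations):
        _, parent = rng.choice(list(archive.values()))
        try_insert(mutate(parent, rng))

    return archive


# Hypothetical toy problem: solutions are points in the unit square.
def toy_fitness(solution):
    # Higher is better; the optimum is the centre of the square.
    return -sum((x - 0.5) ** 2 for x in solution)


def toy_descriptor(solution):
    # The descriptor is simply the solution's coordinates.
    return solution


def toy_mutate(solution, rng):
    # Gaussian perturbation, clipped back into [0, 1].
    return [min(1.0, max(0.0, x + rng.gauss(0.0, 0.1))) for x in solution]


def toy_random(rng):
    return [rng.random(), rng.random()]


archive = map_elites(toy_fitness, toy_descriptor, toy_mutate, toy_random)
print(f"filled {len(archive)} of {5 * 5} cells")
```

Unlike an evolutionary algorithm that converges to a single solution, the returned archive holds one elite per occupied cell, which is what makes the diversity of the result directly inspectable.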
The Internet of Things (IoT) is growing fast. Non-personal-computer devices under the umbrella of IoT have been increasingly applied in various fields and will soon account for a significant share of total Internet traffic. However, the security and privacy of IoT and its devices have been challenged by malware, particularly polymorphic worms that rapidly self-propagate once launched and vary their appearance with each infection to escape detection by signature-based intrusion detection systems. It is well recognized that polymorphic worms are one of the most intrusive threats to IoT security. To build an effective, strong defense for IoT networks against polymorphic worms, this study proposes a machine intelligent system, termed Gram-Restricted Boltzmann Machine (Gram-RBM), which automatically generates generic fingerprints/signatures for the polymorphic worm. Two augmented N-gram-based methods are designed and applied in the derivation of polymorphic worm sequences, also known as fingerprints/signatures. These derived sequences are then optimized using the Gaussian-Bernoulli RBM dimension-reduction algorithm. The results, gained
Detecting metamorphic malware poses a challenge to machine-learning models, as trained models might not generalise to future mutant variants of the malware. To address this, we explore whether machine-learning models can be improved by augmenting training data-sets with samples of potential variants. These variants are generated using an evolutionary algorithm that evolves a behaviourally diverse set of mutants, optimised to avoid detection by a large set of existing detection-engines. Using features calculated from the behavioural trace of a sample as input, we evaluate the ability of five machine-learning methods to detect the new variants, and show that the detection rate is considerably improved by including the new samples as training data and that the classifiers still generalise over a range of malware. We then repeat this experiment using a sequence-based deep-learning method as the classifier, which is shown to out-perform the feature-based classifiers.