Today's signature-based anti-viruses are very accurate, but are limited in detecting new malicious code. Currently, dozens of new malicious codes are created every day, and this number is expected to increase in the coming years. Recently, classification algorithms were used successfully for the detection of unknown malicious code. These studies used a test collection with a limited size where the same malicious-benign-file ratio in both the training and test sets, which does not reflect reallife conditions. In this paper we present a methodology for the detection of unknown malicious code, based on text categorization concepts. We performed an extensive evaluation using a test collection that contains more than 30,000 malicious and benign files, in which we investigated the imbalance problem. In real-life scenarios, the malicious file content is expected to be low, about 10% of the total files. For practical purposes, it is unclear as to what the corresponding percentage in the training set should be. Our results indicate that greater than 95% accuracy can be achieved through the use of a training set that contains below 20% malicious file content.
Abstract. The recent growth in network usage has motivated the creation of new malicious code for various purposes. Today's signature-based antiviruses are very accurate for known malicious code, but can not detect new malicious code. Recently, classification algorithms were used successfully for the detection of unknown malicious code. But, these studies involved a test collection with a limited size and the same malicious: benign file ratio in both the training and test sets, a situation which does not reflect real-life conditions. We present a methodology for the detection of unknown malicious code, which examines concepts from text categorization, based on n-grams extraction from the binary code and feature selection. We performed an extensive evaluation, consisting of a test collection of more than 30,000 files, in which we investigated the class imbalance problem. In real-life scenarios, the malicious file content is expected to be low, about 10% of the total files. For practical purposes, it is unclear as to what the corresponding percentage in the training set should be. Our results indicate that greater than 95% accuracy can be achieved through the use of a training set that has a malicious file content of less than 33.3%.
Abstract. Detecting unknown worms is a challenging task. Extant solutions, such as anti-virus tools, rely mainly on prior explicit knowledge of specific worm signatures. As a result, after the appearance of a new worm on the Web there is a significant delay until an update carrying the worm's signature is distributed to anti-virus tools. We propose an innovative technique for detecting the presence of an unknown worm, based on the computer operating system measurements. We monitored 323 computer features and reduced them to 20 features through feature selection. Support vector machines were applied using 3 kernel functions. In addition we used active learning as a selective sampling method to increase the performance of the classifier, exceeding above 90% mean accuracy, and for specific unknown worms 94% accuracy.
Abstract-Detecting unknown malicious code (malcode) is a challenging task. Current common solutions, such as anti-virus tools, rely heavily on prior explicit knowledge of specific instances of malcode binary code signatures. During the time between its appearance and an update being sent to anti-virus tools, a new worm can infect many computers and cause significant damage. We present a new host-based intrusion detection approach, based on analyzing the behavior of the computer to detect the presence of unknown malicious code. The new approach consists on classification algorithms that learn from previous known malcode samples which enable the detection of an unknown malcode. We performed several experiments to evaluate our approach, focusing on computer worms being activated on several computer configurations while running several programs in order to simulate background activity. We collected 323 features in order to measure the computer behavior. Four classification algorithms were applied on several feature subsets. The average detection accuracy that we achieved was above 90% and for specific unknown worms even above 99%.
Detecting computer worms is a highly challenging task. We present a new approach that uses artificial neural networks (ANN) to detect the presence of computer worms based on measurements of computer behavior. We compare ANN to three other classification methods and show the advantages of ANN for detection of known worms. We then proceed to evaluate ANN's ability to detect the presence of an unknown worm. As the measurement of a large number of system features may require significant computational resources, we evaluate three feature selection techniques. We show that, using only five features, one can detect an unknown worm with an average accuracy of 90%. We use a causal index analysis of our trained ANN to identify rules that explain the relationships between the selected features and the identity of each worm. Finally, we discuss the possible application of our approach to host-based intrusion detection systems.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.