Abstract. Principal Component Analysis (PCA) is one of the most widely used dimension reduction techniques and is often applied to identify patterns in complex, high-dimensional data [1]. In [2], we proposed the GA-KM algorithm and evaluated it on the KDD-99 data set. The results showed that GA-KM is efficient for intrusion detection. However, because the data set is very large, the experiments take a long time to finish. To address this deficiency, we combine PCA with GA-KM in this paper. The goal of PCA is to remove unimportant information, such as noise, from high-dimensional data sets while retaining as much of the variation present in the original data set as possible. The experimental results show that, compared with GA-KM [2], the proposed method reduces computational expense and running time (through dimension reduction) and also improves intrusion detection ratios (through noise reduction).

Keywords: Intrusion detection, Principal Component Analysis (PCA), effective noise reduction, GA-KM.
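To make the role of PCA described in the abstract concrete, the following is a minimal, illustrative sketch in plain Python. It is not the implementation used in the paper. It centers the data, builds the sample covariance matrix, and extracts the leading principal components by power iteration with deflation, which is a simple stand-in chosen here for readability; a numerical linear algebra library would normally be used instead. All function names (center, covariance, top_component, pca, make_toy_data) and the synthetic data are assumptions made for illustration only, and the toy data is unrelated to KDD-99.

```python
import math
import random


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=500, seed=0):
    """Power iteration: (eigenvalue, unit eigenvector) for the largest eigenvalue."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v


def pca(rows, k):
    """Project rows onto the first k principal components.

    Returns the projected rows and the fraction of total variance retained.
    Assumes k does not exceed the rank of the covariance matrix.
    """
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    total_var = sum(cov[i][i] for i in range(d))
    components, kept_var = [], 0.0
    for _ in range(k):
        lam, v = top_component(cov)
        components.append(v)
        kept_var += lam
        # Deflation: remove the variance already explained by v.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return projected, kept_var / total_var


def make_toy_data(n=200, seed=1):
    """Hypothetical data: two correlated features plus one pure-noise feature."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.gauss(0, 3)
        data.append([t + rng.gauss(0, 0.1), 2 * t + rng.gauss(0, 0.1), rng.gauss(0, 0.1)])
    return data


if __name__ == "__main__":
    reduced, ratio = pca(make_toy_data(), k=1)
    print("reduced dimension:", len(reduced[0]))
    print("fraction of variance retained:", ratio)
```

In this toy setting, three features are reduced to one while nearly all of the variance is kept, because the discarded directions contain only small noise. This mirrors the two effects the abstract attributes to PCA: fewer dimensions for the downstream clustering step to process, and less noise in its input.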
Introduction
With the rapid growth of network-based services, network security is more important than ever. An intrusion detection system (IDS) [3] therefore plays a vital role in network security. There are two main categories of intrusion detection techniques: signature-based detection and anomaly-based detection. Signature-based detection, also called misuse detection, relies on signatures of known attacks. Anomaly-based detection, in contrast, can detect unknown attacks by learning the behavior of normal activity. In the training phase of this approach, the IDS builds a profile that represents normal behavior. In the detection phase, the IDS analyzes the similarity of a new behavior to the profile. If the new behavior is far from the normal behavior in the profile, it is labeled as an attack.

We have proposed the GA-KM algorithm [2] and evaluated it on the KDD-99 data set; the results show that it is efficient for anomaly-based intrusion detection [4]. However, the experiments with this algorithm take a long time to finish. In this paper, to address this deficiency, Principal Component Analysis is combined with the GA-KM algorithm and evaluated on the KDD-99 data set. PCA (Principal