2014
DOI: 10.7763/jacn.2014.v2.87

Analysis of the Effect of Clustering the Training Data in Naive Bayes Classifier for Anomaly Network Intrusion Detection

Abstract: This paper presents an analysis of the effect of clustering the training data and test data on the classification efficiency of the Naive Bayes classifier. The KDD Cup 99 benchmark dataset is used in this research. The training set is clustered into 5 clusters using the k-means clustering algorithm. Then 8800 samples are taken from the clusters to form the training and test set. The results are compared with those of two Naive Bayes classifiers trained on randomly sampled data containing 8800 and 17600 instances respec…
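
The pipeline the abstract describes (cluster with k-means, sample from the clusters, train Naive Bayes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-feature synthetic data stands in for KDD Cup 99, the proportional per-cluster allocation and the half/half train/test split are guesses (the truncated abstract does not say how the samples are drawn or divided), and a Gaussian Naive Bayes variant is assumed. All function names are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def sample_from_clusters(labels, n_total, k, seed=0):
    """Draw about n_total row indices, split across clusters by cluster size.

    Proportional allocation is an assumption; the source does not specify it.
    """
    rng = random.Random(seed)
    chosen = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        quota = round(n_total * len(idx) / len(labels))
        chosen.extend(rng.sample(idx, min(quota, len(idx))))
    return chosen

def fit_gaussian_nb(X, y):
    """Per class: log prior, feature means, feature variances (lightly smoothed)."""
    model = {}
    for cls in set(y):
        rows = [x for x, t in zip(X, y) if t == cls]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
            for col, m in zip(cols, means)
        ]
        model[cls] = (math.log(len(rows) / len(X)), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log posterior under the naive assumption."""
    def score(cls):
        prior, means, variances = model[cls]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )
    return max(model, key=score)

# Synthetic stand-in for KDD Cup 99: two well-separated classes, two features.
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(500)]
X += [(data_rng.gauss(6, 1), data_rng.gauss(6, 1)) for _ in range(500)]
y = ["normal"] * 500 + ["attack"] * 500

labels = kmeans(X, k=5)                      # 5 clusters, as in the abstract
chosen = sample_from_clusters(labels, n_total=200, k=5)
random.Random(1).shuffle(chosen)             # mix clusters before splitting
half = len(chosen) // 2                      # assumed half/half split
train, test = chosen[:half], chosen[half:]

model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
accuracy = sum(predict(model, X[i]) == y[i] for i in test) / len(test)
print(f"sampled {len(chosen)} rows, test accuracy {accuracy:.3f}")
```

The random-sampling baseline mentioned in the abstract would replace `sample_from_clusters` with a plain `random.sample` over all row indices, which makes the two training sets directly comparable.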

Citations: cited by 7 publications (3 citation statements)
References: 12 publications

“…Also, they focused on establishing a relationship between the attack types and the protocol used by the hackers, using clustered data. Subramanian et al [46] presented an analysis of the effect of clustering the training data and test data in the classification efficiency of the Naive Bayes classifier. Kumar et al [47] proposed a clustering approach based on a simple k-means clustering algorithm to analyze the NSL-KDD dataset.…”
Section: Figure 1: SDN-based Intrusion Detection System 3 Methodology... (mentioning)
confidence: 99%
“…Mainly these IDS use different types of intrusion detection techniques. These techniques are based on: Signature [29][30][31][32][33], Anomaly [34][35][36][37][38], Artificial Neural Network (ANN) [39][40][41][42][43], Fuzzy Logic [44][45][46][47], Association Rule [34,48,49], Support Vector Machine (SVM) [50][51][52], Genetic Algorithm (GA) [53][54][55][56][57], Hybrid Technique [58]. Signature-based IDS mainly detect intrusion by matching captured patterns with previously generated pattern databases.…”
Section: IDS Overview and Limitations (mentioning)
confidence: 99%
“…Naive Bayes Classification (NBC) for handling missing data need appropriate replacement value to maintain the method performance. Missing data at multivariate if there are mixed values either discrete, continuous, and category will require the conversion process to be numerical value [12]. NBC to handle missing data can work with the condition it requires imputation process firstly to replace value part whose attribute missed so it is called Naive Bayes Imputation (NBI).…”
Section: Introduction (mentioning)
confidence: 99%