Implementing a Deep Learning Model for Intrusion Detection on Apache Spark Platform

Haggag, M. Y.; Tantawy, Mohsen M.; El-Soudani, Magdy M. S.

doi:10.1109/access.2020.3019931

Cited by 46 publications

(23 citation statements)

References 27 publications

(39 reference statements)


“…So it is an overall accuracy. The algorithms used on the NSL-KDD dataset by other models include MLP [6], DNN [7], CNN [9], Deep-MLP [37], STL-IDS [38] and DIS-IDS [39]. Figure 14 shows that the accuracy of the 5-classification of most models is approximately 80% on the KDD test set.…”

Section: The Results Of Multi-target Anomaly Classification (mentioning)

confidence: 99%

A Hybrid Intrusion Detection System Based on Scalable K-Means+ Random Forest and Deep Learning

Liu

Wang

2021

IEEE Access


Digital assets have come under various network security threats in the digital age. As a kind of security equipment to protect digital assets, an intrusion detection system (IDS) is less efficient if the alert is not timely, and it is useless if the accuracy cannot meet the requirements. Therefore, an intrusion detection model that combines machine learning with deep learning is proposed in this paper. The model uses the k-means and random forest (RF) algorithms for binary classification, and distributed computing of these algorithms is implemented on the Spark platform to quickly classify normal events and attack events. Then, by using the convolutional neural network (CNN), long short-term memory (LSTM), and other deep learning algorithms, the events judged as abnormal are further classified into different attack types. At this stage, adaptive synthetic sampling (ADASYN) is adopted to address the unbalanced dataset. The NSL-KDD and CIC-IDS2017 datasets are used to evaluate the performance of the proposed model. The experimental results show that the proposed model has better TPR for most attack events, faster data preprocessing speed, and potentially less training time. In particular, the accuracy of multi-target classification can reach as high as 85.24% on the NSL-KDD dataset and 99.91% on the CIC-IDS2017 dataset.

Index terms: Intrusion detection system, machine learning algorithm, k-means, random forest, deep learning algorithm.


“…Choosing the right metric is essential during the models' evaluation because different metrics are proposed to evaluate different problems and application models [51]. Several measurements are appropriate for a classification model, but the most commonly applied one is the confusion matrix [52], [53]. A confusion matrix is a statistical measurement used in machine learning classification algorithms performance for finding the accuracy of the model.…”

Section: Evaluation Performance Appropriate Metrics (mentioning)

confidence: 99%

Evaluation of Classification Algorithms for Intrusion Detection System: A Review

Salih

Abdulazeez

2021

JSCDM


Intrusion detection is one of the most critical network security problems in the technology world. Machine learning techniques are being implemented to improve the Intrusion Detection System (IDS). In order to enhance the performance of IDS, different classification algorithms are applied to detect various types of attacks. Choosing a suitable classification algorithm for building IDS is not an easy task. The best method is to test the performance of the different classification algorithms. This paper aims to present the result of evaluating different classification algorithms to build an IDS model in terms of confusion matrix, accuracy, recall, precision, f-score, specificity and sensitivity. Nevertheless, most researchers have focused on the confusion matrix and accuracy metric as measurements of classification performance. It also provides a detailed comparison with the dataset, data preprocessing, number of features selected, feature selection technique, classification algorithms, and evaluation performance of algorithms described in the intrusion detection system.
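The metrics named in this abstract all follow from the four cells of a binary confusion matrix. The following is a minimal sketch of that relationship, not code from the cited review: the function names, the "attack"/"normal" labels, and the toy predictions are assumptions chosen for illustration. In the binary case, sensitivity is the same quantity as recall.

```python
from collections import Counter


def binary_confusion(y_true, y_pred, positive="attack"):
    """Count TP, FP, FN, TN, treating `positive` as the attack class (assumed label)."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            counts["tp"] += 1
        elif truth != positive and pred == positive:
            counts["fp"] += 1
        elif truth == positive and pred != positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def metrics(tp, fp, fn, tn):
    """Derive the usual classification metrics from the four confusion-matrix cells."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity or TPR
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f_score": safe_div(2 * precision * recall, precision + recall),
    }


# Toy labels, invented for illustration only.
y_true = ["attack", "attack", "normal", "normal", "attack", "normal"]
y_pred = ["attack", "normal", "normal", "attack", "attack", "normal"]

tp, fp, fn, tn = binary_confusion(y_true, y_pred)
result = metrics(tp, fp, fn, tn)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced intrusion datasets, accuracy alone can look high while recall on rare attack classes stays low, which is one reason to report the other metrics alongside it.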


“…It provides a series of high-level components, including Spark streaming for real-time computing, Spark SQL for structured data processing, GraphX for graph computing, and MLlib for machine learning [3]. These components are applied by application developers to various fields, such as feature extraction [4], intrusion detection [5], and community discovery [6], and maintain good performance.…”

Section: Introduction (mentioning)

confidence: 99%

A Novel Reinforcement Learning Approach for Spark Configuration Parameter Optimization

Huang

Zhang

Zhai

2022

Sensors


Apache Spark is a popular open-source distributed data processing framework that can efficiently process massive amounts of data. It provides more than 180 configuration parameters for users to manually select the appropriate parameter values according to their own experience. However, due to the large number of parameters and the inherent correlation between them, manual tuning is very tedious. To solve the problem of tuning through personal experience, we designed and implemented a reinforcement-learning-based Spark configuration parameter optimizer. First, we trained a Spark application performance prediction model with deep neural networks, and verified the accuracy and effectiveness of the model from multiple perspectives. Second, in order to improve the search efficiency of better configuration parameters, we improved the Q-learning algorithm, and automatically set start and end states in each iteration of training, which effectively improves the agent’s poor performance in exploring better configuration parameters. Lastly, comparing our proposed configuration with the default configuration as the baseline, experimental results show that the optimized configuration gained an average performance improvement of 47%, 43%, 31%, and 45% for four different types of Spark applications, which indicates that our Spark configuration parameter optimizer could efficiently find the better configuration parameters and improve the performance of various Spark applications.
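For orientation, the sketch below shows the standard tabular Q-learning update that the abstract's optimizer builds on. It is the textbook rule, not the improved variant the authors describe, and the state names, action names, reward, and hyperparameters are all hypothetical.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one standard Q-learning update and return the new Q(state, action).

    q is a dict keyed by (state, action); missing entries count as 0.0.
    alpha is the learning rate and gamma the discount factor (assumed values).
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Hypothetical setting: states are coarse configuration levels and actions
# move one parameter up or down. The reward would come from a measured or
# predicted runtime improvement; here it is a fixed placeholder.
actions = ["increase", "decrease"]
q = {}
v1 = q_update(q, "low_mem", "increase", 1.0, "high_mem", actions)
v2 = q_update(q, "low_mem", "increase", 1.0, "high_mem", actions)
print(v1, v2)
```

Repeating the same transition moves the estimate toward the observed reward, which is the behaviour a configuration search relies on when it revisits promising settings.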
