An intrusion detection system (IDS) is an important protection instrument for detecting complex network attacks. Various machine learning (ML) and deep learning (DL) algorithms have been proposed for implementing anomaly-based IDS (AIDS). Our review of the AIDS literature identifies several issues in related work, including the arbitrary selection of algorithms, parameters, and testing criteria, the use of outdated datasets, and shallow analysis and validation of the results. This paper comprehensively reviews previous studies on AIDS by using a set of criteria with different datasets and types of attacks to set benchmarking outcomes that can reveal suitable AIDS algorithms, parameters, and testing criteria. Specifically, this paper applies 10 popular supervised and unsupervised ML algorithms to identify effective and efficient ML-AIDS for networks and computers. The supervised ML algorithms include the artificial neural network (ANN), decision tree (DT), k-nearest neighbor (k-NN), naive Bayes (NB), random forest (RF), support vector machine (SVM), and convolutional neural network (CNN) algorithms, whereas the unsupervised ML algorithms include the expectation-maximization (EM), k-means, and self-organizing maps (SOM) algorithms. Several models of these algorithms are introduced, and the tuning and training parameters of each algorithm are examined to achieve an optimal classifier evaluation. Unlike previous studies, this study evaluates the performance of AIDS by measuring the true positive and true negative rates, accuracy, precision, recall, and F-score of 31 ML-AIDS models. The training and testing times of the ML-AIDS models are also considered in measuring their efficiency, given that time complexity is an important factor in AIDS. The ML-AIDS models are tested on the recent, highly unbalanced, multiclass CICIDS2017 dataset, which involves real-world network attacks. In general, the k-NN-AIDS, DT-AIDS, and NB-AIDS models obtain the best results and show a greater capability in detecting web attacks than the other models, which demonstrate irregular and inferior results.
INDEX TERMS Cyberattacks, intrusion detection system, machine learning, supervised and unsupervised learning.
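The evaluation measures named in the abstract above (true positive rate, true negative rate, accuracy, precision, recall, and F-score) can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch, not the paper's code: the function name `binary_metrics` and the example counts are assumptions chosen for illustration, and the F-score shown is the balanced F1 variant.

```python
def binary_metrics(tp, fp, tn, fn):
    """Derive common detection metrics from confusion-matrix counts.

    tp/fp/tn/fn are the true-positive, false-positive, true-negative and
    false-negative counts for one class treated as "attack" (one-vs-rest).
    """
    def ratio(num, den):
        # Guard against empty denominators (e.g. a class with no samples).
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)            # identical to the true positive rate
    tnr = ratio(tn, tn + fp)               # true negative rate
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f_score = ratio(2 * precision * recall, precision + recall)
    return {
        "tpr": recall,
        "tnr": tnr,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts for an unbalanced test set: 10 attacks, 90 benign flows.
example = binary_metrics(tp=8, fp=2, tn=88, fn=2)
```

With unbalanced data such as these hypothetical counts, accuracy is dominated by the majority class, which is one reason to report recall and precision alongside it.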
YouTube has become a popular social media platform. Because of this popularity, it has also become a platform for spammers to distribute spam through comments. This is a concern because spam can lead to phishing attacks that target any user who clicks a malicious link. Spam has its own features that can be analyzed and detected by classification. Hence, enhanced features are proposed to detect YouTube spam. To conduct the experiments, a YouTube spam detection framework consisting of five (5) phases, namely data collection, pre-processing, feature selection and extraction, classification, and detection, was developed. This paper proposes the YouTube spam detection framework and examines and validates each of its phases by using two data mining tools. The features are constructed from an analysis of data collected from the YouTube Spam dataset, using Naïve Bayes and Logistic Regression, and are tested in two different data mining tools, Weka and RapidMiner. In the analysis, the thirteen (13) features tested on Weka and RapidMiner show high accuracy and are therefore used throughout the experiments in this research. The results of Naïve Bayes and Logistic Regression in Weka are slightly higher than in RapidMiner. In addition, the result of Naïve Bayes is higher than that of Logistic Regression in Weka, at 87.21% and 85.29%, respectively. In RapidMiner, the accuracy of Naïve Bayes and Logistic Regression differs only slightly, at 80.41% and 80.88%, respectively. However, the precision of Naïve Bayes is higher than that of Logistic Regression.
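To make the classification phase more concrete, here is a minimal sketch of a multinomial Naïve Bayes comment classifier with Laplace smoothing. It is an illustration only: it uses plain word counts rather than the thirteen features from the paper (which the abstract does not list), it does not reproduce Weka's or RapidMiner's implementations, and the training comments and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a comment and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train_nb(samples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = {}
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior for the comment."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training comments, not taken from the YouTube Spam dataset.
TOY_COMMENTS = [
    ("Check out my channel and subscribe", "spam"),
    ("Free gift cards click this link", "spam"),
    ("I love this song so much", "ham"),
    ("Great video thanks for sharing", "ham"),
]
model = train_nb(TOY_COMMENTS)
```

Calling `predict_nb(model, "click my link for free gift")` on this toy model favors the spam class because most of its words occur only in the spam examples.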
Intrusion detection systems (IDS) are used to analyze huge volumes of data and diagnose anomalous traffic such as DDoS attacks; thus, an efficient traffic classification method is necessary for the IDS. IDS models attempt to decrease the false alarm rate and increase the true alarm rate in order to improve the accuracy of the system. To address this concern, three machine learning algorithms are tested and evaluated in this research: decision jungle (DJ), random forest (RF), and support vector machine (SVM). The main objective is to propose an ML-based network intrusion detection system (ML-based NIDS) model that compares the performance of the three algorithms based on their accuracy and precision in classifying anomalous traffic. The knowledge discovery in databases (KDD) methodology and the intrusion detection evaluation dataset (CIC-IDS2017), both of which are considered benchmarks in the evaluation of IDS, are used in the testing. The average accuracy is 98.18% for SVM, 96.76% for RF, and 96.50% for DJ, so the highest accuracy is achieved by SVM. The average precision is 98.74 for SVM, 97.96 for RF, and 97.82 for DJ, so SVM again achieves the highest value. The average recall is 95.63 for SVM, 97.62 for RF, and 95.77 for DJ, so RF achieves the highest average recall. Overall, the SVM algorithm is found to be the best algorithm for detecting an intrusion in the system.
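The comparison reported above can be checked mechanically by tabulating the three averages per algorithm and picking the top scorer for each metric. This is only a small sketch over the figures quoted in the abstract; the names `REPORTED` and `best_per_metric` are hypothetical, and precision and recall are assumed to be percentages because the abstract gives them without a unit.

```python
# Average scores as quoted in the abstract. Precision and recall are given
# there without a unit and are assumed here to be percentages.
REPORTED = {
    "SVM": {"accuracy": 98.18, "precision": 98.74, "recall": 95.63},
    "RF": {"accuracy": 96.76, "precision": 97.96, "recall": 97.62},
    "DJ": {"accuracy": 96.50, "precision": 97.82, "recall": 95.77},
}


def best_per_metric(scores):
    """Map each metric name to the algorithm with the highest value."""
    metrics = next(iter(scores.values())).keys()
    return {
        metric: max(scores, key=lambda algo: scores[algo][metric])
        for metric in metrics
    }


best = best_per_metric(REPORTED)
```

The resulting mapping shows that no single algorithm leads on every metric, so calling one algorithm best overall depends on how accuracy, precision, and recall are weighted.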
<p>Nowadays, social media platforms (e.g., YouTube and Facebook) let people connect and interact by posting comments or videos. Comments are a part of a website's content that can attract spammers who spread phishing links, malware, or advertising. Because malicious users can spread malware or phishing through comments, this work proposes a technique for detecting spam comment features on video sharing sites. The first phase of the methodology is dataset collection; for this experiment, a dataset from the UCI Machine Learning Repository is used. The next phase is the development of the framework and experimentation. The dataset is pre-processed using tokenization and lemmatization. After that, the features used to detect spam are selected, and classification experiments are performed using six classifiers: Random Tree, Random Forest, Naïve Bayes, KStar, Decision Table, and Decision Stump. The results show that the highest accuracy is 90.57% and the lowest is 58.86%.</p>
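The pre-processing step described above (tokenization followed by lemmatization) might look roughly like the sketch below. It rests on several assumptions: the lemma table `TOY_LEMMAS` is a tiny hand-written stand-in for a real dictionary-backed lemmatizer, the URL placeholder `urltoken` is a hypothetical convention, and none of this is taken from the paper's implementation.

```python
import re

# Hypothetical, hand-written lemma table. A real pipeline would rely on a
# dictionary-backed lemmatizer from an NLP library instead.
TOY_LEMMAS = {
    "videos": "video",
    "comments": "comment",
    "links": "link",
    "watching": "watch",
    "subscribed": "subscribe",
}

# Matches http(s) links and bare www. links up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def preprocess(comment):
    """Lowercase, replace URLs with a placeholder, tokenize, then lemmatize."""
    text = URL_PATTERN.sub(" urltoken ", comment.lower())
    tokens = re.findall(r"[a-z0-9']+", text)
    # Fall back to the token itself when no lemma is known.
    return [TOY_LEMMAS.get(tok, tok) for tok in tokens]


tokens = preprocess("Watching videos at http://example.com NOW")
```

Replacing links with a single placeholder token keeps the signal that a comment contains a URL without letting each distinct address become its own feature.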