Machine learning (ML) based botnet detectors are no exception to traditional ML models when it comes to adversarial evasion attacks. The datasets used to train these models also suffer from scarcity and class imbalance. We propose a new technique named Botshot, based on generative adversarial networks (GANs), to address these issues and proactively make botnet detectors aware of adversarial evasions. Botshot is cost-effective compared to network emulation for botnet traffic generation, rendering dedicated hardware resources unnecessary. First, we use an extended set of network-flow and time-based features for three publicly available botnet datasets. Second, we utilize two GANs (vanilla and conditional) to generate realistic botnet traffic. We evaluate generator performance using the classifier two-sample test (C2ST) with a 10-fold 70-30 train-test split and propose the use of recall, rather than accuracy, for proactively learning adversarial evasions. We then augment the training set with the generated data and test on the unchanged test set. Last, we compare our results against benchmark oversampling methods augmented with additional botnet traffic data, in terms of average accuracy, precision, recall and F1 score over six different ML classifiers. The empirical results demonstrate the effectiveness of GAN-based oversampling for learning adversarial evasion attacks on botnet detectors in advance.
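The C2ST evaluation mentioned above can be sketched in a few lines. This is a minimal illustration using synthetic Gaussian stand-ins for flow features (not the paper's datasets, GANs, or classifiers): a discriminator is trained to tell real samples from generated ones, and a held-out accuracy near 0.5 indicates the generated data is statistically hard to distinguish from real traffic.

```python
# Classifier two-sample test (C2ST) sketch: train a classifier to separate
# real from generated samples; accuracy near chance (0.5) means the
# generator's output is close to the real distribution.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def c2st_accuracy(real, generated, seed=0):
    X = np.vstack([real, generated])
    y = np.concatenate([np.zeros(len(real)), np.ones(len(generated))])
    # 70-30 train-test split, as in the evaluation protocol above.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=50, random_state=seed)
    clf.fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

# Toy stand-ins for flow features.
real = rng.normal(0, 1, size=(500, 8))
fake_good = rng.normal(0, 1, size=(500, 8))   # same distribution as real
fake_bad = rng.normal(3, 1, size=(500, 8))    # easily distinguished

acc_good = c2st_accuracy(real, fake_good)     # near 0.5
acc_bad = c2st_accuracy(real, fake_bad)       # near 1.0
```

A poor generator yields a C2ST accuracy near 1.0; a strong one drives it toward 0.5.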
Many recent works have leveraged generative adversarial networks (GANs) to generate unseen evasion samples. The purpose is to augment the original training set with the generated data for adversarial training, improving the detection performance of machine learning (ML) classifiers. The quality of the generated adversarial samples relies on the adequacy of the training data. However, in low data regimes such as medical anomaly detection, drug discovery and cybersecurity, attack samples are scarce. This paper proposes a novel GAN design called Evasion Generative Adversarial Network (EVAGAN), which is better suited to low data regime problems that use oversampling to improve the detection performance of ML classifiers. EVAGAN can not only generate evasion samples; its discriminator can also act as an evasion-aware classifier. We consider Auxiliary Classifier GAN (ACGAN) as a benchmark to evaluate the performance of EVAGAN on cybersecurity botnet datasets (ISCX-2014, CIC-2017 and CIC-2018) and a computer vision (MNIST) dataset. We demonstrate that EVAGAN outperforms ACGAN on unbalanced datasets with respect to detection performance, training stability and time complexity. EVAGAN's generator quickly learns to generate the low-sample class while simultaneously hardening its discriminator. In contrast to ML classifiers, which require security hardening through adversarial training on GAN-generated data, EVAGAN renders such hardening unnecessary. The experimental analysis shows EVAGAN to be an efficient evasion-hardened model for low data regimes in cybersecurity and computer vision. Code will be available at https://github.com/rhr407/EVAGAN.

Impact Statement: The applications of Artificial Intelligence (AI) can help improve the quality of human life. The use of AI is not limited to medical anomaly detection and drug discovery; it can also be leveraged in computer networks to keep people safe from malicious activities on the Internet.
However, AI-based models can be biased towards the majority class of the data on which they are trained. Anomaly samples are always scarce compared to normal samples, and this imbalance remains an open research problem. Our work is an effort to improve AI-based methods in detection performance, time complexity and stability. Using the proposed technique, we can efficiently train an AI model on fewer anomaly samples and reduce time complexity compared to the state of the art in anomaly detection.
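The imbalance problem described above can be illustrated with a small sketch. This is not the EVAGAN method: it uses naive duplication oversampling as a stand-in for GAN-generated minority samples, on synthetic data, to show how balancing the training set raises recall on the rare anomaly class.

```python
# Sketch: a classifier trained on a heavily imbalanced set misses many
# anomalies; oversampling the minority class before training raises recall.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# 980 normal vs 20 anomaly samples: a low data regime for the anomaly class.
X = np.vstack([rng.normal(0.0, 1, size=(980, 5)),
               rng.normal(1.0, 1, size=(20, 5))])
y = np.concatenate([np.zeros(980), np.ones(20)])
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

base = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Oversample: replicate minority rows until the classes are balanced
# (a GAN would instead synthesize new minority samples here).
minority = np.where(y_tr == 1)[0]
reps = rng.choice(minority, size=(y_tr == 0).sum() - len(minority), replace=True)
X_bal = np.vstack([X_tr, X_tr[reps]])
y_bal = np.concatenate([y_tr, np.ones(len(reps))])
aug = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)

rec_base = recall_score(y_te, base.predict(X_te))
rec_aug = recall_score(y_te, aug.predict(X_te))
```

The balanced model trades some precision for substantially better anomaly recall, which is the metric the work above argues matters for evasion awareness.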
With advanced 5G/6G networks, data-driven interconnected devices will increase exponentially. As a result, the Industrial Internet of Things (IIoT) requires secure information extraction from data to support digital services, medical diagnoses and financial forecasting. High-speed mobile network applications will adapt accordingly. As a consequence, the scale and complexity of Android malware are rising, and malware detection classifiers are vulnerable to attacks: a fabricated feature can force a misclassification that produces the attacker's desired output. This study proposes a subset feature selection method to resist fabricated-feature attacks in the IIoT environment. The method extracts application-aware features from a single Android application to train an independent classification model, and ensemble-based learning is then used to combine the distinct classification models. Finally, the collaborative ML classifier makes independent decisions to defend against adversarial evasion attacks. We evaluate the approach on a benchmark Android malware dataset. The proposed method achieved 91% accuracy with 14 fabricated input features.
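The subset-feature ensemble idea can be illustrated with a toy sketch. This is a hypothetical setup, not the paper's pipeline: the label signal is redundantly present in each of three disjoint feature subsets, so an attacker who fabricates the features seen by one base model corrupts only that voter, and the majority decision survives.

```python
# Sketch: three classifiers, each trained on a disjoint feature subset,
# vote on the final label. Fabricating one subset's features breaks only
# one voter; the majority of the remaining voters stays correct.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=600)
noise = (rng.random((600, 9)) < 0.5).astype(float)
# Redundant signal: the label-bearing feature appears once in each of the
# three disjoint subsets; the remaining columns are noise.
X = np.column_stack([y, noise[:, 0:3], y, noise[:, 3:6], y, noise[:, 6:9]])

subsets = [list(range(0, 4)), list(range(4, 8)), list(range(8, 12))]
models = [DecisionTreeClassifier(random_state=0).fit(X[:, s], y) for s in subsets]

def vote(X_in):
    preds = np.stack([m.predict(X_in[:, s]) for m, s in zip(models, subsets)])
    return (preds.sum(axis=0) >= 2).astype(int)   # majority of three voters

# Attacker fabricates (flips) all features of the first subset only.
X_adv = X.copy()
X_adv[:, 0:4] = 1.0 - X_adv[:, 0:4]

acc_clean = (vote(X) == y).mean()
acc_corrupted_voter = (models[0].predict(X_adv[:, subsets[0]]) == y).mean()
acc_adv = (vote(X_adv) == y).mean()
```

The corrupted base model is driven to near-zero accuracy, while the ensemble vote is unaffected, which is the robustness argument the abstract makes.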
Machine learning (ML) classifiers have been increasingly used in Android malware detection and countermeasures over the past decade. However, ML-based solutions are vulnerable to adversarial evasion attacks: an attacker can carefully craft a malicious sample to fool an underlying pre-trained classifier. In this paper, we highlight the fragility of ML classifiers against adversarial evasion attacks. We perform Oracle-based and Generative Adversarial Network (GAN) based mimicry attacks against these classifiers using our proposed methodology. We use static analysis of Android applications to extract API-based features from a balanced excerpt of a well-known public dataset. The empirical results demonstrate that, among ML classifiers, the detection capability of linear classifiers can be reduced to as low as 0% by perturbing only up to 4 of the 315 extracted API features. As a countermeasure, we propose TrickDroid, a cumulative adversarial training scheme based on Oracle and GAN-based adversarial data, to improve evasion detection. The experimental results show that cumulative adversarial training achieves a detection accuracy of up to 99.46% against adversarial samples.
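The perturb-then-retrain loop above can be sketched as follows. This toy uses 20 synthetic binary API-presence features rather than the paper's 315 real ones, and it zeroes out the highest-weighted bits of a linear model to mimic benign apps, assuming the attacker can drop or hide those API calls without losing malicious behavior; all numbers are illustrative only.

```python
# Sketch: evade a linear malware detector by zeroing its top-weighted API
# features, then harden it by retraining on the evasive samples.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, d = 400, 20
X_ben = (rng.random((n, d)) < 0.2).astype(float)           # benign API bits
X_mal = (rng.random((n, d)) < 0.2).astype(float)
X_mal[:, :3] = 1.0                                         # strong malicious APIs
X_mal[:, 3:5] = (rng.random((n, 2)) < 0.7).astype(float)   # weaker side signal
X = np.vstack([X_ben, X_mal])
y = np.concatenate([np.zeros(n), np.ones(n)])
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Evasion: zero the three highest-weighted features of the linear model.
top = np.argsort(clf.coef_[0])[-3:]
X_adv = X_mal.copy()
X_adv[:, top] = 0.0
det_before = clf.predict(X_adv).mean()    # fraction still detected

# Countermeasure: adversarial retraining with the evasive samples labeled
# as malware, in the spirit of the cumulative scheme described above.
X_aug = np.vstack([X, X_adv])
y_aug = np.concatenate([y, np.ones(n)])
clf_hard = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
det_after = clf_hard.predict(X_adv).mean()
```

After retraining, the model leans on the weaker residual signal the attacker did not erase, which is why the detection rate on the same evasive samples recovers.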