Real-Time Detection of Dictionary DGA Network Traffic Using Deep Learning

Highnam, Kate; Puzio, Domenic; Luo, Song; Jennings, Nicholas R.

doi:10.1007/s42979-021-00507-w

Cited by 44 publications

(13 citation statements)

References 30 publications


“…Researches in this regard have turned to leveraging Machine Learning techniques to detect bot activity at different stages of bot infection, i.e. propagation, rallying and post-infection behavior. Highnam et al [14] targeted identification of bot malwares that used Domain-Generation-Algorithms (DGA) based domain names for finding its respective C&C. Such malware creates anomalous DNS traffic during the rallying phase. They leveraged the deterministic nature of such algorithms and trained a deep neural network composed of LSTM, CNN and ANN in order to identify whether a particular host was making DNS calls for domains that were DGA generated.…”

Section: Bots and Botnets (mentioning)

confidence: 99%

Transfer Learning Auto-Encoder Neural Networks for Anomaly Detection of DDoS Generating IoT Devices

Anwar

Shaheen

Gani

2022

Security and Communication Networks


Machine Learning based anomaly detection approaches have long training and validation cycles. With IoT devices rapidly proliferating, training anomaly models on a per device basis is impractical. This work explores the “transferability” of a pre-trained autoencoder model across devices of similar and different nature. We hypothesized that devices of similar nature would have similar high level feature characteristics represented by the initial layers of the autoencoder, while the more distinct features are captured by the innermost layer of the neural network. In our experiments, the centre-most layers of autoencoder models were re-trained with limited new data belonging to a different device. Datasets of seven Mirai infected and nine Bashlite infected IoT devices were used; each dataset also included benign records representing un-infected behaviour. We observed that the model’s detection accuracy improved by an average of 9.52% for Mirai and 44.59% for Bashlite. The highest performance improvement of 26.68% and 73.00% was observed when the anomaly model of Ecobee thermostat was tested on other devices before and after transfer learning for Mirai and Bashlite respectively. Additionally, transfer learning took 47.31% and 58.27% less time for Mirai and Bashlite respectively. We further trialed the efficacy of the autoencoder based anomaly model on flow based records of network traffic using the CIC-IDS2017 dataset. It was observed that the model performed best when distinct outliers in the dataset were present, whereas the model failed to perform decently in cases where the malicious activity did not cause significant deviation in network traffic’s footprint.



“…The system ran over a single CPU of 2.00 GHz, and it only needed 3.3% of its capacity per device to be monitored, with a false positive rate of about 0.13%. Highnam et al 22 developed the Bilbo the "bagging" model, which combined two neural networks, a CNN and a LSTM, to determine whether a URL is legitimate or generated with a DGA. In their experimentation on four hours of real traffic, Bilbo discovered five potential botnets that commercial tools did not warn about.…”

Section: State Of the Art (mentioning)

confidence: 99%

Real-time botnet detection on large network bandwidths using machine learning

Velasco-Mata

González-Castro

Fidalgo

et al. 2023

Sci Rep


Botnets are one of the most harmful cyberthreats: they can perform many types of cyberattacks and cause billions in losses to the global economy. Nowadays, vast amounts of network traffic are generated every second, hence manual analysis is impossible. To be effective, automatic botnet detection should be done as fast as possible, but carrying this out is difficult in large bandwidths. To handle this problem, we propose an approach that is capable of carrying out an ultra-fast network analysis (i.e. on windows of one second), without a significant loss in the F1-score. We compared our model with three other proposals from the literature, and achieved the best performance: an F1 score of 0.926 with a processing time of 0.007 ms per sample. We also assessed the robustness of our model on saturated networks and on large bandwidths. In particular, our model is capable of working on networks with a saturation of 10% of packet loss, and we estimated the number of CPU cores needed to analyze traffic on three bandwidth sizes. Our results suggest that using commercial-grade cores of 2.4 GHz, our approach would only need four cores for bandwidths of 100 Mbps and 1 Gbps, and 19 cores on 10 Gbps networks.


“…The following subsections provide more details on the ML models in Section 3.1, on the DL models in Section 3.2, on other methods in Section 3.3, and on the datasets used in the reviewed studies in Section 3.4.

[37]; RNN; Alexa/DGArchive (63 DGAs), Bambenek (11 DGAs)
Koh and Rhodes [38]; LSTM; OpenDNS/Bader, Abakumov
Tran et al [39]; LSTM.MI; Alexa/Bambenek (37 DGAs)
Vinayakumar et al [40]; LSTM, GRU, IRNN, RNN, CNN, hybrid (CNN-LSTM); Alexa, OpenDNS/Bambenek, Bader (17 DGAs)
Xu et al [41]; CNN-based; Alexa/DGArchive (16 DGAs)
Yu et al [42]; LSTM, BiLSTM, stacked CNN, parallel CNN, hybrid (CNN-LSTM); Alexa/Bambenek
Akarsh et al [43]; LSTM; OpenDNS, Alexa/20 public DGAs
Qiao et al [44]; LSTM; Alexa/Bambenek
Liu et al [45]; Hybrid (BiLSTM-CNN); Alexa/Netlab (50 DGAs), Bambenek (30 DGAs)
Ren et al [46]; CNN, LSTM, CNN-BiLSTM, ATT-CNN-BiLSTM, SVM; Alexa/Bambenek, Netlab (19 DGAs)
Sivaguru et al [31]; hybrid (RF-LSTM.MI); Alexa, private/DGArchive
Vij et al [47]; LSTM; Alexa/11 DGAs
Cucchiarelli et al [34]; BiLSTM, LSTM.MI, hybrid (CNN-BiLSTM); Alexa/Netlab (25 DGAs)
Highnam et al [48]; hybrid (CNN-LSTM-ANN); Alexa/DGArchive (3 DGAs)
Namgung et al [49]; CNN, LSTM, BiLSTM, hybrid (CNN-BiLSTM); Alexa/Bambenek
Yilmaz et al [50]; LSTM; Majestic/DGArchive (68 DGAs)

[53]; 2020; Alexa/various
Yan et al [54]; 2020; Passive DNS data/public blacklists
Yin et al [55]; 2020; Alexa/Bader (19 DGAs)…”

Section: Literature Review (mentioning)

confidence: 99%

“…Highnam et al [48] use a hybrid CNN-LSTM-ANN model. The output of the embedding layer is passed to separate LSTM and CNN models in parallel.…”

Section: Hybrid CNN-RNN Models (mentioning)

confidence: 99%

Detection of DGA-Generated Domain Names with TF-IDF

Vranken

Alizadeh

2022

Electronics


Botnets often apply domain name generation algorithms (DGAs) to evade detection by generating large numbers of pseudo-random domain names of which only a few are registered by cybercriminals. In this paper, we address how DGA-generated domain names can be detected by means of machine learning and deep learning. We first present an extensive literature review on recent prior work in which machine learning and deep learning have been applied for detecting DGA-generated domain names. We observe that a common methodology is still missing, and the use of different datasets makes experimental results hard to compare. We next propose the use of TF-IDF to measure frequencies of the most relevant n-grams in domain names, and use these as features in learning algorithms. We perform experiments with various machine-learning and deep-learning models using TF-IDF features, of which a deep MLP model yields the best results. For comparison, we also apply an LSTM model with embedding layer to convert domain names from a sequence of characters into a vector representation. The performance of our LSTM and MLP models is rather similar, achieving 0.994 and 0.995 AUC, and average F1-scores of 0.907 and 0.891 respectively.
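The TF-IDF featurization mentioned in this abstract can be illustrated with a minimal sketch. It is not the citing paper's implementation: the n-gram length (3), the unsmoothed weighting tf × log(N/df), the function names `char_ngrams` and `tfidf_vectors`, and the sample domain labels are all assumptions chosen for illustration. Library implementations typically use smoothed or normalized variants.

```python
import math
from collections import Counter


def char_ngrams(domain, n=3):
    """Return the overlapping character n-grams of a domain label."""
    return [domain[i:i + n] for i in range(len(domain) - n + 1)]


def tfidf_vectors(domains, n=3):
    """Return one sparse {n-gram: weight} dict per domain.

    Assumed weighting: tf = count / n-grams in the domain,
    idf = log(N / df), with no smoothing.
    """
    docs = [Counter(char_ngrams(d, n)) for d in domains]
    # Document frequency: how many domains contain each n-gram.
    df = Counter(gram for doc in docs for gram in doc)
    total = len(docs)
    vectors = []
    for doc in docs:
        length = sum(doc.values())
        vectors.append({
            gram: (count / length) * math.log(total / df[gram])
            for gram, count in doc.items()
        })
    return vectors


# Hypothetical sample labels (not taken from any dataset in the source).
vectors = tfidf_vectors(["google", "goggle", "xqzvkw"])
```

In this sketch an n-gram shared by several labels (here "gle") receives a lower weight than one unique to a single label, and the resulting sparse vectors would still need to be mapped onto a fixed vocabulary before being fed to a classifier.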

