Luhui Yang scite author profile

Detecting Word-Based Algorithmically Generated Domains Using Semantic Analysis

In highly sophisticated network attacks, command-and-control (C&C) servers often use domain generation algorithms (DGAs) to dynamically produce several candidate domains instead of relying on static hard-coded lists of IP addresses or domain names. Distinguishing DGA-generated domains from legitimate ones is critical for detecting malware and for locating hidden attackers. The word-based DGAs disclosed in recent network attacks are significantly stealthier than traditional character-based DGAs. In word-based DGAs, two or more words are randomly chosen from one or more specific dictionaries to form a dynamic domain; these regularly generated domains aim to mimic the characteristics of a legitimate domain. Existing DGA detection schemes, including the state-of-the-art one based on deep learning, still cannot detect these domains accurately while maintaining an acceptable false alarm rate. In this study, we exploit the inter-word and inter-domain correlations using semantic analysis approaches that take word embeddings and part-of-speech into consideration. Next, we propose a detection framework for word-based DGAs that incorporates the frequency distribution of words and of parts of speech into the design of the feature set. Using an ensemble classifier constructed from Naive Bayes, Extra-Trees, and Logistic Regression, we benchmark the proposed scheme with malicious and legitimate domain samples extracted from public datasets. The experimental results show that the proposed scheme achieves significantly higher detection accuracy for word-based DGAs than three state-of-the-art DGA detection schemes.
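The abstract above describes word-based DGAs as concatenating randomly chosen dictionary words, and a feature set built from word and part-of-speech frequency distributions. The following is a minimal sketch of both ideas, not the authors' implementation: the word lists, the part-of-speech tags, the greedy segmentation, and all function names are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

# Hypothetical toy dictionaries; a real word-based DGA ships its own word lists.
NOUNS = ["river", "stone", "garden", "window"]
VERBS = ["carry", "follow", "gather", "listen"]
WORDS = NOUNS + VERBS
# Hypothetical part-of-speech lookup (a real system would use a POS tagger).
POS = {**{w: "NOUN" for w in NOUNS}, **{w: "VERB" for w in VERBS}}


def generate_domain(rng, n_words=2, tld="com"):
    """Concatenate randomly chosen dictionary words into one domain name."""
    return "".join(rng.choice(WORDS) for _ in range(n_words)) + "." + tld


def segment(label):
    """Greedy longest-match split of a label into known words.

    Returns None when some part of the label is not a dictionary word.
    """
    words, i = [], 0
    while i < len(label):
        candidates = [w for w in WORDS if label.startswith(w, i)]
        if not candidates:
            return None
        best = max(candidates, key=len)
        words.append(best)
        i += len(best)
    return words


def frequency_features(domains):
    """Relative frequencies of words and of POS tags over a set of domains."""
    word_counts, pos_counts = Counter(), Counter()
    for domain in domains:
        words = segment(domain.split(".")[0]) or []
        word_counts.update(words)
        pos_counts.update(POS[w] for w in words)
    total = sum(word_counts.values()) or 1
    word_freq = {w: c / total for w, c in word_counts.items()}
    pos_freq = {t: c / total for t, c in pos_counts.items()}
    return word_freq, pos_freq


rng = random.Random(7)  # fixed seed so the sketch is reproducible
sample = [generate_domain(rng) for _ in range(50)]
word_freq, pos_freq = frequency_features(sample)
```

In a detector, frequency vectors like these would be fed to a classifier; the abstract names an ensemble of Naive Bayes, Extra-Trees, and Logistic Regression, which this sketch does not attempt to reproduce.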

Detecting Stealthy Domain Generation Algorithms Using Heterogeneous Deep Neural Network Framework

Yang

Liu

Dai

et al. 2020

IEEE Access

Distinguishing malicious domain names generated by various domain generation algorithms (DGAs) is critical for defending a network against sophisticated attacks. In recent years, stealthy domain generation algorithms (SDGAs) have been proposed that are significantly stealthier than traditional character-based DGAs. Existing state-of-the-art detection schemes are not effective enough at detecting SDGAs. In this paper, we exploit the character-level characteristics of SDGA domain names and propose a heterogeneous deep neural network framework (HDNN) for detecting SDGAs. HDNN employs a proposed improved parallel CNN (IPCNN) architecture with multiple convolution kernel sizes to extract multi-scale local features from a domain name. The framework also contains a proposed self-attention-based bidirectional long short-term memory (SA-Bi-LSTM) architecture, which extracts bidirectional global features from a domain name using an attention mechanism. In addition, the focal loss function is introduced to mitigate the imbalance in sample quantity during the training phase. The benchmark experiments are carried out on a database composed of collected benign domain names and real-world DGA and SDGA domain names. Compared with six influential deep-learning-based DGA detection schemes, the proposed scheme achieves state-of-the-art detection results on SDGAs, and also achieves state-of-the-art results on binary and multiclass classification for traditional DGAs.

Index Terms: Convolutional neural network, cyber security, domain generation algorithm, deep learning, long short-term memory.
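The focal loss mentioned in the abstract above down-weights well-classified samples so that training concentrates on hard or minority-class ones. A minimal per-sample sketch of the standard binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), follows. The defaults `gamma=2.0` and `alpha=0.25` are common choices from the focal loss literature and are assumptions here; the abstract does not state the values used in HDNN.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single sample.

    p     -- predicted probability of the positive class
    y     -- true label, 1 (positive) or 0 (negative)
    gamma -- focusing parameter; gamma=0 reduces to weighted cross-entropy
    alpha -- class weight for the positive class (assumed default)
    """
    p = min(max(p, 1e-12), 1.0 - 1e-12)      # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)  # confident and correct: strongly down-weighted
hard = focal_loss(0.1, 1)  # confident and wrong: keeps most of its loss
```

With `gamma=2.0`, the modulating factor is 0.01 for the easy sample and 0.81 for the hard one, which is how the loss shifts weight toward misclassified samples.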

A Novel Detection Method for Word-Based DGA

Yang

Liu

Zhai

et al. 2018

Refined identification of hybrid traffic in DNS tunnels based on regression analysis

et al. 2020

DNS (Domain Name System) tunnels largely obscure the true network activities of users, which makes it challenging for gateways or censorship equipment to identify malicious or unpermitted network behaviors. An efficient way to address this problem is to conduct a temporal-spatial analysis of the tunnel traffic. Nevertheless, current studies on this topic limit the DNS tunnel to those with a single protocol, whereas more than one protocol may be used simultaneously. In this paper, we concentrate on the refined identification of two protocols mixed in a DNS tunnel. A feature set is first derived from DNS query and response flows and is then combined with deep neural networks to construct a regression model. We benchmark the proposed method with captured DNS tunnel traffic; the experimental results show that the proposed scheme achieves an identification accuracy of more than 90%. To the best of our knowledge, the proposed scheme is the first to estimate the ratios of two mixed protocols in DNS tunnels.
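To illustrate what estimating the ratio of two mixed protocols means as a regression target, here is a deliberately simplified sketch. It substitutes closed-form least squares for the deep neural network regression described in the abstract, and it assumes that the aggregate features of mixed traffic are a linear blend of the two per-protocol feature profiles. The feature vectors and the function name are hypothetical.

```python
def estimate_ratio(mixed, profile_a, profile_b):
    """Least-squares estimate of r such that mixed ~= r * a + (1 - r) * b.

    Each argument is a list of flow-level features of equal length.
    The result is clipped to the valid range [0, 1].
    """
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("profiles are identical; the ratio is not identifiable")
    numer = sum((m - b) * d for m, b, d in zip(mixed, profile_b, diff))
    return min(1.0, max(0.0, numer / denom))


# Hypothetical per-protocol feature profiles (illustrative values only).
profile_a = [120.0, 0.8, 30.0]
profile_b = [60.0, 0.2, 90.0]
# Synthetic mixed traffic with 70% of protocol A and 30% of protocol B.
mixed = [0.7 * a + 0.3 * b for a, b in zip(profile_a, profile_b)]
ratio_a = estimate_ratio(mixed, profile_a, profile_b)
```

Real tunnel traffic is unlikely to blend this linearly, which is one reason a learned regression model is a more natural fit than this closed form.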

A semantic element representation model for malicious domain name detection

Yang

Liu

Wang

et al. 2022

Journal of Information Security and Applications

Luhui Yang

Detecting Word-Based Algorithmically Generated Domains Using Semantic Analysis

Detecting Stealthy Domain Generation Algorithms Using Heterogeneous Deep Neural Network Framework

A Novel Detection Method for Word-Based DGA

Refined identification of hybrid traffic in DNS tunnels based on regression analysis

A semantic element representation model for malicious domain name detection
