In recent years, cyberattacks using command and control (C&C) servers have significantly increased. To hide their C&C servers, attackers often use a domain generation algorithm (DGA), which automatically generates domain names for the C&C servers. Accordingly, extensive research on DGA domain detection has been conducted. However, existing methods cannot accurately detect continuously generated DGA domains and can easily be evaded by an attacker. Recently, long short-term memory- (LSTM-) based deep learning models have been introduced to detect DGA domains in real time using only domain names without feature extraction or additional information. In this paper, we propose an efficient DGA domain detection method based on bidirectional LSTM (BiLSTM), which learns bidirectional information as opposed to unidirectional information learned by LSTM. We further maximize the detection performance with a convolutional neural network (CNN) + BiLSTM ensemble model using Attention mechanism, which allows the model to learn both local and global information in a domain sequence. Experimental results show that existing CNN and LSTM models achieved F1-scores of 0.9384 and 0.9597, respectively, while the proposed BiLSTM and ensemble models achieved higher F1-scores of 0.9618 and 0.9666, respectively. In addition, the ensemble model achieved the best performance for most DGA domain classes, enabling more accurate DGA domain detection than existing models.
In recent years, there have been many research efforts on Big Data, and many companies developed a variety of relevant products. Accordingly, we are able to store and analyze a large volume of log data, which have been difficult to be handled in the traditional computing environment. To handle a large volume of log data, which rapidly occur in multiple servers, in this paper we design a new data storage architecture to efficiently analyze those big log data through Apache Hive. We then design and implement anomaly detection methods, which identify abnormal status of servers from log data, based on moving average and 3-sigma techniques. We also show effectiveness of the proposed detection methods by demonstrating that our methods identifies anomalies correctly. These results show that our anomaly detection is an excellent approach for properly detecting anomalies from Hadoop log data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.