Security threats and economic losses caused by network attacks, intrusions, and vulnerabilities have motivated intensive study of network security. Data collected in a network system can normally reflect, or be used to detect, security threats. We define such data as network security-related data. Studying and analyzing security-related data can help detect network attacks and intrusions, which in turn makes it possible to measure the security level of the whole network system. The first step in detecting network attacks and intrusions is therefore to collect security-related data. However, in the context of big data and 5G, collecting these data poses a number of challenges. In this paper, we first briefly introduce network security-related data, including its definition and characteristics, and the applications of network data collection. We then provide the requirements and objectives for security-related data collection and present a taxonomy of data collection technologies. Moreover, we review existing collection nodes, collection tools, and collection mechanisms for network data collection and analyze them against the proposed requirements and objectives for high-quality security-related data collection. Finally, we discuss open research issues and conclude with suggestions for future research directions.
INDEX TERMS Network security, security-related data, data collection technologies, large-scale heterogeneous networks.
Application-layer tunnels are often used to construct covert channels for transmitting secret data, and in recent years such channels have frequently been exploited to mount network threats. Detecting application-layer tunnels can assist in identifying a variety of network threats and is therefore of high research significance. In this paper, we explore application-layer tunnel detection and propose a generic detection method that applies both rules and machine learning. Our detection method consists of two main parts: rule-based filtering of Domain Generation Algorithm (DGA) domain names based on a trigram model, and a machine learning model for tunnel detection built on our proposed generic feature extraction framework. The rule-based DGA domain name filtering eliminates some obvious tunnels, reducing the amount of data processed by the machine learning-based detection and thereby improving detection efficiency. The generic feature extraction framework comprehensively integrates previous research results by combining multiple detection methods, supporting multiple layers, and performing multiple feature extraction. We take the three most common application-layer tunnels, i.e., DNS tunnels, HTTP tunnels, and HTTPS tunnels, as examples to analyze and test our detection method. The experimental results show that the proposed method is generic and efficient compared with other existing approaches.
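To make the idea of trigram-based domain name filtering concrete, the following is a minimal sketch and not the authors' implementation. It assumes a character-level trigram model trained on a small, hypothetical list of benign labels, add-alpha smoothing, and a score threshold chosen by the caller; the names `TrigramModel`, `split_by_rule`, and the training list are all illustrative assumptions that do not come from the abstract.

```python
import math
from collections import Counter

# Assumed symbol set: a-z, 0-9, '-', plus the '^' and '$' boundary markers.
ALPHABET_SIZE = 39


def trigrams(label):
    """Return the character trigrams of a label, padded with boundary markers."""
    padded = "^" + label.lower() + "$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramModel:
    """Character trigram model with add-alpha smoothing (illustrative only)."""

    def __init__(self, benign_labels, alpha=1.0):
        # Count every trigram seen in the benign training labels.
        self.counts = Counter(t for lbl in benign_labels for t in trigrams(lbl))
        self.total = sum(self.counts.values())
        self.alpha = alpha
        self.vocab = ALPHABET_SIZE ** 3  # number of possible trigrams

    def score(self, label):
        """Mean log-probability per trigram; lower means less like benign names."""
        grams = trigrams(label)
        if not grams:
            return float("-inf")
        denom = self.total + self.alpha * self.vocab
        log_probs = (math.log((self.counts[g] + self.alpha) / denom) for g in grams)
        return sum(log_probs) / len(grams)


def split_by_rule(labels, model, threshold):
    """Split labels into (flagged by the rule, passed on to the ML stage)."""
    flagged = [lbl for lbl in labels if model.score(lbl) < threshold]
    passed = [lbl for lbl in labels if model.score(lbl) >= threshold]
    return flagged, passed


# Hypothetical benign training labels, far too few for real use.
benign = ["google", "facebook", "wikipedia", "amazon",
          "youtube", "microsoft", "github", "stackoverflow"]
model = TrigramModel(benign)
```

Under these assumptions, a label such as `googlemail` shares several trigrams with the training data and scores higher than a random-looking label such as `xq7zk2vw9p`. Labels scoring below the threshold would be handled by the rule, and only the remainder would be passed to the machine learning stage. A realistic threshold would have to be tuned on labeled data.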