Recently, a majority of security operations centers (SOCs) have been facing a critical issue of increased adoption of transport layer security (TLS) encryption on the Internet, in network traffic analysis (NTA). To this end, in this survey article, we present existing research on NTA and related areas, primarily focusing on TLS-encrypted traffic to detect and classify malicious traffic with deployment scenarios for SOCs. Security experts in SOCs and researchers in academia can obtain useful information from our survey, as the main focus of our survey is NTA methods applicable to malware detection and family classification. Especially, we have discussed pros and cons of three main deployment models for encrypted NTA: TLS interception, inspection using cryptographic functions, and passive inspection without decryption. In addition, we have discussed the state-of-the-art methods in TLS-encrypted NTA for each component of a machine learning pipeline, typically used in the state-of-the-art methods.
In parallel with the rapid adoption of transport layer security (TLS), malware has utilized the encrypted communication channel provided by TLS to hinder detection from network traffic. To this end, recent research efforts are directed toward malware detection and malware family classification for TLS-encrypted traffic. However, amongst their feature sets, the proposals to utilize the sequential information of each TLS session has not been properly evaluated, especially in the context of malware family classification. In this context, we propose a systematic framework to evaluate the state-of-the-art malware family classification methods for TLS-encrypted traffic in a controlled environment and discuss the advantages and limitations of the methods comprehensively. In particular, our experimental results for the 10 representations and classifier combinations show that the graph-based representation for the sequential information achieves better performance regardless of the evaluated classification algorithms. With our framework and findings, researchers can design better machine learning based classifiers.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.