The growth of the Internet has made encrypted network traffic increasingly complex. Identifying the specific classes of encrypted network traffic is an important part of maintaining information security. Traditional traffic classification based on machine learning relies heavily on expert experience, whereas deep neural networks, as end-to-end models, can minimize human intervention. This paper proposes the CLD-Net model, which can effectively distinguish encrypted network traffic. By segmenting and recombining the packet payloads of the raw flow, it automatically extracts payload-related features, and by changing the representation of the packet interval, it integrates packet interval information into the model. We use the ability of a Convolutional Neural Network (CNN) to distinguish image classes to learn and classify the grayscale images into which the raw flow has been preprocessed, and then use the effectiveness of a Long Short-Term Memory (LSTM) network on time series data to further enhance the classification ability of the model. Finally, through feature reduction, the high-dimensional features learned by the neural network are reduced to 8 dimensions to distinguish 8 different classes of encrypted network traffic. To verify the effectiveness of the CLD-Net model, we conduct experiments on the ISCX public dataset. The results show that the proposed model can determine whether unknown network traffic uses a Virtual Private Network (VPN) with an accuracy of 98% and can identify the specific traffic type (chat, audio, or file) of Facebook and Skype applications with an accuracy of 92.89%.
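The preprocessing described in this abstract (payload bytes turned into grayscale images, plus an encoded packet interval) can be illustrated with a minimal sketch. The abstract does not give the image size or the interval encoding, so `IMAGE_SIDE`, `max_gap`, and both function names are assumptions made for illustration only, not the CLD-Net procedure.

```python
from typing import List

IMAGE_SIDE = 28  # hypothetical image width/height; not stated in the abstract


def payload_to_grayscale(payload: bytes, side: int = IMAGE_SIDE) -> List[List[int]]:
    """Truncate or zero-pad a payload to side*side bytes and reshape it
    into a side x side grid of pixel intensities in the range 0..255."""
    size = side * side
    fixed = payload[:size].ljust(size, b"\x00")  # pad short payloads with zero bytes
    return [list(fixed[r * side:(r + 1) * side]) for r in range(side)]


def encode_intervals(timestamps: List[float], max_gap: float = 1.0) -> List[int]:
    """Map gaps between consecutive packet timestamps (seconds) to 0..255.

    Gaps are clipped to [0, max_gap] before scaling. This encoding is an
    assumption; the abstract only says the interval representation is changed.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [int(255 * min(max(g, 0.0), max_gap) / max_gap) for g in gaps]
```

For example, `payload_to_grayscale(b"\x01\x02\x03", side=2)` pads the three bytes to four and returns a 2 x 2 grid; a CNN would then consume such grids as single-channel images.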
As the volume and proportion of encrypted traffic in networks grow rapidly, encrypted traffic classification has become an increasingly important part of traffic analysis. At present, the classification of encrypted traffic in closed environments has been studied extensively, but these classification models often work only with labeled data and are difficult to apply in real environments. To solve these problems, we propose a transferable model called CBD with generalization abilities for encrypted traffic classification in real environments. The overall structure of CBD can be described as a combination of a one-dimensional CNN and the Transformer encoder. The model can be pre-trained on unlabeled data to learn the basic characteristics of encrypted traffic and then be transferred to other datasets to classify encrypted traffic at both the packet level and the flow level. The performance of the proposed model was evaluated on a public dataset. The results showed that the CBD model outperformed the baseline methods and that pre-training improves the classification ability of the model.
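To make the "one-dimensional CNN" component concrete, here is a minimal sketch of a single 1D convolution pass over a normalized byte sequence. It only shows the sliding-window operation; the kernel values, the helper names, and the byte normalization are assumptions, and the Transformer encoder and pre-training stages of CBD are not represented.

```python
from typing import List, Sequence


def bytes_to_unit_floats(payload: bytes) -> List[float]:
    """Scale raw byte values from 0..255 to 0.0..1.0 (an assumed normalization)."""
    return [b / 255.0 for b in payload]


def conv1d(signal: Sequence[float], kernel: Sequence[float], stride: int = 1) -> List[float]:
    """Valid (no padding) 1D cross-correlation, the operation CNN layers apply."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(values: Sequence[float]) -> List[float]:
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]
```

In a trained network the kernel weights are learned rather than fixed; a call such as `relu(conv1d(bytes_to_unit_floats(payload), kernel))` would produce one feature map for one kernel.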
With the increase in the proportion of encrypted network traffic, encrypted traffic identification (ETI) is becoming a critical research topic for network management and security. At present, ETI under the closed-world assumption has been adequately studied. However, when the models are applied in realistic environments, they face the challenge of identifying unknown traffic as well as model efficiency requirements. Considering these problems, in this paper, we propose a lightweight unknown traffic discovery model, LightSEEN, for open-world traffic classification and model update under practical conditions. The overall structure of LightSEEN is based on the Siamese network, which takes three simplified packet feature vectors as input on one side, uses the multihead attention mechanism to capture the interactions among packets in parallel, and adopts techniques including 1D-CNN and ResNet to promote the extraction of deep-level flow features and the convergence speed of the network. The effectiveness and efficiency of the proposed model are evaluated on two public data sets. The results show that the effectiveness of LightSEEN is overall at the same level as the state-of-the-art method, with an even better true detection rate, while the number of parameters used in LightSEEN is 0.51% of the baseline and its average training time is 37.9% of the baseline.
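The idea behind using a Siamese network for unknown traffic discovery can be sketched as a distance comparison between flow embeddings: two flows are treated as the same class when their embeddings are close. The sketch below assumes the embeddings already exist as plain float vectors; the function names, the Euclidean distance, and the threshold are all hypothetical and are not taken from LightSEEN.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def discover(
    flow_emb: Sequence[float],
    known: Dict[str, List[Sequence[float]]],
    threshold: float,
) -> str:
    """Return the label of the nearest known reference embedding, or
    "unknown" when no reference lies within the distance threshold."""
    best_label, best_dist = "unknown", threshold
    for label, refs in known.items():
        for ref in refs:
            d = euclidean(flow_emb, ref)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label
```

A flow that is far from every reference is flagged as unknown, which is the point at which an open-world system could collect it for a later model update.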
Since its inception, Bitcoin has been subject to numerous thefts due to its enormous economic value. Hackers steal Bitcoin wallet keys to transfer Bitcoin from compromised users, causing huge economic losses to victims. To address the security threat of Bitcoin theft, supervised learning methods were used in this study to detect and provide warnings about Bitcoin theft events. To overcome the shortcomings of existing work, more comprehensive features of Bitcoin transaction data were extracted, the unbalanced dataset was equalized, and five supervised methods (k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), adaptive boosting (AdaBoost), and multi-layer perceptron (MLP)) as well as three unsupervised methods (local outlier factor (LOF), one-class support vector machine (OCSVM), and a Mahalanobis distance-based approach (MDB)) were used for detection. The best performer among these algorithms was the RF algorithm, which achieved recall, precision, and F1 values of 95.9%. The experimental results showed that the designed features are more effective than the currently used ones. The results of the supervised methods were significantly better than those of the unsupervised methods, and the results of the supervised methods could be further improved after equalizing the training set.
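The abstract does not say how the unbalanced dataset was equalized. As one common possibility, the sketch below uses random oversampling of the minority class and also shows how precision, recall, and F1 are computed. The function names and the choice of oversampling are assumptions for illustration, not the method of the study.

```python
import random
from typing import Any, List, Sequence, Tuple


def oversample(samples: Sequence[Any], labels: Sequence[int], seed: int = 0) -> List[Tuple[Any, int]]:
    """Randomly duplicate samples of smaller classes until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for s, y in zip(samples, labels):
        by_label.setdefault(y, []).append(s)
    target = max(len(group) for group in by_label.values())
    out = []
    for y, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out.extend((s, y) for s in group + extra)
    rng.shuffle(out)
    return out


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the positive (for example, theft) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Oversampling should be applied to the training split only, so that duplicated samples never leak into the data used for evaluation.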
The encryption of network traffic has driven research on encrypted traffic classification and identification. However, many existing studies are effective only on closed-set experimental data, that is, only for traffic of known classes. Real open-set environments often contain a large amount of traffic from unknown classes, which many methods cannot identify and can only misclassify as known classes. How to identify unknown traffic and classify known traffic in an open-set environment is one of the focuses of traffic analysis research. Considering these problems, this paper proposes a novel solution, which applies the open-set recognition method to unknown traffic identification and constructs a model based on deep learning and ensemble learning. The method builds a model from a convolutional neural network and a transformer encoder and then uses a three-stage training and testing process, combined with a novel loss function, to generalize to the open space, forming OpenCBD. Experiments on public datasets show that the proposed method is significantly better than other open-set identification methods. It can not only distinguish known traffic from unknown traffic but also identify the specific classes of known traffic.
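To illustrate what open-set recognition means in practice, the sketch below uses a simple confidence-threshold baseline: a sample is assigned to a known class only if the highest softmax probability is high enough, and is otherwise rejected as unknown. This is a generic baseline, not the OpenCBD loss or its three-stage process; the function names and the threshold value are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def open_set_predict(
    logits: Sequence[float],
    class_names: Sequence[str],
    threshold: float = 0.9,  # hypothetical rejection threshold
) -> str:
    """Return the most probable known class, or "unknown" when the
    classifier is not confident enough about any known class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best] if probs[best] >= threshold else "unknown"
```

A closed-set classifier would always return one of `class_names`; the rejection branch is what lets the model say that a flow belongs to none of the classes it was trained on.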