Rapidly growing encrypted traffic hides a large number of malicious behaviours. Because encrypted traffic is difficult to collect and label, the class distribution of datasets is seriously imbalanced, which leads to poor generalisation ability in classification models. To solve this problem, a new representation learning method for encrypted traffic and an accompanying diversity enhancement model are proposed, which use the diversity of images to represent the diversity of traffic samples. First, the encrypted traffic is transformed into Markov images. Then, a diversity maximisation Markov‐GAN based on the Simpson index is designed to generate new Markov images. Finally, the balanced Markov image set is sent to a CNN for classification. Experimental results show that the proposed method can predict the whole dataset space with only a few original samples. The classification accuracies under different degrees of imbalance are significantly improved, and all exceed 90%. The enhanced Markov image set also effectively alleviates the deviation in generalisation performance caused by different network depths: even an ordinary CNN achieves almost the same classification performance as VGG13 and VGG16. Compared with other data enhancement methods, the Markov‐GAN only needs to balance the transform-domain dataset, which makes it lightweight, easy to train and stronger in amplification ability.
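The abstract does not specify how a Markov image is built or how the Simpson index enters the GAN objective, so the following is only a minimal sketch under stated assumptions. It assumes that a Markov image is the 256 x 256 first-order transition probability matrix of consecutive payload bytes, and it computes the Gini-Simpson diversity 1 - sum(p_i^2) over a set of class labels. The function names `markov_image` and `simpson_diversity` are hypothetical and do not come from the paper.

```python
import numpy as np


def markov_image(payload: bytes) -> np.ndarray:
    """Assumed construction: row-normalised byte-to-byte transition matrix."""
    counts = np.zeros((256, 256), dtype=np.float64)
    data = np.frombuffer(payload, dtype=np.uint8).astype(np.intp)
    if data.size >= 2:
        # Count each transition (current byte -> next byte), duplicates included.
        np.add.at(counts, (data[:-1], data[1:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions stay all-zero instead of dividing by 0.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def simpson_diversity(labels) -> float:
    """Gini-Simpson index: probability that two random draws differ in class."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


if __name__ == "__main__":
    img = markov_image(b"\x00\x01\x00\x01")
    print(img.shape, img[0, 1], img[1, 0])
    print(simpson_diversity([0, 0, 1, 1]))
```

Under these assumptions, each flow becomes a fixed-size greyscale image regardless of payload length, and a perfectly balanced two-class label set scores higher on the diversity index than a single-class one. How the paper's Markov‐GAN actually maximises this quantity during training is not stated in the abstract.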