Abstract: Mobile application (simply “app”) identification at per-flow granularity is vital for traffic engineering, network management, and security practices. However, the growing fraction of encrypted traffic, such as Hypertext Transfer Protocol Secure (HTTPS), introduces uncertainty into this identification. To address this challenge, we have carefully analyzed mobile app traffic (mainly Domain Name System, Hypertext Transfer Protocol, and encrypted traffic such as Secure Sockets Layer and Transport Layer Security) and observed that (1)…
“…This function provides a compressed representation of images and can guide the Markov-GAN to generate new Markov images with family homology and texture similarity, thereby raising the level of the adversarial game between the discriminator and the generator. The coding length function is a data compression representation method proposed in combination with rate distortion theory, as shown in formula (6):…”
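Formula (6) itself is cut off in this excerpt. For orientation only, one widely used coding length function from the lossy data coding and compression literature is shown below for zero-mean data $W = [w_1, \dots, w_m] \in \mathbb{R}^{n \times m}$ ($m$ samples of dimension $n$) at allowable distortion $\varepsilon$; whether the cited paper uses exactly this form is an assumption.

$$
L(W) = \frac{m+n}{2}\,\log_2 \det\!\left(I_n + \frac{n}{\varepsilon^{2} m}\, W W^{\top}\right)
$$

Intuitively, the log-determinant grows as the samples spread out in more directions, so a more varied set of images needs more bits to encode up to distortion $\varepsilon$.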
Section: Coding Length Loss Function (mentioning)
confidence: 99%
“…In the field of mobile applications, malicious apps generally use encrypted traffic (such as HTTPS) to transmit network data and avoid detection. More than 30% of SSL‐based attacks abused trusted cloud providers such as Dropbox, Google, Microsoft and Amazon to distribute malware through encrypted channels, and such attacks have become increasingly sophisticated at evading detection [6]. Therefore, effectively identifying malicious traffic has become an important challenge for network security.…”
The rapidly growing volume of encrypted traffic hides a large number of malicious behaviours. Because encrypted traffic is difficult to collect and label, the class distribution of datasets is severely imbalanced, which leads to poor generalisation of classification models. To solve this problem, a new representation learning method for encrypted traffic and a corresponding diversity enhancement model are proposed, which use the diversity of images to represent the diversity of traffic samples. First, the encrypted traffic is transformed into Markov images. Then, a diversity-maximisation Markov‐GAN based on the Simpson index is designed to generate new Markov images. Finally, the balanced Markov image set is fed into a CNN for classification. Experimental results show that the proposed method can predict the whole dataset space from only a few original samples, and that classification accuracy under different degrees of imbalance is significantly improved, exceeding 90% in all cases. The enhanced Markov image set effectively alleviates the performance generalisation deviation caused by different network depths: even an ordinary CNN achieves almost the same classification performance as VGG13 and VGG16. Compared with other data enhancement methods, the Markov‐GAN only needs to balance the dataset in the transform domain, and it is lightweight, easy to train and has stronger amplification ability.
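The abstract does not say how a Markov image or the Simpson index is computed. The sketch below assumes the common construction of a Markov image as a 256×256 first-order byte-transition matrix over a flow's payload, and uses the Gini-Simpson form 1 − Σp_i² of the Simpson index. Applying the index to class counts here only illustrates why a balanced set scores as more diverse; it is not how the Markov‐GAN uses it internally. All function names are hypothetical.

```python
from typing import List, Sequence


def markov_image(payload: bytes) -> List[List[float]]:
    """Return a 256x256 first-order byte-transition matrix (assumed construction).

    Entry [a][b] is the empirical probability that byte b follows byte a in the
    payload; rows for bytes that never precede another byte stay all zeros.
    """
    counts = [[0] * 256 for _ in range(256)]
    # Count transitions between each pair of consecutive bytes.
    for a, b in zip(payload, payload[1:]):
        counts[a][b] += 1
    image: List[List[float]] = []
    for row in counts:
        total = sum(row)
        # Row-normalise so every non-empty row sums to 1.
        image.append([c / total if total else 0.0 for c in row])
    return image


def to_grayscale(image: List[List[float]]) -> List[List[int]]:
    """Scale transition probabilities to 0-255 pixel intensities (hypothetical choice)."""
    return [[round(p * 255) for p in row] for row in image]


def simpson_index(counts: Sequence[int]) -> float:
    """Gini-Simpson diversity 1 - sum(p_i^2) over category counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


if __name__ == "__main__":
    # Bytes resembling the start of TLS records (content type 0x16, version 0x03xx).
    img = markov_image(bytes([0x16, 0x03, 0x01, 0x16, 0x03, 0x03]))
    print(img[0x16][0x03])  # 1.0
    # An imbalanced two-class set is less diverse than a balanced one.
    print(simpson_index([500, 20]), simpson_index([500, 500]))
```

Because each row is normalised independently, the image captures transition structure rather than payload length, which is one reason this kind of representation suits variable-length flows.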
“…Chen et al. [37] introduced the imbalanced data gravitation-based classification algorithm to classify imbalanced data of malicious apps. He et al. [14,38] proposed identifying encrypted apps' flows via traffic correlation and detecting repackaged Android apps by comparing the network behaviors of similar apps. In these methods, detection is conducted mostly at the router or at network monitoring nodes; hence, the performance of mobile devices is not affected.…”
Section: Off-device Detection (mentioning)
confidence: 99%
“…In this study, traffic labeling is conducted to identify the corresponding app for each network flow, which is known as the app identification problem [53]. Extensive work has been conducted on identifying apps from mobile network traffic [38,54,55]. However, none of these approaches achieves 100% identification accuracy.…”
Malware has become a significant problem on the Android platform. To defend against Android malware, researchers have proposed several on-device detection methods. Typically, these methods consist of two steps: (i) extracting the behavior features of applications (simply, apps) on the mobile devices and (ii) sending the extracted features to remote servers (such as a cloud platform) for analysis. By monitoring the behaviors of apps running on mobile devices, these methods can detect suspicious apps accurately. However, mobile devices are typically resource limited. Feature extraction and massive data transmission may consume substantial power and CPU resources, degrading the performance of mobile devices. To address this issue, we propose a novel method for detecting Android malware by clustering apps' traffic at edge computing nodes. First, a new integrated architecture of the cloud, edge, and mobile devices for Android malware detection is presented. Then, for repackaged Android malware, the network traffic content and statistics are extracted at the edge as detection features. Finally, in the cloud, similarities between apps are calculated, and the similarity values are automatically clustered to separate the original apps from the malware. The experimental results demonstrate that the proposed method can detect repackaged Android malware with high precision and with minimal impact on the performance of mobile devices.
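The abstract leaves the similarity measure and the clustering method unspecified. As a hypothetical sketch of the cloud-side step, the code below represents each app's traffic by the set of hostnames it contacts, scores every sample against the original app with Jaccard similarity, and splits the similarity values with a one-dimensional 2-means. Samples that land in the low-similarity cluster are flagged as possibly repackaged. The hostname features, the function names, and the choice of 2-means are all assumptions, not the authors' method.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def two_means_1d(values: List[float], max_iter: int = 100) -> Tuple[float, float]:
    """Lloyd-style 2-means on scalars; returns the (low, high) centroids."""
    lo, hi = min(values), max(values)
    for _ in range(max_iter):
        # Assign each value to its nearest centroid (ties go to the low one).
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low) if low else lo
        new_hi = sum(high) / len(high) if high else hi
        if new_lo == lo and new_hi == hi:
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return lo, hi


def flag_suspects(original: Set[str], samples: Dict[str, Set[str]]) -> List[str]:
    """Return names of samples whose similarity to the original is in the low cluster."""
    if not samples:
        return []
    sims = {name: jaccard(original, feats) for name, feats in samples.items()}
    lo, hi = two_means_1d(list(sims.values()))
    # Strict comparison: if all similarities are equal, nothing is flagged.
    return sorted(n for n, s in sims.items() if abs(s - lo) < abs(s - hi))


if __name__ == "__main__":
    original = {"api.example.com", "cdn.example.com", "ads.example.net"}
    samples = {
        "copy_a": set(original),
        "copy_b": set(original),
        # Hypothetical repackaged sample contacting extra hosts.
        "repack": original | {"c2.bad.example", "tracker.bad.example"},
    }
    print(flag_suspects(original, samples))  # ['repack']
```

Clustering the similarity values, rather than fixing a threshold by hand, lets the split adapt to how similar genuine copies of a given app tend to be.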