Network researchers have long sought better methods for traffic classification, which is crucial for Internet Service Providers (ISPs) that want to provide better quality of service (QoS) to their users. Prior work on traffic classification has mainly focused on dividing Internet traffic into categories based on application-layer protocols (such as HTTP and BitTorrent). We approach traffic classification from a different point of view and divide Internet traffic by content type. Our approach is an attempt to classify network traffic that contains unknown and proprietary protocols (i.e., protocols with no publicly available specification). In this paper, we design a classifier that uses machine learning techniques to sort Internet traffic into different content types. The classifier's features are the entropy of consecutive bytes and the frequencies of characters. Our method is capable of classifying real-world traces into different content types, including Text, Picture, Audio, Video, Compressed, Base64-encoded image, Base64-encoded text, and Encrypted. The chief advantages of our classifier are its small memory requirement (about 1 KB) and its high classification accuracy (about 81%).
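The two feature families named above, entropy of consecutive bytes and character frequencies, can be illustrated with a minimal sketch. The abstract does not give the exact feature definitions, so everything here is an assumption made for illustration: the function names `ngram_entropy`, `byte_frequencies`, and `feature_vector` are hypothetical, entropy is taken as Shannon entropy over overlapping n-byte windows, and "characters" are treated as single byte values.

```python
import math
from collections import Counter
from typing import List


def ngram_entropy(payload: bytes, n: int = 1) -> float:
    """Shannon entropy, in bits per n-gram, over overlapping n-byte windows.

    Assumption: this is one plausible reading of "entropy of consecutive
    bytes"; the paper's exact definition may differ.
    """
    total = len(payload) - n + 1
    if total <= 0:
        return 0.0
    counts = Counter(payload[i:i + n] for i in range(total))
    # sum of p * log2(1/p) over all observed n-grams
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def byte_frequencies(payload: bytes) -> List[float]:
    """Relative frequency of each of the 256 possible byte values."""
    if not payload:
        return [0.0] * 256
    counts = Counter(payload)  # iterating over bytes yields ints 0..255
    return [counts.get(b, 0) / len(payload) for b in range(256)]


def feature_vector(payload: bytes, max_n: int = 2) -> List[float]:
    """Concatenate n-gram entropies (n = 1..max_n) with byte frequencies.

    Assumption: max_n = 2 is an arbitrary illustrative choice.
    """
    entropies = [ngram_entropy(payload, n) for n in range(1, max_n + 1)]
    return entropies + byte_frequencies(payload)
```

A vector such as `feature_vector(packet_payload)` could then be passed to any standard supervised learner. As a rough intuition, encrypted or compressed payloads tend to have single-byte entropy close to 8 bits, while plain text scores noticeably lower and concentrates its frequency mass on printable characters.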