Abstract: Data streams are large data sets generated continuously and at a high rate. Their arrival rate exceeds available processing and storage capacities, so these streams cannot be stored in their entirety and must instead be processed in a single pass. However, for a particular stream, it is not always possible to predict in advance all of the processing that will be performed; it is therefore necessary to save some of the data for future processing. The stored data constitute "summaries". Several techniques exist for constructing a summary; among them are sampling algorithms. In this paper we propose an in-depth study of the sampling methods used to build data stream summaries. The paper comprises two main parts. First, we introduce the basic concepts of data streams: windowing models over data streams as well as data stream applications. Then we describe the different sampling algorithms used in streaming environments, focusing in particular on their advantages and drawbacks. Finally, we compare the performance of simple random sampling to the chain sampling algorithm, and we discuss relevant research challenges for data stream sampling.
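The abstract refers to simple random sampling over a stream. The standard single-pass way to realize this is reservoir sampling (Vitter's Algorithm R); the sketch below is a minimal illustration of that technique under the single-pass constraint described above, not code taken from the paper itself.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Maintain a uniform random sample of size k over a single pass.

    Reservoir sampling (Vitter's Algorithm R): every item seen so far
    ends up in the sample with equal probability, yet the stream is
    read exactly once and only k items are ever stored.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randrange(i + 1)    # uniform slot in [0, i]
            if j < k:
                reservoir[j] = item     # replace with probability k/(i+1)
    return reservoir
```

For example, `reservoir_sample(range(1_000_000), 100)` keeps a uniform sample of 100 items while touching each stream element once, which is exactly the memory/pass constraint the abstract describes.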