Estimating the cardinality of data streams over a sliding window is an important problem in many applications, such as network traffic monitoring, web access log analysis, and databases. The problem becomes more difficult for large-scale data streams when time and space complexity are taken into account. In this paper, we present a novel randomized data structure to address the problem. The main contributions are as follows. (1) A space-efficient counter vector sketch (CVS) is proposed, which extends the well-known bitmap sketch to sliding window settings. (2) Based on the CVS, a random update mechanism is introduced, whereby a small, fixed number of entries are randomly chosen from the CVS at each step and then updated, so the update procedure costs only constant time. (3) Furthermore, estimating cardinality with the CVS requires only a one-pass scan of the data. (4) Finally, a theoretical analysis is given to show the accuracy of CVS-based estimators. Our comprehensive experiments confirm that the CVS-based scheme attains high accuracy and that its time efficiency compares favorably with that of the timestamp vector (TSV) and the auxiliary indexing method.
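For background, the classic bitmap sketch (linear counting) that the CVS is said to extend can be sketched as follows. This is an illustration of the well-known non-windowed technique only, not the CVS itself: it has no sliding window and no random update step. The class name BitmapSketch, the choice of SHA-256 as the hash function, and the vector size of 4096 are assumptions made for this example. Each item is hashed to one of m positions, the corresponding bit is set, and the cardinality is estimated from the fraction of positions still at zero as n ≈ m ln(m / z), where z is the number of zero bits.

```python
import hashlib
import math


class BitmapSketch:
    """Classic bitmap (linear counting) sketch. Illustrative only:
    no sliding window, so old items are never forgotten."""

    def __init__(self, m: int) -> None:
        self.m = m              # number of bit positions (assumed parameter)
        self.bits = [0] * m     # all positions start empty

    def _index(self, item: str) -> int:
        # Hash the item to a position in [0, m). SHA-256 is an arbitrary
        # choice here; any well-mixing hash function would do.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Duplicates hash to the same position, so they are counted once.
        self.bits[self._index(item)] = 1

    def estimate(self) -> float:
        zeros = self.bits.count(0)
        if zeros == 0:
            # The bitmap is saturated; the estimator is undefined.
            return float("inf")
        # Linear counting estimator: n ~ m * ln(m / zeros).
        return self.m * math.log(self.m / zeros)


if __name__ == "__main__":
    sketch = BitmapSketch(4096)
    for i in range(1000):
        sketch.add(f"item-{i}")
        sketch.add(f"item-{i}")  # a repeated item does not change the sketch
    print(round(sketch.estimate()))
```

Because a set bit is never cleared, this structure cannot expire items that fall out of a window. That limitation is what motivates replacing bits with counters in a sliding window setting.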