Summarizing Order Statistics over Data Streams with Duplicates

Zhang, Ying; Lin, Xuemin; Yuan, Ye; Zhou, Xiaofang; Yu, Jeffery Xu

doi:10.1109/icde.2007.369004

Cited by 6 publications

(7 citation statements)

References 23 publications

(25 reference statements)


“…Table V shows candidate limits of probabilistic k-skybands for typical values of parameters k and window sizes n when σ = 0.001. We can see that the number of candidates is much smaller than window size n, and decreases significantly compared to n for large values of n. For small values of parameter k, the number of candidates is larger in the probabilistic k-skyband than in the k-skyband since the expected size of the latter is k[ln(n/k) − 1] for random-order data streams, as shown in Zhang [2008]. 4.1.2.…”

Section: Theorem 44 Let Q Be a Top-k/w Query Over A Count-based Win… (mentioning)

confidence: 91%

Time- and Space-Efficient Sliding Window Top-k Query Processing

Pripužić, Žarko, Aberer

2015

ACM Trans. Database Syst.


A sliding window top-k (top-k/w) query monitors incoming data stream objects within a sliding window of size w to identify the k highest-ranked objects with respect to a given scoring function over time. Processing of such queries is challenging because, even when an object is not a top-k/w object at the time when it enters the processing system, it might become one in the future. Thus a set of potential top-k/w objects has to be stored in memory while its size should be minimized to efficiently cope with high data streaming rates. Existing approaches typically store top-k/w and candidate sliding window objects in a k-skyband over a two-dimensional score-time space. However, due to continuous changes of the k-skyband, its maintenance is quite costly. Probabilistic k-skyband is a novel data structure storing data stream objects from a sliding window with significant probability to become top-k/w objects in the future. Continuous probabilistic k-skyband maintenance offers considerably improved runtime performance compared to k-skyband maintenance, especially for large values of k, at the expense of a small and controllable error rate. We propose two possible probabilistic k-skyband usages: (i) When it is used to process all sliding window objects, the resulting top-k/w algorithm is approximate and adequate for processing random-order data streams. (ii) When probabilistic k-skyband is used to process only a subset of most recent sliding window objects, it can improve the runtime performance of continuous k-skyband maintenance, resulting in a novel exact top-k/w algorithm. Our experimental evaluation systematically compares different top-k/w processing algorithms and shows that while competing algorithms offer either time efficiency at the expense of space efficiency or vice-versa, our algorithms based on the probabilistic k-skyband are both time and space efficient.
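The k-skyband idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes a common textbook formulation that the abstract itself does not spell out: an object is kept if fewer than k objects in the window are both newer and higher-scored, since such an object could still enter the top-k once older objects expire. The function names `k_skyband` and `top_k` are hypothetical, and the quadratic scan is for clarity only; it is not the maintenance algorithm of the cited paper.

```python
def k_skyband(window, k):
    """Return the (time, score) objects dominated by fewer than k others.

    Assumption: object a dominates object b when a arrived later
    (so it expires later) and has a strictly higher score.
    """
    band = []
    for t, s in window:
        dominators = sum(1 for t2, s2 in window if t2 > t and s2 > s)
        if dominators < k:
            band.append((t, s))
    return band


def top_k(window, k):
    """Return the k highest-scored objects of the window."""
    return sorted(window, key=lambda obj: obj[1], reverse=True)[:k]


# Objects are (arrival time, score) pairs; the data is made up.
window = [(0, 5), (1, 1), (2, 4), (3, 2), (4, 3)]

print(k_skyband(window, 1))   # [(0, 5), (2, 4), (4, 3)]
print(top_k(window, 1))       # [(0, 5)]

# After the oldest object expires, the new top-1 is already in the band.
print(top_k(window[1:], 1))   # [(2, 4)]
```

In this toy window the object (1, 1) is never stored, because three newer objects outscore it and it can therefore never become the top-1 result before it expires.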



“…Existing solutions for top-k/w processing ([29,8,9,6,30,10,31,7,20,19,21]) assume centralized processing at a single network node and thus differ significantly from the distributed top-k/w processing approach we present in this paper. They can be classified in two categories: deterministic approaches ([8,9,6,30,20,19,21]) which produce correct results to defined queries, and probabilistic approaches ([29,7,19]) which generate errors and thus produce approximate results, but are in general more efficient and require less memory than the deterministic approaches.…”

Section: Data Stream Processing Systems (mentioning)

confidence: 95%

“…Furthermore, as a top-k/w query continuously identifies k best-ranked data objects in the query window with respect to an arbitrary scoring function, we can additionally classify existing algorithms according to the type of supported scoring functions. Examples are distance [29,9,6], aggregation [8,30,31] and relevance [20,21] scoring functions.…”

Section: Data Stream Processing Systems (mentioning)

confidence: 99%

Top-k/w publish/subscribe: A publish/subscribe model for continuous top-k processing over data streams

Pripužić, Žarko, Aberer

2014

Information Systems


Continuous processing of top-k queries over data streams is a promising technique for alleviating the information overload problem as it distinguishes relevant from irrelevant data stream objects with respect to a given scoring function over time. Thus it enables filtering of irrelevant data objects and delivery of top-k objects relevant to user interests in real-time. We propose a solution for distributed continuous top-k processing based on the publish/subscribe communication paradigm: top-k publish/subscribe over sliding windows (top-k/w publish/subscribe). It identifies k best-ranked objects with respect to a given scoring function over a sliding window of size w, and extends the publish/subscribe communication paradigm by continuous top-k processing algorithms coming from the field of data stream processing. In this paper, we introduce, analyze and evaluate the essential building blocks of distributed top-k/w publish/subscribe systems: First, we present a formal top-k/w publish/subscribe model and compare it to the prevailing Boolean publish/subscribe model. Next, we outline the top-k/w processing tasks performed by publish/subscribe nodes and investigate the properties of supported scoring functions. Furthermore, we explore potential routing strategies for distributed top-k/w publish/subscribe systems. Finally, we experimentally evaluate model properties and provide a comparative study investigating traffic requirements of potential routing strategies.


“…These works can be classified in two categories: deterministic approaches [4,9,11,18,19,20,26] which produce correct results to defined queries, and probabilistic approaches [13,14,26] which generate errors and thus produce approximate results, but are in general more efficient and require less memory than the deterministic approaches. Furthermore, as a top-k/w query continuously identifies k best-ranked data objects in the query window with respect to an arbitrary scoring function, we can additionally classify these works according to whether distance [4,14,20], aggregation [9,18,33] or relevance [11,19] scoring function is assumed. Following this categorization, k-NN/w queries are top-k/w queries with distance scoring functions.…”

Section: Related Work (mentioning)

confidence: 99%

Distributed processing of continuous sliding-window k-NN queries for data stream filtering

2011


A sliding-window k-NN query (k-NN/w query) continuously monitors incoming data stream objects within a sliding window to identify the k closest objects to a query. It enables effective filtering of data objects streaming in at high rates from potentially distributed sources, and offers means to control the rate of object insertions into result streams. Therefore k-NN/w processing systems may be regarded as one of the prospective solutions for the information overload problem in applications that require processing of structured data in real-time, such as the Sensor Web. Existing k-NN/w processing systems are mainly centralized and cannot cope with multiple data streams, where data sources are scattered over the Internet. In this paper, we propose a solution for distributed continuous k-NN/w processing of structured data from distributed streams. We define a k-NN/w processing model for such a setting, and design a distributed k-NN/w processing system on top of the Content-Addressable Network (CAN) overlay. An extensive evaluation using both real and synthetic data sets demonstrates the feasibility of the proposed solution because it balances the load among the peers, while the messaging overhead within the P2P network remains reasonable. Moreover, our results clearly show the solution is scalable for an increasing number of queries and peers.
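As a rough illustration of the k-NN/w query described in this abstract, the sketch below recomputes the k closest objects over a count-based window after every arrival. It is a naive, centralized baseline written under stated assumptions (Euclidean distance, a window holding the last w objects); the function name `knn_over_window` is hypothetical, and nothing here reflects the distributed CAN-based system the paper proposes.

```python
import heapq
import math
from collections import deque


def knn_over_window(stream, query, k, w):
    """Return, for each arrival, the k window objects closest to `query`.

    Assumptions: a count-based window of the last w objects and
    Euclidean distance as the (distance) scoring function.
    """
    window = deque(maxlen=w)  # the oldest object is evicted automatically
    results = []
    for point in stream:
        window.append(point)
        nearest = heapq.nsmallest(
            k, window, key=lambda p: math.dist(p, query)
        )
        results.append(nearest)
    return results


# Made-up two-dimensional stream, query at the origin.
stream = [(0, 0), (5, 5), (1, 1), (9, 9)]
for nearest in knn_over_window(stream, query=(0, 0), k=1, w=2):
    print(nearest)
```

With w = 2 the object (0, 0) drops out of the window on the third arrival, so the reported nearest neighbour changes to (1, 1) even though no closer object has arrived.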

