21st International Conference on Data Engineering (ICDE'05)
DOI: 10.1109/icde.2005.68

Finding (Recently) Frequent Items in Distributed Data Streams

Abstract: We consider the problem of maintaining frequency counts for items occurring frequently in the union of multiple distributed data streams. Naive methods of combining approximate frequency counts from multiple nodes tend to result in excessively large data structures that are costly to transfer among nodes. To minimize communication requirements, the degree of precision maintained by each node while counting item frequencies must be managed carefully. We introduce the concept of a precision gradient for managing…

Cited by 145 publications (131 citation statements)
References 19 publications (47 reference statements)
“…This is studied by Babcock et al [18] in the distributed setting, and extended by Olston et al [19] to support sum and average queries. These approaches aim to keep the local elephants aligned with the global ones and hence face the same issue as the above solution [17]: icebergs that are finely distributed among the local nodes are hard to discover.…”
Section: Related Work (mentioning)
confidence: 99%
“…Our work differs from theirs since we assume fixed measurement periods, which potentially allows us to have more communication-efficient mechanisms. Manjhi et al [17] studied the problem of discovering icebergs in a distributed environment when nodes are arranged in a multi-level communication hierarchy. We study the simpler, practically motivated single-level communication scheme instead.…”
Section: Related Work (mentioning)
confidence: 99%
“…For example, in the case of network routers, maintaining a random sample from the union of the streams is valuable for network monitoring tasks involving the detection of global properties [4]. Other problems on distributed stream processing, including the estimation of the number of distinct elements [1], [5] and heavy hitters [6], [7], [8], [9], use random sampling as a primitive (we note, though, that better solutions for the heavy hitters problem in terms of the accuracy parameter may be possible [9] than those provided by random sampling). Distributed random sampling is already used in current day "big data" systems such as BlinkDB [10], which use stored random samples to process queries quickly, in exchange for relaxed accuracy guarantees.…”
Section: Introduction (mentioning)
confidence: 99%
“…A number of heuristic solutions have been proposed recently for set unions such as the game above and other set expressions, quantiles, heavy hitters and sketch-maintenance [163,98,49,48].…”
Section: Distributed Continuous Computation (mentioning)
confidence: 99%