Proceedings of the Thirteenth Annual ACM Symposium on Parallel Algorithms and Architectures 2001
DOI: 10.1145/378580.378687

Estimating simple functions on the union of data streams

Abstract: Massive data sets often arise as physically distributed, parallel data streams. We present algorithms for estimating simple functions on the union of such data streams, while using only logarithmic space per stream. Each processor observes only its own stream, and communicates with the other processors only after observing its entire stream. This models the set-up in current network monitoring products. Our algorithms employ a novel coordinated sampling technique to extract a sample of the union; this sample ca…
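The coordinated sampling idea in the abstract can be illustrated with a minimal Python sketch for one simple function, the number of distinct items in the union. It is an illustration under stated assumptions, not the paper's algorithm as published: the names `item_level`, `StreamSampler`, and `estimate_union_distinct` are hypothetical, and SHA-256 stands in for whatever shared hash family the authors actually use. The key point is that every stream applies the same hash, so the per-stream samples can be merged after the streams have been observed in full.

```python
import hashlib


def item_level(item, seed=0):
    """Trailing-zero count of a shared 64-bit hash of `item`.

    An item reaches level >= k with probability about 2**-k.
    SHA-256 is an assumed stand-in for the shared hash function.
    """
    digest = hashlib.sha256(f"{seed}:{item!r}".encode()).digest()
    h = int.from_bytes(digest[:8], "big")
    return 64 if h == 0 else (h & -h).bit_length() - 1


class StreamSampler:
    """Bounded sample of one stream (hypothetical helper class)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.seed = seed      # must be identical across all streams
        self.level = 0        # current sampling level
        self.sample = set()   # distinct items seen with level >= self.level

    def observe(self, item):
        if item_level(item, self.seed) >= self.level:
            self.sample.add(item)
            # On overflow, raise the level and drop items below it.
            while len(self.sample) > self.capacity:
                self.level += 1
                self.sample = {
                    x for x in self.sample
                    if item_level(x, self.seed) >= self.level
                }


def estimate_union_distinct(samplers):
    """Merge per-stream samples and estimate distinct items in the union."""
    seed = samplers[0].seed
    capacity = samplers[0].capacity
    level = max(s.level for s in samplers)
    merged = {
        x for s in samplers for x in s.sample
        if item_level(x, seed) >= level
    }
    while len(merged) > capacity:
        level += 1
        merged = {x for x in merged if item_level(x, seed) >= level}
    # Each surviving item represents roughly 2**level distinct items.
    return len(merged) * 2 ** level
```

As a usage example, two samplers with capacity 100 that observe `range(50)` and `range(30, 80)` never overflow, so merging them returns the exact union size of 80. With larger streams the result is an estimate whose accuracy improves as the capacity grows.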

Cited by 169 publications (179 citation statements)
References 28 publications
“…Alon, Matias and Szegedy [1] gave a constant factor approximation in small space. Gibbons and Tirthapura [10] showed a (1±ε) factor approximation space Õ(1/ε²); subsequent work has improved the (hidden) logarithmic factors [2].…”
Section: Review (mentioning)
confidence: 99%
“…However, it is more restricted in that it requires that a data item can never again be retrieved in main memory after its first pass (if it is a one-pass algorithm). A distributed stream model is also proposed in [53] which combines features of both streaming models and communication complexity models.…”
Section: The Data Stream Computation Model (mentioning)
confidence: 99%
“…There has been a lot of work in computing over data streams for purposes such as set resemblance, data mining, creating histograms, and so on [11], [17], [29]. Particularly relevant is some recent work [23], [25] which studies the problem of finding the size of the union of two streams. Here, the streams define multisets of elements, and it is the size of the union of the supporting sets that is of interest.…”
Section: Work On Data Streams and Sketches (mentioning)
confidence: 99%