Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems 2010
DOI: 10.1145/1807085.1807094

An optimal algorithm for the distinct elements problem

Abstract: We give the first optimal algorithm for estimating the number of distinct elements in a data stream, closing a long line of theoretical research on this problem begun by Flajolet and Martin in their seminal paper in FOCS 1983. This problem has applications to query optimization, Internet routing, network topology, and data mining. For a stream of indices in {1, ..., n}, our algorithm computes a (1 ± ε)-approximation using an optimal O(ε⁻² + log(n)) bits of space with 2/3 success probability, where 0 < ε < 1…
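
To make the problem statement concrete, here is a minimal sketch of a classic k-minimum-values (KMV) estimator for the number of distinct elements in a stream. It is not the algorithm of this paper: KMV stores k hash values, which with k on the order of ε⁻² costs roughly O(ε⁻² log n) bits, a log n factor above the optimal bound stated in the abstract. The function names, the BLAKE2b-based hash, and the default k are illustrative assumptions.

```python
import hashlib
import heapq


def _unit_hash(item, seed=0):
    """Map an item to a pseudo-random float in [0, 1] via BLAKE2b (illustrative choice)."""
    data = repr((seed, item)).encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


def kmv_estimate(stream, k=1024, seed=0):
    """Estimate the distinct count by tracking the k smallest hash values seen."""
    heap = []        # max-heap (stored negated) holding the k smallest hashes
    members = set()  # hashes currently in the heap, used to skip repeats
    for item in stream:
        h = _unit_hash(item, seed)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        # Fewer than k distinct hashes were seen, so the count is exact
        # (barring hash collisions).
        return float(len(heap))
    kth_smallest = -heap[0]
    return (k - 1) / kth_smallest
```

The relative standard error of this estimator is about 1/√k, so picking k near ε⁻² targets a (1 ± ε)-approximation with constant probability; the paper's contribution is reaching the same guarantee in asymptotically less space.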

Cited by 269 publications (283 citation statements)
References 41 publications

“…Then γ_1 is the number of distinct elements in the sequence W(I). The results of Kane, Nelson and Woodruff [12] imply that using O(ε⁻² + log n) space we can compute a value γ_1 such that…”
Section: Theorem 12 (mentioning)
confidence: 99%
“…We choose data structures that satisfy our constraints (see §2.4), yet note that there are further candidates. For example, there are extensions available for the HyperLogLog algorithm that we use [10]: Kane et al. [14] propose an algorithm with an even lower memory overhead; it however remains complex and seems impractical to implement [13]. Heule et al. likewise propose a series of improvements to HyperLogLog [13].…”
Section: Communication Overhead (mentioning)
confidence: 99%
“…A traditional streaming aggregation task, which we henceforth call "whole stream aggregation", asks for aggregates such as frequency moments, quantiles, or heavy hitters over the entire stream, and a number of works have dealt with estimating these using limited memory (see, e.g., [1,18,21,24,23]). The major difference between whole stream aggregation and correlated aggregation is that in case of correlated aggregation, the scope of aggregation is restricted to only those items which satisfy the selection predicate.…”
Section: Introduction (mentioning)
confidence: 99%