Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems 2016
DOI: 10.1145/2902251.2902284

An Optimal Algorithm for ℓ1-Heavy Hitters in Insertion Streams and Related Problems

Abstract: We give the first optimal bounds for returning the ℓ1-heavy hitters in a data stream of insertions, together with their approximate frequencies, closing a long line of work on this problem. For a stream of m items in {1, 2, ..., n} and parameters 0 < ε < ϕ ≤ 1, let f_i denote the frequency of item i, i.e., the number of times item i occurs in the stream. With arbitrarily large constant probability, our algorithm returns all items i for which f_i ≥ ϕm, returns no items j for which f_j ≤ (ϕ − ε)m, and returns approxim…
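The guarantee in the abstract (report every item with frequency at least ϕm, and no item with frequency at most (ϕ − ε)m) can be illustrated with the classical Misra-Gries summary. This is a well-known deterministic baseline, not the algorithm of the paper, and it makes no attempt to match the paper's space bound. The function names `misra_gries` and `heavy_hitters` and the choice of k = ⌈1/ε⌉ counters are assumptions made for this sketch.

```python
import math


def misra_gries(stream, k):
    """Classical Misra-Gries summary with at most k counters.

    Returns (counters, m), where m is the stream length. Each stored
    estimate c satisfies f_i - m / (k + 1) <= c <= f_i.
    """
    counters = {}
    m = 0
    for item in stream:
        m += 1
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the
            # ones that reach zero. The new item itself is discarded.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters, m


def heavy_hitters(stream, phi, eps):
    """Report items whose estimate exceeds (phi - eps) * m.

    Assumes 0 < eps < phi <= 1. With k = ceil(1 / eps) counters the
    undercount is strictly below eps * m, so every item with
    f_i >= phi * m is reported and no item with
    f_j <= (phi - eps) * m is reported.
    """
    k = math.ceil(1 / eps)
    counters, m = misra_gries(stream, k)
    threshold = (phi - eps) * m
    return {item: c for item, c in counters.items() if c > threshold}


if __name__ == "__main__":
    # 100 items: item 1 occurs 50 times, item 2 occurs 30 times,
    # and items 3..22 occur once each.
    demo = [1] * 50 + [2] * 30 + list(range(3, 23))
    print(heavy_hitters(demo, phi=0.25, eps=0.1))
```

The reported counts are underestimates of the true frequencies by less than εm, which is the kind of approximate frequency the abstract refers to. The sketch is deterministic, whereas the abstract states a guarantee that holds with arbitrarily large constant probability.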

Cited by 17 publications (24 citation statements)
References 66 publications
“…It is known that there are significant differences between these models. For instance, identifying an index i ∈ [n] for which |x_i| > (1/10) Σ_{j=1}^{n} |x_j| can be accomplished with only O(log(n)) bits of space in the insertion-only model [10], but requires Ω(log²(n)) bits in the turnstile model [38]. This log(n) gap between the complexity in the two models occurs in many other important streaming problems.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…All the experimental metrics are averaged over 5 independent runs. Moreover, in all experiments, Lazy SpaceSaving± and SpaceSaving± use the same amount of space, while the universe size is 𝑈 = 2^16, and we set 𝛿 = 𝑈^(−1) to align the experiments with the theoretical literature [7,30].…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…We approximate instantaneous throughput by calculating throughput (using system timestamps) every κ observations. In our evaluation, we fix κ = 2^17.…”
Section: Methods (citation type: mentioning)
confidence: 99%
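The measurement described in the last statement, sampling throughput once every κ observations from timestamps, could look roughly like the following. This is only a sketch of the general idea, not the citing authors' code: the function name `windowed_throughput`, the `process` callback, and the use of `time.perf_counter` as the timestamp source are all assumptions.

```python
import time


def windowed_throughput(stream, process, kappa=2**17, clock=time.perf_counter):
    """Return one throughput sample (items per second) per kappa items.

    `process` is a hypothetical per-item update function, for example
    the update step of a sketch. `clock` is any callable returning a
    timestamp in seconds; it is injectable so the logic can be tested.
    """
    samples = []
    window_start = clock()
    for count, item in enumerate(stream, start=1):
        process(item)
        if count % kappa == 0:
            now = clock()
            elapsed = now - window_start
            # Guard against a zero-length window on coarse clocks.
            samples.append(kappa / elapsed if elapsed > 0 else float("inf"))
            window_start = now
    return samples


if __name__ == "__main__":
    seen = set()
    rates = windowed_throughput(range(1_000_000), seen.add, kappa=2**17)
    print(len(rates), "samples")
```

Items left over after the last full window of κ observations are processed but do not produce a sample.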