Proceedings of the 2017 Internet Measurement Conference 2017
DOI: 10.1145/3131365.3131407

A high-performance algorithm for identifying frequent items in data streams

Abstract: Estimating frequencies of items over data streams is a common building block in streaming data measurement and analysis. Misra and Gries introduced their seminal algorithm for the problem in 1982, and the problem has since been revisited many times due to its practicality and applicability. We describe a highly optimized version of Misra and Gries' algorithm that is suitable for deployment in industrial settings. Our code is made public via an open source library called DataSketches that is already used by severa…
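For readers unfamiliar with the 1982 algorithm the abstract refers to, the following is a minimal sketch of the textbook Misra-Gries summary in Python. It is not the optimized variant described in the paper, and it does not use the DataSketches API; the function name `misra_gries` and the parameter `k` are illustrative assumptions. With at most `k - 1` counters over a stream of length `n`, each stored count underestimates the true frequency by at most `n / k`, so every item occurring more than `n / k` times is retained.

```python
def misra_gries(stream, k):
    """Textbook Misra-Gries summary using at most k - 1 counters.

    Hypothetical helper for illustration only; not the paper's
    optimized algorithm and not part of the DataSketches library.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: increment its counter.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            # The arriving item itself is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    stream = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
    # n = 10 and k = 3, so each estimate is low by at most 10 / 3.
    print(misra_gries(stream, k=3))  # {'a': 3, 'b': 1}
```

The decrement step in this naive form costs time proportional to the number of counters. That is acceptable for a sketch, but it is the kind of cost that optimized implementations aim to reduce.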

Cited by 27 publications (9 citation statements)
References 47 publications (83 reference statements)
“…The Space Saving algorithm [37] can find weighted heavy hitters in a stream with an update time of $O(\log \epsilon^{-1})$ [14]. Recent advancements [6,9,12] reduce this runtime to a constant. Thus, the tail latency problem for weighted heavy hitters may be solved with the same asymptotic complexity as the unweighted versions and with an error of up to $M\epsilon$.…”
Section: Extensions Of Supporting Tail Latencies For Traffic Volume H... (mentioning)
confidence: 99%
“…The scientific work described in 19 papers is categorized as measurements [4,7,8,17,19,25-38]. Finally, 8 papers are classified as miscellaneous [9,11,14,18,20,22,23,42].…”
Section: Grouping Of Artifacts (mentioning)
confidence: 99%
“…The Space Saving algorithm [30] can find weighted heavy hitters over a stream with $O(\log \epsilon^{-1})$ update time [12]. Recent breakthroughs [9,7,3] improve this runtime to a constant. Thus, we can solve the interval volume estimation and (weighted) heavy hitters problems with the same asymptotic complexity as the unweighted variants and with an error of at most W M.…”
Section: Supporting Traffic Volume Heavy-hitters (mentioning)
confidence: 99%