2005
DOI: 10.1145/1061318.1061325

What's hot and what's not: tracking most frequent items dynamically

Abstract: Most database management systems maintain statistics on the underlying relation. One of the important statistics is that of the "hot items" in the relation: those that appear many times (most frequently, or more than some threshold). For example, end-biased histograms keep the hot items as part of the histogram and are used in selectivity estimation. Hot items are used as simple outliers in data mining, and in anomaly detection in networking applications. We present a new algorithm for dynamically determining t…

Citations: cited by 391 publications (362 citation statements)
References: 31 publications

“…load shedding, prioritization [30]), on shared processing (e.g. on-the-fly aggregation [21]), or specialized algorithms and data structures [11]. Our approach to streaming is about generalizing incremental processing to (nonwindowed) SQL semantics (including nested subqueries and aggregates).…”
Section: Update Processing Mechanisms (mentioning)
confidence: 99%
“…The online update of such structures in a dynamic scenario is also a required property. Sampling [39], hot lists [11,30], wavelets [9,18,24], sketches [10] and histograms [21][22][23] are examples of synopses methods to obtain fast and approximated answers.…”
Section: Data Summarization (mentioning)
confidence: 99%
“…Mainly, they (and the techniques) are used as subroutine in other problems in the Turnstile data stream model. F_2 and L_2 are used to measure deviations in anomaly detection [154] or interpreted as self-join sizes [13]; with variants of L_1 sketches we can dynamically track most frequent items [55], quantiles [107], wavelets and histograms [104], etc. in the Turnstile model; using L_p sketches for p → 0, we can estimate the number of distinct elements at any time in the Turnstile model [47].…”
Section: Norms Estimation (mentioning)
confidence: 99%