What's hot and what's not: tracking most frequent items dynamically

Cormode, Graham; Muthukrishnan, S.

doi:10.1145/1061318.1061325

Cited by 391 publications

(362 citation statements)

References 31 publications

Mentioning: 360

“…load shedding, prioritization [30]), on shared processing (e.g. on-the-fly aggregation [21]), or specialized algorithms and data structures [11]. Our approach to streaming is about generalizing incremental processing to (nonwindowed) SQL semantics (including nested subqueries and aggregates).…”

Section: Update Processing Mechanisms (mentioning)

confidence: 99%

DBToaster: higher-order delta processing for dynamic, frequently fresh views

et al. 2014

Applications ranging from algorithmic trading to scientific data analysis require realtime analytics based on views over databases that change at very high rates. Such views have to be kept fresh at low maintenance cost and latencies. At the same time, these views have to support classical SQL, rather than window semantics, to enable applications that combine current with aged or historical data. In this paper, we present viewlet transforms, a recursive finite differencing technique applied to queries. The viewlet transform materializes a query and a set of its higher-order deltas as views. These views support each other's incremental maintenance, leading to a reduced overall view maintenance cost. The viewlet transform of a query admits efficient evaluation, the elimination of certain expensive query operations, and aggressive parallelization. We develop viewlet transforms into a workable query execution technique, present a heuristic and cost-based optimization framework, and report on experiments with a prototype dynamic data management system that combines viewlet transforms with an optimizing compilation technique. The system supports tens of thousands of complete view refreshes a second for a wide range of queries.

“…The online update of such structures in a dynamic scenario is also a required property. Sampling [39], hot lists [11,30], wavelets [9,18,24], sketches [10] and histograms [21][22][23] are examples of synopses methods to obtain fast and approximated answers.…”

Section: Data Summarization (mentioning)

confidence: 99%

Fading histograms in detecting distribution and concept changes

Sebastião; Gama; Mendonça. 2017. Int J Data Sci Anal

The remarkable number of real applications under dynamic scenarios is driving a novel ability to generate and gather information. Nowadays, a massive amount of information is generated at a high-speed rate, known as data streams. Moreover, data are collected under evolving environments. Due to memory restrictions, data must be promptly processed and discarded immediately. Therefore, dealing with evolving data streams raises two main questions: (i) how to remember discarded data? and (ii) how to forget outdated data? To maintain an updated representation of the time-evolving data, this paper proposes fading histograms. Regarding the dynamics of nature, changes in data are detected through a windowing scheme that compares data distributions computed by the fading histograms: the adaptive cumulative windows model (ACWM). The online monitoring of the distance between data distributions is evaluated using a dissimilarity measure based on the asymmetry of the Kullback-Leibler divergence. The experimental results support the ability of fading histograms in providing an updated representation of data. Such property works in favor of detecting distribution changes with smaller detection delay time when compared with standard histograms. With respect to the detection of concept changes, the ACWM is compared with 3 known algorithms taken from the literature, using artificial data and using public data sets, presenting better results. Furthermore, the proposed method was extended to the multidimensional setting, and the experiments performed show the ability of the ACWM for detecting distribution changes in these settings.

“…Mainly, they (and the techniques) are used as subroutine in other problems in the Turnstile data stream model. F_2 and L_2 are used to measure deviations in anomaly detection [154] or interpreted as self-join sizes [13]; with variants of L_1 sketches we can dynamically track most frequent items [55], quantiles [107], wavelets and histograms [104], etc. in the Turnstile model; using L_p sketches for p → 0, we can estimate the number of distinct elements at any time in the Turnstile model [47].…”

Section: Norms Estimation (mentioning)

confidence: 99%

Data Streams: Algorithms and Applications

Muthukrishnan. 2005. FNT in Theoretical Computer Science (self-citation)

In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerged for reasoning about algorithms that work within these constraints on space, time, and number of passes. Some of the methods rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity. The applications for this scenario include IP network traffic analysis, mining text message streams and processing massive data sets in general. Researchers in Theoretical Computer Science, Databases, IP Networking and Computer Systems are working on the data stream challenges. This article is an overview and survey of data stream algorithmics and is an updated version of [175].
