Space Efficient Streaming Algorithms for the Maximum Error Histogram

Buragohain, Chiranjeeb; Shrivastava, Nisheeth; Suri, Subhash

doi:10.1109/icde.2007.368961

Cited by 42 publications

(56 citation statements)

References 15 publications


“…The algorithm they propose is based on using a fixed-length sliding window of data points. In [4], Buragohain et al also address the histogram construction problem. However they represent each bucket by a line segment rather than a single value.…”

Section: Related Work (mentioning)

confidence: 99%


Online piece-wise linear approximation of numerical streams with precision guarantees

Elmeleegy, Elmagarmid, Cecchet, et al. 2009

Proc. VLDB Endow.


Continuous "always-on" monitoring is beneficial for a number of applications, but potentially imposes a high load in terms of communication, storage and power consumption when a large number of variables need to be monitored. We introduce two new filtering techniques, swing filters and slide filters, that represent a time-varying numerical signal, within a prescribed precision, by a piecewise linear function consisting of connected line segments for swing filters and (mostly) disconnected line segments for slide filters. We demonstrate the effectiveness of swing and slide filters in terms of their compression power by applying them to a real-life data set plus a variety of synthetic data sets. For nearly all combinations of signal behavior and precision requirements, the proposed techniques outperform the earlier approaches for online filtering in terms of data reduction. The slide filter, in particular, consistently dominates all other filters, with up to twofold improvement over the best of the previous techniques.



“…We refer to the endpoints of the line segments as the recordings. If g (k-1) and g k are disconnected, k∈ [2,K] (3,5) = (0,0,5,0), and (9,9,9,9)-V 4 (3,5) = (9,9,4,9). Figure 1 shows a sample signal and a possible piece-wise linear approximation illustrating most of the notations described above.…”

Section: Problem Statement and Notations (mentioning)

confidence: 99%

Online piece-wise linear approximation of numerical streams with precision guarantees

Elmeleegy, Elmagarmid, Cecchet, et al. 2009

Proc. VLDB Endow.


“…Thresholded approximation has been used in the context of histograms before, in the context of "dual" problems where the summary size is minimized to achieve a predetermined error [4,20,11]. Concretely, recall that the maximum error histogram construction problem is: given a set of numbers X = x 1 , x 2 , .…”

Section: The Setup: Requirements (mentioning)

confidence: 99%
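
The "dual" problem mentioned in the statement above (fix the error, minimize the summary size) can be illustrated with a short sketch. It assumes the simplest setting, a piecewise-constant histogram under maximum error, where a bucket is represented by the midpoint of its minimum and maximum. The function name `greedy_buckets` and the tuple layout are hypothetical choices made for illustration and are not taken from any of the cited papers.

```python
from typing import List, Sequence, Tuple


def greedy_buckets(xs: Sequence[float], err: float) -> List[Tuple[int, int, float]]:
    """Split xs into consecutive buckets (start, end_inclusive, representative).

    Every value in a bucket lies within `err` of the bucket's representative,
    which is the midpoint of the bucket's minimum and maximum. A bucket is
    extended greedily for as long as its half-range stays within `err`.
    """
    buckets: List[Tuple[int, int, float]] = []
    start = 0
    lo = hi = None  # running min and max of the open bucket
    for i, x in enumerate(xs):
        if lo is None:
            # First value: open the first bucket.
            start, lo, hi = i, x, x
            continue
        new_lo, new_hi = min(lo, x), max(hi, x)
        if (new_hi - new_lo) / 2 <= err:
            # x still fits: widen the open bucket.
            lo, hi = new_lo, new_hi
        else:
            # x does not fit: close the bucket and open a new one at x.
            buckets.append((start, i - 1, (lo + hi) / 2))
            start, lo, hi = i, x, x
    if lo is not None:
        buckets.append((start, len(xs) - 1, (lo + hi) / 2))
    return buckets
```

Because it keeps only the running minimum and maximum of the open bucket, the sketch reads the input once, in order. For example, `greedy_buckets([1, 2, 3, 10, 11, 12], 1.0)` yields two buckets with representatives 2.0 and 11.0.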

“…We use three main ideas: (i) we use the notion of a "thresholded approximation" where the goal is to minimize the error assuming we know the optimum error, (ii) we run multiple copies (but controlled in number) of the algorithm corresponding to different estimates of the final error and, (iii) we use a "streamstrapping" procedure to use partially completed summarization for a certain estimate to create summarization for a different estimate of error. The first two ideas have been explicitly used in the context of summarization before, see [4,9,10,20,11] among many others. We are unaware of the use of the third idea in any previous work and we believe that this notion will be useful in a variety of different problems.…”

Section: Introduction (mentioning)

confidence: 99%
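
Idea (ii) in the statement above, running copies of a thresholded algorithm for different error estimates, can be sketched as follows. This is an offline illustration under stated assumptions, not the cited algorithm: it tries geometrically spaced guesses one after another instead of maintaining the copies in parallel over a stream, it uses a piecewise-constant maximum error histogram, and it omits the "streamstrapping" step entirely. The names `count_buckets` and `smallest_feasible_guess` and the parameters `eps` and `min_err` are hypothetical.

```python
from typing import Sequence


def count_buckets(xs: Sequence[float], err: float) -> int:
    """Number of buckets a greedy pass needs so each half-range is <= err."""
    count = 0
    lo = hi = None  # running min and max of the open bucket
    for x in xs:
        if lo is not None and (max(hi, x) - min(lo, x)) / 2 <= err:
            lo, hi = min(lo, x), max(hi, x)
        else:
            count += 1  # open a new bucket starting at x
            lo = hi = x
    return count


def smallest_feasible_guess(
    xs: Sequence[float], max_buckets: int, eps: float, min_err: float
) -> float:
    """Return the first guess in 0, min_err, min_err*(1+eps), ... that fits.

    A guess "fits" when the greedy pass needs at most max_buckets buckets.
    """
    if not xs or max_buckets < 1 or eps <= 0 or min_err <= 0:
        raise ValueError("need non-empty xs, max_buckets >= 1, eps > 0, min_err > 0")
    if count_buckets(xs, 0.0) <= max_buckets:
        return 0.0
    guess = min_err
    # Terminates: once guess >= (max - min) / 2, a single bucket suffices.
    while count_buckets(xs, guess) > max_buckets:
        guess *= 1 + eps
    return guess
```

Under these assumptions, if the optimal error for `max_buckets` buckets is at least `min_err`, the returned guess is within a factor `1 + eps` of it, since the previous (smaller) guess was infeasible. A streaming version would instead keep one greedy summary per guess and discard those that exceed `max_buckets`.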

“…were provided one at a time in increasing order of i and the algorithms are restricted to use sublinear space. Since then a large number of algorithms have been proposed, for many different measures and in particular the maximum error, many of which extend to streaming algorithms [16,4,20]. However for every algorithm proposed till date, for any error measure, the space bound depends either on log n, log E * or log M .…”

Section: Introduction (mentioning)

confidence: 99%


Tight results for clustering and summarizing data streams

Guha

2009

Proceedings of the 12th International Conference on Database Theory


In this paper we investigate algorithms and lower bounds for summarization problems over a single pass data stream. In particular we focus on histogram construction and K-center clustering. We provide a simple framework that improves upon all previous algorithms on these problems in either the space bound, the approximation factor or the running time. The framework uses a notion of "streamstrapping" where summaries created for the initial prefixes of the data are used to develop better approximation algorithms. We also prove the first non-trivial lower bounds for these problems. We show that under the stricter requirement that an algorithm accurately approximate the error of every bucket or every cluster it produces, these upper bounds are almost the best possible. This property of accurate estimation is true of all known upper bounds on these problems.

Keywords: data streams, clustering

This conference paper is available at ScholarlyCommons: http://repository.upenn.edu/cis_papers/394


SIRCS: Slope-intercept-residual Compression by Correlation Sequencing for Multi-stream High Variation Data

Hua, Wang, et al. 2019

Database Systems for Advanced Applications

