SYNONYMS
None.
DEFINITION
Wavelets are a useful mathematical tool for hierarchically decomposing functions in ways that are both efficient and theoretically sound. Broadly speaking, the wavelet transform of a function consists of a coarse overall approximation together with detail coefficients that influence the function at various scales. The wavelet transform has a long history of successful applications in signal and image processing [11,12]. Several recent studies have also demonstrated the effectiveness of the wavelet transform (and Haar wavelets, in particular) as a tool for approximate query processing over massive relational tables [2,7,8] and continuous data streams [3,9]. Briefly, the idea is to apply the wavelet transform to the input relation to obtain a compact data synopsis comprising a small, select collection of wavelet coefficients. The excellent energy compaction and de-correlation properties of the wavelet transform allow for concise and effective approximate representations that exploit the structure of the data. Furthermore, wavelet transforms can generally be computed in linear time, thus allowing for very efficient algorithms.
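To make the decomposition concrete, the following is a minimal sketch (not part of the original entry) of the unnormalized one-dimensional Haar transform: at each level, pairwise averages form the coarser approximation and pairwise semi-differences form the detail coefficients. The function name and the power-of-two length assumption are illustrative choices, not a standard API.

```python
def haar_decompose(data):
    """Unnormalized 1-D Haar wavelet decomposition.

    Returns a list whose first entry is the overall average, followed by
    detail coefficients ordered from coarsest to finest resolution level.
    Assumes len(data) is a power of two (a common simplifying assumption).
    """
    values = list(data)
    coeffs = []
    while len(values) > 1:
        # Pairwise averages give the next coarser approximation;
        # pairwise semi-differences are this level's detail coefficients.
        averages = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
        details = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
        coeffs = details + coeffs  # prepend so coarser levels come first
        values = averages
    return values + coeffs

# Example: an 8-value data array.
print(haar_decompose([2, 2, 0, 2, 3, 5, 4, 4]))
# → [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
```

Because each level halves the number of values, the total work is proportional to n + n/2 + n/4 + ... = O(n), consistent with the linear-time claim above; a synopsis is then obtained by retaining only a few large-magnitude coefficients.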
HISTORICAL BACKGROUND
A growing number of database applications require on-line, interactive access to very large volumes of data to perform a variety of data-analysis tasks. As an example, large Internet Service Providers (ISPs) typically collect and store terabytes of detailed usage information (NetFlow/SNMP flow statistics, packet-header information, etc.) from the underlying network to satisfy the requirements of various network-management tasks, including billing, fraud/anomaly detection, and strategic planning. This data gives rise to massive, multi-dimensional relational data tables typically stored and queried/analyzed using commercial database engines (such as Oracle, SQL Server, and DB2). To handle the huge data volumes, high query complexities, and interactive response-time requirements characterizing these modern data-analysis applications, the idea of effective, easy-to-compute approximate query answers over precomputed, compact data synopses has recently emerged as a viable solution. Due to the exploratory nature of most target applications, there are a number of scenarios in which a (reasonably accurate) fast approximate answer over a small-footprint summary of the database is actually preferable to an exact answer that takes hours or days to compute. For example, during a "drill-down" query sequence in ad-hoc data mining, initial queries in the sequence frequently have the sole purpose of determining the truly interesting queries and regions of the database. Providing fast approximate answers to these initial queries gives users the ability to focus their explorations quickly and effectively, without consuming inordinate amounts of valuable system resources.
The key behind such approximate techniques for dealing with massive data sets lies in the use of appropriate data-reduction techniques for constructing compact synopses that can accurately approximate t...