1997
DOI: 10.1109/69.591455

Block-oriented compression techniques for large statistical databases

Abstract: Disk I/O has long been a performance bottleneck for very large databases. Database compression can be used to reduce disk I/O bandwidth requirements for large data transfers. In this paper, we explore the compression of large statistical databases and propose techniques for organizing the compressed data such that standard database operations such as retrievals, inserts, deletes and modifications are supported. We examine the applicability and performance of three methods. Two of these are adaptations…
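
To make the block-oriented idea concrete, here is a hypothetical sketch, not the paper's actual method: fixed-size groups of records are compressed independently, so a retrieval or modification decompresses only one block rather than the whole file. zlib stands in for the paper's encodings, and the class, block size, and method names are all invented for illustration.

import zlib
from ast import literal_eval

RECORDS_PER_BLOCK = 128  # assumed block size, chosen for illustration

class BlockStore:
    def __init__(self, records):
        # Compress each fixed-size chunk of records independently.
        self.blocks = []
        for i in range(0, len(records), RECORDS_PER_BLOCK):
            chunk = records[i:i + RECORDS_PER_BLOCK]
            self.blocks.append(zlib.compress(repr(chunk).encode()))

    def _load(self, b):
        # Decompress a single block back into a Python list.
        return literal_eval(zlib.decompress(self.blocks[b]).decode())

    def get(self, idx):
        # Only one block is decompressed, not the whole table.
        chunk = self._load(idx // RECORDS_PER_BLOCK)
        return chunk[idx % RECORDS_PER_BLOCK]

    def update(self, idx, record):
        # An update recompresses only the affected block.
        b = idx // RECORDS_PER_BLOCK
        chunk = self._load(b)
        chunk[idx % RECORDS_PER_BLOCK] = record
        self.blocks[b] = zlib.compress(repr(chunk).encode())

store = BlockStore([(i, i * 2) for i in range(1000)])
store.update(500, (500, -1))
assert store.get(500) == (500, -1)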

Cited by 48 publications (4 citation statements). References 19 publications.

“…Binary packing is closely related to Frame-Of-Reference (FOR) from Goldstein et al [39] and tuple differential coding from Ng and Ravishankar [40]. In such techniques, arrays of values are partitioned into blocks (e.g., of 128 integers).…”
Section: Binary Packing
confidence: 99%
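
The quoted description is enough to sketch frame-of-reference (FOR) coding with binary packing, assuming blocks of 128 integers as in the excerpt; the function names and the (base, width, offsets) layout below are illustrative, not taken from [39] or [40].

def for_encode(values, block_size=128):
    """Encode integers block by block: store each block's minimum (the
    frame of reference) plus the bit width needed for the offsets."""
    blocks = []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        base = min(block)
        offsets = [v - base for v in block]
        width = max(offsets).bit_length()  # bits per packed offset
        blocks.append((base, width, offsets))
    return blocks

def for_decode(blocks):
    """Rebuild the original array by adding each block's base back."""
    out = []
    for base, _width, offsets in blocks:
        out.extend(base + off for off in offsets)
    return out

data = [1000, 1003, 1001, 1015, 1007]
assert for_decode(for_encode(data)) == data

Because every offset in a block fits in the same small bit width, the offsets can be packed contiguously; this sketch keeps them as a list for readability.
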
“…Other approaches combined the physical design of a database with compression. Ng and Ravishankar proposed to compress data within large statistical databases by storing the differences between tuples rather than storing the tuples themselves (Ng & Ravishankar, 1997). The authors in (Kimura, Narasayya, & Syamala, 2011) proposed to use compression in order to guide the physical design of a database and to choose the right auxiliary database structures (indices, materialized views, etc.)…”
Section: Related Work
confidence: 99%
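
A minimal sketch of the tuple-differencing idea attributed to Ng and Ravishankar, under the assumption that each tuple is first mapped to a single ordinal via a mixed-radix encoding of its attribute domains; the domain sizes and function names here are invented.

DOMAIN_SIZES = (10, 100, 50)  # assumed cardinality of each attribute

def tuple_to_ordinal(t):
    """Mixed-radix mapping of a tuple to one integer."""
    n = 0
    for value, size in zip(t, DOMAIN_SIZES):
        n = n * size + value
    return n

def ordinal_to_tuple(n):
    parts = []
    for size in reversed(DOMAIN_SIZES):
        n, value = divmod(n, size)
        parts.append(value)
    return tuple(reversed(parts))

def tdc_encode(tuples):
    ordinals = sorted(tuple_to_ordinal(t) for t in tuples)
    # First ordinal kept verbatim; the rest as (usually small) deltas.
    return [ordinals[0]] + [b - a for a, b in zip(ordinals, ordinals[1:])]

def tdc_decode(deltas):
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(ordinal_to_tuple(total))
    return out

rows = [(3, 42, 7), (3, 42, 9), (4, 0, 0)]
assert tdc_decode(tdc_encode(rows)) == sorted(rows)

Because the ordinals are sorted before differencing, the deltas are small nonnegative integers, which is what makes them cheap to store; note that the decoded relation comes back in sorted order, not the original insertion order.
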
“…The synthetic data sets were modeled on the criteria used in (Ng & Ravishankar, 1997). Two parameters, degree of skew and degree of variation, were used in their generation.…”
Section: Comparison Of The Prime Factor And Wavelet Schemes
confidence: 99%
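
The excerpt names the two generation parameters but not the distributions behind them, so the following generator is only a guess at what such a data set might look like: Zipf-style weights for skew and Gaussian jitter for variation, with all parameter values invented rather than taken from either paper.

import random

def synthetic_column(n, skew=1.2, variation=5.0, domain=1000, seed=0):
    """Skewed draws over `domain` values (rank r weighted 1 / r**skew),
    jittered by a normal term whose spread is set by `variation`."""
    rng = random.Random(seed)
    weights = [1.0 / (r ** skew) for r in range(1, domain + 1)]
    values = rng.choices(range(domain), weights=weights, k=n)
    return [v + int(rng.gauss(0, variation)) for v in values]

col = synthetic_column(10_000, skew=1.5, variation=2.0)
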
“…This effectively ruled out standard compression techniques such as Huffman coding (Cormack, 1985), LZW and its variants (Lempel & Ziv, 1977; Hunt, 1998), and Arithmetic Coding (Langdon, 1984). These schemes enable decoding to the original data with 100% accuracy, but suffer from modest compression ratios (Ng & Ravishankar, 1997). On the other hand, the trend-analysis nature of decision making means that query results do not need to reflect 100% accuracy.…”
Section: Introduction
confidence: 99%
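
A small, hedged illustration of the trade-off this excerpt describes, with zlib standing in for Huffman/LZW/arithmetic coding: lossless compression of noisy numeric data yields a modest ratio, while a lossy quantization step first (step size invented here) shrinks the data much further at the cost of a bounded error.

import random
import struct
import zlib

rng = random.Random(42)
values = [round(1000 + rng.gauss(0, 25), 2) for _ in range(5000)]
raw = struct.pack(f"{len(values)}d", *values)

lossless = zlib.compress(raw, 9)   # exact reconstruction, modest ratio

step = 5.0                         # lossy: round to nearest multiple of step
quantized = [round(v / step) for v in values]
lossy = zlib.compress(struct.pack(f"{len(quantized)}i", *quantized), 9)

print(f"raw {len(raw)} B, lossless {len(lossless)} B, lossy {len(lossy)} B")
approx = [q * step for q in quantized]  # accurate only to +/- step/2

The quantized column has far fewer distinct values, so it compresses heavily, matching the excerpt's point that approximate answers can be acceptable for trend-oriented queries.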