Reordering columns for smaller indexes

Lemire, Daniel; Kaser, Owen

doi:10.1016/j.ins.2011.02.002

Cited by 34 publications (24 citation statements). References 57 publications.


“…In relational database systems, column values are transformed into integer values by dictionary coding [14,15,16,17,18]. To improve compressibility, we may map the most frequent values to the smallest integers [19]. In text retrieval systems, word occurrences are commonly represented…”

[Figure 1: (a) encoding: array → differential coding (e.g., $\delta_i = x_i - x_{i-1}$) → compression (e.g., SIMD-BP128) → compressed; (b) decoding: compressed → decompression (e.g., SIMD-BP128) → differential decoding (e.g., $x_i = \delta_i + x_{i-1}$) → array.]

Section: Introduction (mentioning)

confidence: 99%
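The dictionary-coding step described in the statement above (most frequent values get the smallest integers) can be sketched in a few lines of Python. This is a minimal illustration, not code from either paper: the helper names `frequency_dictionary`, `encode_column` and `decode_column` are hypothetical, and ties between equally frequent values are broken by first appearance, which is the documented behavior of `collections.Counter.most_common`.

```python
from collections import Counter


def frequency_dictionary(column):
    """Map each distinct value to an integer code; the most frequent value gets 0.

    Ties are broken by first appearance (Counter.most_common keeps
    first-encountered order for equal counts).
    """
    counts = Counter(column)
    return {value: code for code, (value, _) in enumerate(counts.most_common())}


def encode_column(column):
    """Dictionary-code a column; returns the integer codes and the mapping."""
    mapping = frequency_dictionary(column)
    return [mapping[v] for v in column], mapping


def decode_column(codes, mapping):
    """Invert the dictionary to recover the original values."""
    inverse = {code: value for value, code in mapping.items()}
    return [inverse[c] for c in codes]


if __name__ == "__main__":
    city = ["Paris", "Oslo", "Paris", "Rome", "Paris", "Oslo"]  # example data
    codes, mapping = encode_column(city)
    print(mapping)  # {'Paris': 0, 'Oslo': 1, 'Rome': 2}
    print(codes)    # [0, 1, 0, 2, 0, 1]
    assert decode_column(codes, mapping) == city
```

Because the most common value receives code 0, the most common codes are also the smallest integers, which is what the statement relies on for better compressibility.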

Decoding billions of integers per second through vectorization

Lemire, Boytsov (2013). Softw. Pract. Exper.

216

183


SUMMARY: In many important applications, such as search engines and relational database systems, data are stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consume considerable CPU time. Therefore, substantial effort has been made to reduce the costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and single-instruction, multiple-data (SIMD) instructions. Nevertheless, we introduce a novel vectorized scheme called SIMD-BP128⋆ that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128⋆ saves up to 2 bits/int. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression ratio within 10% of a state-of-the-art scheme (Simple8b) while being two times faster during decoding.
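The encoding and decoding pipeline of Figure 1 (differential coding, then block-wise compression, and the reverse for decoding) can be sketched in plain Python. This is a scalar illustration under stated assumptions, not SIMD-BP128 itself: it delta-codes a sorted array, then packs each block of 128 deltas using the bit width of the largest delta in that block, with a Python integer standing in for packed machine words. The real scheme uses SIMD instructions and its own memory layout, which this sketch does not reproduce; all function names are hypothetical.

```python
BLOCK = 128  # integers per block; chosen to echo the "128" in SIMD-BP128 (assumption)


def delta_encode(values):
    """Differential coding: delta_i = x_i - x_{i-1}, with x_{-1} = 0."""
    prev = 0
    out = []
    for x in values:
        out.append(x - prev)
        prev = x
    return out


def delta_decode(deltas):
    """Differential decoding: x_i = delta_i + x_{i-1} (a running sum)."""
    total = 0
    out = []
    for d in deltas:
        total += d
        out.append(total)
    return out


def pack(deltas):
    """Binary packing: every value in a block is stored with the same bit width b.

    Returns (b, count, packed) tuples; `packed` is a Python int used as a
    stand-in for the machine words a real implementation would write.
    """
    if any(d < 0 for d in deltas):
        raise ValueError("input must be sorted so that all deltas are >= 0")
    blocks = []
    for start in range(0, len(deltas), BLOCK):
        chunk = deltas[start:start + BLOCK]
        b = max(chunk).bit_length()  # bits needed by the largest delta in the block
        packed = 0
        for i, d in enumerate(chunk):
            packed |= d << (i * b)
        blocks.append((b, len(chunk), packed))
    return blocks


def unpack(blocks):
    """Inverse of pack: extract `count` b-bit fields from each block."""
    out = []
    for b, count, packed in blocks:
        mask = (1 << b) - 1
        out.extend((packed >> (i * b)) & mask for i in range(count))
    return out


if __name__ == "__main__":
    postings = [3, 7, 8, 15, 40, 41, 42, 100]  # sorted document ids (example data)
    deltas = delta_encode(postings)
    print(deltas)        # [3, 4, 1, 7, 25, 1, 1, 58]
    blocks = pack(deltas)
    print(blocks[0][0])  # 6 bits per integer instead of 32
    assert delta_decode(unpack(blocks)) == postings
```

Sorted inputs produce small deltas, and small deltas need few bits per block; that is why differential coding comes before compression in the pipeline.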


“…Employing a Hilbert curve to construct a linear index can enable more efficient multi-conditional single-point queries and range queries. It also has good load-balancing characteristics, which can help avoid the issue of a hot query point [25]. So here we try to use the Hilbert curve to partition a conditional space and implement the structure of a multi-conditional Hilbert value index [33].…”

Section: Multi-conditional Query Methods Based On a Hilbert Space-f… (mentioning)

confidence: 99%

“…In order to support multi-conditional queries in HBase, we propose a new model to generate a RowKey based on a Hilbert space-filling curve [25]. The model employs the spatial continuity and clustering features of a Hilbert curve to construct a linearized index for single-point queries and range searches [26,27], which has good load-balancing characteristics and is capable of avoiding query hotspots.…”

Section: Introduction (mentioning)

confidence: 99%

A Method of HBase Multi-Conditional Query for Ubiquitous Sensing Applications

Shen, Liao, Dan, et al. (2018). Sensors.


Big data gathered by sensor networks from real systems, such as public infrastructure, healthcare, smart homes, and industry, contain enormous value and need to be mined deeply, which depends on a data storage and retrieval service. HBase is playing an increasingly important part in the big data environment since it provides a flexible pattern for storing extremely large amounts of unstructured data. Despite fast reads by RowKey, HBase does not natively support multi-conditional queries, which are a common demand and operation in relational databases, especially for data analysis in ubiquitous sensing applications. In this paper, we introduce a method to construct a linear index by employing a Hilbert space-filling curve. As a RowKey generation schema, the proposed method maps multiple index columns into a one-dimensional encoded sequence and then constructs a new RowKey. We also provide an R-tree-based optimization to reduce the computational cost of encoding query conditions. Experimental results indicate that, without using a secondary index, the proposed method performs better in multi-conditional queries.
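A minimal Python sketch of the RowKey idea: quantize two index columns onto a $2^k \times 2^k$ grid, map each cell to its position along a Hilbert curve (using the standard iterative rotate-and-flip conversion), and put that position at the front of the RowKey in fixed-width hexadecimal so that HBase's lexicographic row-key order follows the curve. Using exactly two columns, the `quantize` helper, the `#` separator and the sensor-id suffix are all assumptions for illustration; the abstract does not specify the paper's actual RowKey layout, and the R-tree optimization is not shown.

```python
def hilbert_index(order, x, y):
    """Position of cell (x, y) along the Hilbert curve over a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # offset of the quadrant visited at this level
        # Rotate/flip so the next level sees the sub-curve in canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def quantize(value, lo, hi, order):
    """Hypothetical helper: map a value in [lo, hi] to a cell in [0, 2**order - 1]."""
    n = 1 << order
    cell = int((value - lo) / (hi - lo) * n)
    return min(max(cell, 0), n - 1)


def make_rowkey(order, x, y, suffix):
    """Hypothetical RowKey layout: fixed-width hex Hilbert position, '#', suffix.

    A fixed width keeps lexicographic key order equal to numeric Hilbert order.
    """
    width = (2 * order + 3) // 4  # hex digits needed for values below 4**order
    return f"{hilbert_index(order, x, y):0{width}x}#{suffix}"


if __name__ == "__main__":
    order = 8  # 256 x 256 grid (assumption)
    temp_cell = quantize(21.5, -40.0, 60.0, order)  # e.g. a temperature column
    humid_cell = quantize(55.0, 0.0, 100.0, order)  # e.g. a humidity column
    print(make_rowkey(order, temp_cell, humid_cell, "sensor-17"))
    print(make_rowkey(2, 3, 1, "sensor-17"))  # c#sensor-17
```

Consecutive Hilbert positions are always adjacent grid cells, so a contiguous range of RowKeys covers a compact region of the condition space; this is the locality that the quoted statements attribute to the Hilbert curve.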


“…Compressed bitmaps. In a bitmap, there are runs of consecutive 0's and runs of consecutive 1's. The number of such runs is called the RUNCOUNT of a bitmap, or of a collection of bitmaps. For example, in the bitmap index illustrated by Figure …, there are 2 + 4 + 3 = 9 runs.…”

Section: Background and Related Work (mentioning)

confidence: 99%
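RUNCOUNT as defined in the statement above can be computed directly. The Python sketch below is illustrative only, and all names and data are hypothetical: it counts runs in one bitmap, sums them over a collection, and builds a toy bitmap index with one bitmap per (column, value) pair. Sorting the rows of a small two-column table under two different column orders shows why column order, the subject of the cited paper, can change the total.

```python
def runcount(bitmap):
    """Number of maximal runs of identical bits (0s or 1s) in one bitmap."""
    if not bitmap:
        return 0
    # Every position where the bit changes starts a new run.
    return 1 + sum(1 for a, b in zip(bitmap, bitmap[1:]) if a != b)


def index_runcount(bitmaps):
    """RUNCOUNT of a collection of bitmaps: the sum of the individual run counts."""
    return sum(runcount(bm) for bm in bitmaps)


def bitmap_index(rows):
    """Simple bitmap index: one bitmap per (column, value); bit r is 1 if row r has that value."""
    bitmaps = []
    for c in range(len(rows[0])):
        for v in sorted({row[c] for row in rows}):
            bitmaps.append([1 if row[c] == v else 0 for row in rows])
    return bitmaps


def sorted_by(rows, column_order):
    """Sort rows lexicographically, comparing columns in the given order."""
    return sorted(rows, key=lambda row: tuple(row[c] for c in column_order))


if __name__ == "__main__":
    # Toy table: column 0 has 2 distinct values, column 1 has 4 (example data).
    rows = [(a, b) for a in "ab" for b in "wxyz"]
    print(index_runcount(bitmap_index(sorted_by(rows, [0, 1]))))  # 22
    print(index_runcount(bitmap_index(sorted_by(rows, [1, 0]))))  # 26
```

For this table, sorting by the two-value column first gives 22 runs and sorting by the four-value column first gives 26. It is a single toy case and is not meant to state a general rule.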

Compressed bitmap indexes: beyond unions and intersections

Kaser, Lemire (2014). Softw Pract Exp. (Self cite)


SUMMARY: Compressed bitmap indexes are used to speed up simple aggregate queries in databases. Indeed, set operations like intersections, unions and complements can be represented as logical operations (AND, OR, NOT) that are ideally suited for bitmaps. However, it is less obvious how to apply bitmaps to more advanced queries. For example, we might seek products in a store that meet some, but maybe not all, criteria. Such threshold queries generalize intersections and unions; they are often used in information-retrieval and data-mining applications. We introduce new algorithms that are sometimes three orders of magnitude faster than a naïve approach. Our work shows that bitmap indexes are more broadly applicable than is commonly believed.
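A threshold query (rows present in at least t of N bitmaps) can be sketched two ways in Python, with integers used as bitsets. The first function is a simple per-row counting baseline; the second uses only whole-bitmap AND/OR operations, keeping one bitmap per count level. Neither is claimed to be one of the algorithms from the paper, and the function names and product data are hypothetical. With t = 1 the result is the union, and with t = N it is the intersection.

```python
def threshold_naive(bitmaps, nrows, t):
    """Rows set in at least t bitmaps, by counting set bits row by row."""
    counts = [0] * nrows
    for bm in bitmaps:
        for r in range(nrows):
            if (bm >> r) & 1:
                counts[r] += 1
    return [r for r in range(nrows) if counts[r] >= t]


def threshold_bitwise(bitmaps, nrows, t):
    """Same result using only AND/OR on whole bitmaps (Python ints as bitsets).

    at_least[k] holds the rows set in at least k of the bitmaps seen so far.
    """
    all_rows = (1 << nrows) - 1
    at_least = [all_rows] + [0] * t
    for bm in bitmaps:
        # Descending k, so at_least[k - 1] still holds the value before this bitmap.
        for k in range(t, 0, -1):
            at_least[k] |= at_least[k - 1] & bm
    result = at_least[t]
    return [r for r in range(nrows) if (result >> r) & 1]


if __name__ == "__main__":
    # Three hypothetical criteria over six products; bit r set = product r qualifies.
    cheap = 0b101011
    in_stock = 0b011011
    local = 0b110001
    criteria = [cheap, in_stock, local]
    print(threshold_bitwise(criteria, 6, 2))  # [0, 1, 3, 4, 5]
    print(threshold_bitwise(criteria, 6, 3))  # [0]  (intersection)
    assert threshold_naive(criteria, 6, 2) == threshold_bitwise(criteria, 6, 2)
```

The bitwise version performs about N·t bitmap operations, so its cost depends on the number of bitmaps and the threshold rather than on examining each row per bitmap.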
