2011
DOI: 10.1016/j.ins.2011.02.002

Reordering columns for smaller indexes

Abstract: Column-oriented indexes, such as projection or bitmap indexes, are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns in the right order before sorting can reduce the number of runs by a factor of two or more. Unfortunately, determining the best column order is NP-hard. For many cases, we prove that the number of runs in table columns is minimized if we sort columns by increasing cardinality.…
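
The effect of column order on the number of runs can be illustrated with a small sketch. This is not the paper's code: the helper names `run_count` and `total_runs` and the toy table are assumptions made for illustration. The table is the full Cartesian product of a cardinality-2 column and a cardinality-5 column, sorted lexicographically under each of the two column orders.

```python
from itertools import product


def run_count(column):
    """Count maximal runs of identical consecutive values in one column."""
    runs = 0
    previous = object()  # sentinel that compares unequal to every value
    for value in column:
        if value != previous:
            runs += 1
            previous = value
    return runs


def total_runs(rows, column_order):
    """Permute the columns, sort the rows lexicographically, sum runs over columns."""
    reordered = sorted(tuple(row[c] for c in column_order) for row in rows)
    return sum(run_count(col) for col in zip(*reordered))


# Hypothetical toy table: column 0 has cardinality 2, column 1 has cardinality 5.
rows = list(product(range(2), range(5)))

low_cardinality_first = total_runs(rows, (0, 1))   # 2 + 10 runs
high_cardinality_first = total_runs(rows, (1, 0))  # 5 + 10 runs
print(low_cardinality_first, high_cardinality_first)
```

On this toy table, placing the low-cardinality column first yields fewer runs, which agrees with the increasing-cardinality heuristic stated in the abstract. One small example does not establish the general result.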

Citations: cited by 34 publications (24 citation statements)
References: 57 publications

“…In relational database systems, column values are transformed into integer values by dictionary coding [14,15,16,17,18]. To improve compressibility, we may map the most frequent values to the smallest integers [19]. In text retrieval systems, word occurrences are commonly represented … [Figure 1. (a) Encoding: array → differential coding (e.g., δ_i = x_i − x_{i−1}) → compression (e.g., SIMD-BP128) → compressed. (b) Decoding: compressed → decompression (e.g., SIMD-BP128) → differential decoding (e.g., x_i = δ_i + x_{i−1}) → array.]…”
Section: Introduction (mentioning)
confidence: 99%
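
The two steps named in this statement, frequency-ordered dictionary coding and differential coding, can be sketched as follows. The function names are hypothetical, the predecessor of the first element is assumed to be 0, and the compression stage (SIMD-BP128 in the quoted figure) is deliberately left out; this is an illustration, not the citing paper's implementation.

```python
from collections import Counter


def frequency_dictionary(values):
    """Map each distinct value to an integer code; most frequent values get the smallest codes."""
    counts = Counter(values)
    # Ties are broken by the string form of the value so the result is deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], str(v)))
    return {value: code for code, value in enumerate(ordered)}


def delta_encode(xs):
    """Differential coding: delta_i = x_i - x_{i-1}, assuming x_{-1} = 0."""
    deltas = []
    previous = 0
    for x in xs:
        deltas.append(x - previous)
        previous = x
    return deltas


def delta_decode(deltas):
    """Differential decoding: x_i = delta_i + x_{i-1}, assuming x_{-1} = 0."""
    xs = []
    previous = 0
    for delta in deltas:
        previous += delta
        xs.append(previous)
    return xs


codes = frequency_dictionary(["b", "a", "b", "c", "b", "a"])
deltas = delta_encode([3, 7, 8, 15])
restored = delta_decode(deltas)
print(codes, deltas, restored)
```

Sorted integer lists produce small deltas, which is what makes the subsequent bit-packing stage effective.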
“…Employing a Hilbert curve to construct linear index can realize more efficient multi-conditional single point queries and range queries. It also has good load balancing characteristic which can help avoid the issue of a hot query point [25]. So here we try to use the Hilbert curve to partition a conditional space and implement the structure of a multi-conditional Hilbert value index [33].…”
Section: Multi-conditional Query Methods Based On a Hilbert Space-f… (mentioning)
confidence: 99%
“…In order to support a multi-conditional query in HBase, we propose a new model to generate a RowKey based on a Hilbert space-filling curve [25]. The model employs the spatial continuity and clustering feature of a Hilbert curve to construct a linearized index for realizing single point query and range search [26,27], which has good load balancing characteristic and is capable of avoiding query hotspots.…”
Section: Introduction (mentioning)
confidence: 99%
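
As a rough illustration of how a Hilbert curve linearizes a two-dimensional condition space, the sketch below uses the standard iterative cell-to-index conversion on an n × n grid, where n is a power of two. It is not the RowKey model of the citing paper: the function name `hilbert_index` and the grid size are assumptions, and the mapping of real query conditions to grid cells is not shown.

```python
def hilbert_index(n, x, y):
    """Return the position of cell (x, y) along the Hilbert curve of an n x n grid.

    n must be a power of two, with 0 <= x < n and 0 <= y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Hypothetical 4 x 4 grid: list the cells in the order the curve visits them.
n = 4
cells = sorted(
    ((x, y) for x in range(n) for y in range(n)),
    key=lambda cell: hilbert_index(n, *cell),
)
print(cells)
```

Consecutive positions on the curve are always neighbouring cells, which is the clustering property the statement relies on for range queries.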
“…Compressed bitmaps. In a bitmap, there are runs of consecutive 0's and runs of consecutive 1's. The number of such runs is called the RUNCOUNT of a bitmap, or of a collection of bitmaps. For example, in the bitmap index illustrated by Figure , there are 2 + 4 + 3 = 9 runs.…”
Section: Background and Related Work (mentioning)
confidence: 99%
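
The RUNCOUNT defined in this statement can be computed with a few lines of code. The figure from the citing paper is not available here, so the column below is a hypothetical one chosen only so that its three bitmaps have 2, 4 and 3 runs, matching the arithmetic in the quote; the helper names are likewise assumptions.

```python
def bitmap_index(column):
    """Build one bitmap per distinct value: bit i is 1 when row i holds that value."""
    return {
        value: [1 if x == value else 0 for x in column]
        for value in sorted(set(column))
    }


def run_count(bits):
    """RUNCOUNT of a single bitmap: number of maximal runs of identical bits."""
    if not bits:
        return 0
    return 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)


# Hypothetical column, not the one in the cited figure.
column = ["x", "y", "z", "y"]
index = bitmap_index(column)
per_bitmap = [run_count(index[value]) for value in sorted(index)]
total = sum(per_bitmap)
print(per_bitmap, total)
```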