Rethinking SIMD Vectorization for In-Memory Databases

Polychroniou, Orestis; Raghavan, Arun; Ross, Kenneth A.

doi:10.1145/2723372.2747645

Cited by 135 publications

(101 citation statements)

References 37 publications

Supporting

Mentioning

100

Contrasting

“…As opposed to multi-threading, which enables thread-level parallelism, vectorized instructions enable data-level parallelism, where the degree of parallelism depends on the width of the specialized registers. When working on a data type for which k values fit into these registers, SIMD offers a theoretical speed-up of k; however, this value is rarely achieved in practice as multiple other factors, such as memory bandwidth and the concrete instruction to perform, play an important role [32]. For instance, AVX instructions, which work on 256-bit SIMD registers, can process eight 32-bit floating-point values in parallel with one instruction and offer a theoretical speed-up of a factor of 8.…”
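The arithmetic behind the quoted "theoretical speed-up of k" can be sketched as follows. This is an illustration only: the function names `simd_lanes` and `instructions_needed` are hypothetical, and the calculation counts instructions while ignoring memory bandwidth and instruction cost, which the statement itself names as the reasons the bound is rarely reached.

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of the given width that fit in one SIMD register (k)."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


def instructions_needed(n_values: int, lanes: int) -> int:
    """Instructions needed to process n_values once, `lanes` values at a time."""
    return -(-n_values // lanes)  # ceiling division


# AVX: 256-bit registers holding 32-bit floating-point values.
k = simd_lanes(256, 32)
scalar_instructions = instructions_needed(1000, 1)
vector_instructions = instructions_needed(1000, k)

# Upper bound on the speed-up, before memory bandwidth and other effects.
print(k, vector_instructions, scalar_instructions / vector_instructions)
# prints: 8 125 8.0
```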

Section: Vectorized Instructions (mentioning)

confidence: 99%

Multidimensional range queries on modern hardware

Sprenger

Schäfer

Leser

2018

Proceedings of the 30th International Conference on Scientific and Statistical Database Management

Range queries over multidimensional data are an important part of database workloads in many applications. Their execution may be accelerated by using multidimensional index structures (MDIS), such as kd-trees or R-trees. As for most index structures, the usefulness of this approach depends on the selectivity of the queries, and common wisdom told that a simple scan beats MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom is largely based on evaluations that are almost two decades old, performed on data being held on disks, applying IO-optimized data structures, and using single-core systems. The question is whether this rule of thumb still holds when multidimensional range queries (MDRQ) are performed on modern architectures with large main memories holding all data, multi-core CPUs and data-parallel instruction sets.

In this paper, we study the question whether and how much modern hardware influences the performance ratio between index structures and scans for MDRQ. To this end, we conservatively adapted three popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit features of modern servers and compared their performance to different flavors of parallel scans using multiple (synthetic and real-world) analytical workloads over multiple (synthetic and real-world) datasets of varying size, dimensionality, and skew. We find that all approaches benefit considerably from using main memory and parallelization, yet to varying degrees. Our evaluation indicates that, on current machines, scanning should be favored over parallel versions of classical MDIS even for very selective queries.

Keywords: Multidimensional Index Structures, Modern Hardware

Footnote 1: Queries over high-dimensional datasets or using similarity predicates are out of scope of this work; for supporting such use cases, we refer the reader to excellent surveys, like [6].

Figure 1: Classical disk-based set-up for MDIS (left: hard disk drive, one thread, scalar instructions) versus an adaptation to modern hardware (right: multi-core CPU, main memory, many threads, scalar/SIMD instructions).

[...] contrast, the classical MDIS were designed for row-wise data layouts. Thus, it is time to re-evaluate the performance of MDIS for MDRQ to see if the traditional rule of thumb still holds. Clearly, such a re-evaluation requires an adaptation of the original index structures to the features of modern hardware (see Figure 1) and should be carried out using analytical workloads.

In this experimental analysis, we study the question whether and how much the changes in hardware and workloads influence the performance of MDIS compared to sequential scans. To this end, we adapted three popular MDIS to be executed in a parallel and in-memory setting, namely (1) the R*-tree [2], an optimized variant of the R-tree [15], (2) the kd-tree [3], an index structure already originally designed for in-memory computations, and (3) the VA-file [41], which can be considered as a mixture between a MDIS and a sequential scan. Our adaptation is conser...

“…A more thorough operator redesign was shown to be required to fully take advantage of vectorized instructions [6]. The authors used selective load and store and scatter/gather operations available in modern SIMD instruction sets as building blocks for new scan and join operators.…”
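To make the building blocks named in this statement concrete, here is a scalar emulation of a selective store and a gather, combined into a small selection scan. It is a sketch under stated assumptions, not the cited authors' operators: the function names, the 8-lane width, and the inclusive range predicate are all chosen for illustration, and each Python loop stands in for what would be a single vector instruction on real hardware.

```python
def selective_store(out, pos, lanes, mask):
    """Write only the lanes whose mask bit is set, contiguously from out[pos].

    Returns the next free output position.
    """
    for value, active in zip(lanes, mask):
        if active:
            out[pos] = value
            pos += 1
    return pos


def gather(table, indexes):
    """Load table[i] for every index i (non-contiguous reads)."""
    return [table[i] for i in indexes]


def selection_scan(keys, payloads, lo, hi, width=8):
    """Return the payloads of all rows whose key lies in [lo, hi]."""
    row_ids = [None] * len(keys)
    pos = 0
    for start in range(0, len(keys), width):
        lane_keys = keys[start:start + width]          # one vector load
        mask = [lo <= k <= hi for k in lane_keys]      # one vector comparison
        lane_ids = range(start, start + len(lane_keys))
        pos = selective_store(row_ids, pos, lane_ids, mask)
    return gather(payloads, row_ids[:pos])


keys = [5, 12, 7, 30, 9, 1, 15, 8, 22, 10]
payloads = list("abcdefghij")
print(selection_scan(keys, payloads, 7, 12))
# prints: ['b', 'c', 'e', 'h', 'j']
```

The scan has no per-row branch on the predicate outcome in its vector form: the comparison yields a mask, and the selective store compacts the qualifying row ids.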

Section: Related Work (mentioning)

confidence: 99%

Scaling column imprints using advanced vectorization

Sidirourgos

Mühleisen

2017

Proceedings of the 13th International Workshop on Data Management on New Hardware

Column Imprints is a pre-filtering secondary index for answering range queries. The main feature of imprints is that they are lightweight and are based on compressed bit-vectors, one per cacheline, that quickly determine if the values in that cacheline satisfy the predicates of a query. The main overhead of the imprints implementation is the many sequential value comparisons against the boundaries of a virtual equi-height histogram. Similarly, during query scans, many sequential value comparisons are performed to identify false positives. In this paper, we speed-up the process of imprints creation and querying by using advanced vectorization techniques. We also experimentally explore the benefits of stretching imprints to larger bit-vector sizes and blocks of data, using 256-bit SIMD registers. Our findings are very promising for both imprints and for future index design research that would employ advanced vectorization techniques and larger (up to 512-bit) and more (from 16 now to 32) SIMD registers.
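The idea described in this abstract can be sketched in a few lines. This is a simplified, uncompressed illustration rather than the paper's implementation: the function names are hypothetical, the histogram boundaries are supplied by hand instead of being derived as an equi-height histogram, and a block of 16 values is assumed to stand for one cacheline of 4-byte values.

```python
from bisect import bisect_right


def bin_of(value, bounds):
    """Histogram bin of a value: the number of boundaries that are <= value."""
    return bisect_right(bounds, value)


def build_imprints(values, bounds, block=16):
    """One bit-vector per block; bit b is set if some value falls in bin b."""
    imprints = []
    for start in range(0, len(values), block):
        bits = 0
        for v in values[start:start + block]:
            bits |= 1 << bin_of(v, bounds)
        imprints.append(bits)
    return imprints


def range_query(values, imprints, bounds, lo, hi, block=16):
    """Positions of all values in [lo, hi], skipping blocks the imprint rules out."""
    query_mask = 0
    for b in range(bin_of(lo, bounds), bin_of(hi, bounds) + 1):
        query_mask |= 1 << b
    hits = []
    for i, bits in enumerate(imprints):
        if (bits & query_mask) == 0:
            continue  # no value of this block can qualify
        start = i * block
        for j, v in enumerate(values[start:start + block], start):
            if lo <= v <= hi:  # per-value check removes false positives
                hits.append(j)
    return hits


values = list(range(64))
bounds = [8, 16, 24, 32, 40, 48, 56]   # 7 boundaries, 8 bins (assumed)
imprints = build_imprints(values, bounds)
print(imprints)                                   # prints: [3, 12, 48, 192]
print(range_query(values, imprints, bounds, 20, 22))  # prints: [20, 21, 22]
```

Both inner loops are runs of sequential comparisons over a block, which is the part the abstract says the authors accelerate with SIMD.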

“…Zhou [16], range indexes [17], Bloom filters [18], hash tables and partitioning used in radixsort and hash joins [19].…”

Section: Related Work (mentioning)

confidence: 99%

Efficient Lightweight Compression Alongside Fast Scans

Polychroniou

Ross

2015

Proceedings of the 11th International Workshop on Data Management on New Hardware

Self Cite

The increasing main-memory capacity has allowed query execution to occur primarily in main memory. Database systems employ compression, not only to fit the data in main memory, but also to address the memory bandwidth bottleneck. Lightweight compression schemes focus on efficiency over compression rate and allow query operators to process the data in compressed form. For instance, dictionary compression keeps the distinct column values in a sorted dictionary and stores the values as index codes with the minimum number of bits. Packing the bits of each code contiguously, namely horizontal bit packing, has been optimized by using SIMD instructions for unpacking and by evaluating predicates in parallel per processor word for selection scans. Interleaving the bits of codes, namely vertical bit packing, provides faster scans, but incurs prohibitive costs for packing and unpacking. Here, we improve packing and unpacking for vertical bit packing using SIMD instructions, achieving more than an order of magnitude speedup. Also, we optimize horizontal bit packing on the latest CPUs and compare all approaches. While no single variant is better in all cases, vertical bit packing offers a good trade-off by combining the fastest scans with comparably fast packing and unpacking.
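Dictionary compression with horizontal bit packing, as outlined in this abstract, can be illustrated with a minimal scalar sketch. The helper names below are hypothetical, and a single Python integer stands in for the packed bit stream; the SIMD unpacking and the vertical (bit-interleaved) layout that the paper studies are not shown.

```python
def dictionary_encode(column):
    """Sorted dictionary, per-row index codes, and the minimum bits per code."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    bits = max(1, (len(dictionary) - 1).bit_length())
    return dictionary, [code_of[v] for v in column], bits


def pack_horizontal(codes, bits):
    """Store the bits of each code contiguously, one code after another."""
    packed = 0
    for i, code in enumerate(codes):
        packed |= code << (i * bits)
    return packed


def unpack_horizontal(packed, bits, count):
    """Recover `count` codes of `bits` bits each from the packed integer."""
    mask = (1 << bits) - 1
    return [(packed >> (i * bits)) & mask for i in range(count)]


column = ["pear", "apple", "fig", "apple", "kiwi", "fig"]
dictionary, codes, bits = dictionary_encode(column)
packed = pack_horizontal(codes, bits)

print(dictionary)  # prints: ['apple', 'fig', 'kiwi', 'pear']
print(codes, bits)  # prints: [3, 0, 1, 0, 2, 1] 2
print([dictionary[c] for c in unpack_horizontal(packed, bits, len(codes))] == column)
# prints: True
```

Because the dictionary is sorted, the codes preserve value order, so a range predicate on values can be evaluated as a range predicate on codes without decompressing.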
