Today's exponentially increasing data volumes and the high cost of storage make compression essential for the Big Data industry. Although research has concentrated on efficient compression, fast decompression is critical for analytics queries that repeatedly read compressed data. While decompression can be parallelized somewhat by assigning each data block to a different process, breakthrough speed-ups require exploiting the massive parallelism of modern multi-core processors and GPUs for data decompression within a block. We propose two new techniques to increase the degree of parallelism during decompression. The first technique exploits the massive parallelism of GPU and SIMD architectures. The second sacrifices some compression efficiency to eliminate data dependencies that limit parallelism during decompression. We evaluate these techniques on the decompressor of the DEFLATE scheme, called Inflate, which is based on LZ77 compression and Huffman encoding. In a head-to-head comparison with several multi-core CPU-based libraries, we achieve a 2× speed-up and a 17% energy saving with comparable compression ratios.
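To illustrate the kind of data dependency that limits intra-block parallelism, consider a minimal LZ77-style back-reference copy, sketched below in C++. This is an illustrative sketch only, not the paper's implementation: each copied byte may depend on a byte written moments earlier in the same output buffer, so the loop cannot be parallelized naively.

    #include <cstdint>
    #include <cstddef>

    // Minimal LZ77-style back-reference resolution (illustrative only).
    // When the match overlaps the output cursor (distance < length), each
    // iteration reads a byte produced by an earlier iteration of this loop,
    // serializing the decode.
    void copy_match(uint8_t* out, size_t& out_pos, size_t distance, size_t length) {
        for (size_t i = 0; i < length; ++i) {
            out[out_pos] = out[out_pos - distance];  // reads recently written output
            ++out_pos;
        }
    }

Eliminating such dependencies, at some cost in compression ratio, is what allows the decoder to expose more work to GPU and SIMD lanes.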
Implementations of data processing operators on GPU processors have achieved significant performance improvements over their multicore CPU counterparts. To achieve maximum performance, database operator implementations must take into consideration special features of GPU architectures. A crucial difference is that the unit of execution is a group ("warp") of threads, 32 threads in our target architecture, as opposed to a single thread for CPUs. In the presence of branches, threads in a warp have to follow the same execution path; if some threads diverge then different paths are serialized. Additionally, similarly to CPUs, branches degrade the efficiency of instruction scheduling. Here, we study conjunctive selection queries where branching hurts performance. We compute the optimal execution plan for a conjunctive query, taking branch penalties into account and consider both single-kernel and multi-kernel plans. Our evaluation suggests that divergence affects performance significantly and that our techniques reduce resource underutilization and improve operator performance.
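A hedged sketch of the underlying trade-off, in plain scalar C++ rather than the paper's GPU kernels (the column names and constants are made up): a conjunctive selection can either short-circuit its predicates, which causes divergent control flow, or evaluate them all and combine the results with a bitwise AND so that every lane follows the same path.

    #include <cstddef>

    // Branching plan: short-circuit evaluation takes different paths depending
    // on the first predicate (the scalar analogue of warp divergence).
    size_t select_branching(const int* a, const int* b, size_t n, int* out_idx) {
        size_t k = 0;
        for (size_t i = 0; i < n; ++i)
            if (a[i] < 50 && b[i] < 50)      // second predicate evaluated conditionally
                out_idx[k++] = (int)i;
        return k;
    }

    // Branch-free plan: both predicates are always evaluated and combined with a
    // bitwise AND; the output cursor advances only for qualifying rows.
    size_t select_branch_free(const int* a, const int* b, size_t n, int* out_idx) {
        size_t k = 0;
        for (size_t i = 0; i < n; ++i) {
            int keep = (a[i] < 50) & (b[i] < 50);
            out_idx[k] = (int)i;
            k += keep;
        }
        return k;
    }

The branch-free form does more work per row but keeps all threads of a warp on the same instructions; choosing between such plans, per predicate and per kernel, is the optimization problem the abstract describes.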
String processing tasks are common in analytical queries powering business intelligence. Besides substring matching, provided in SQL by the LIKE operator, popular DBMSs also support regular expressions as selective filters. Substring matching can be optimized by using specialized SIMD instructions on mainstream CPUs, reaching the performance of numeric column scans. However, generic regular expressions are harder to evaluate, as their cost depends on both the DFA size and the irregularity of the input. Here, we optimize matching string columns against regular expressions using SIMD-vectorized code. Our approach avoids accessing the strings in lockstep, without introducing branches, so it can exploit cases where some strings are accepted or rejected early after examining only their first few characters. On common string lengths, our implementation is up to 2× faster than scalar code on a mainstream CPU and up to 5× faster on the Xeon Phi coprocessor, improving regular expression support in DBMSs.
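As a rough sketch of the scalar baseline (the transition-table layout, state numbering, and the notion of a single dead state are assumptions for illustration, not the paper's design): each string is walked through a DFA transition table, and matching can stop as soon as an accepting or dead state is reached, often after only a few characters. The vectorized version processes several strings at once without lockstep access, so strings that finish early free their lane for new work.

    #include <cstdint>
    #include <cstddef>

    // Scalar DFA matcher over one string (illustrative sketch).
    // 'transitions' is a [num_states][256] table flattened into one array;
    // 'accept' and 'dead' states allow early acceptance or rejection.
    bool dfa_match(const uint8_t* transitions,
                   const char* str, size_t len,
                   uint8_t start, uint8_t accept, uint8_t dead) {
        uint8_t state = start;
        for (size_t i = 0; i < len; ++i) {
            state = transitions[state * 256 + (uint8_t)str[i]];
            if (state == accept) return true;   // accepted early
            if (state == dead)   return false;  // rejected early
        }
        return state == accept;
    }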