2013
DOI: 10.1002/spe.2203

Decoding billions of integers per second through vectorization

Abstract: In many important applications, such as search engines and relational database systems, data are stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and single-instruction, multiple-data (SIMD) instructions. Nevertheless, we introduce a n…

Cited by 214 publications (184 citation statements)
References 51 publications (141 reference statements)
“…Yan et al [9] analysed the compression ratio and decompression speed of many schemes while dealing with the compression of doc ids and term frequencies extracted from the GOV2 corpus using the AOL query log dataset. A similar work has been done by Lemire et al [10], studying the codecs behaviour for the ClueWeb09 corpus, but focusing just on document ids. While dealing with smaller datasets, Delbru et al [11] provided a comparison of codecs integrating these into an actual search engine with positional information.…”
Section: Introduction (mentioning)
confidence: 90%
“…In the Rice codec [22], b is a power of two, which means that bitwise operators can be exploited, permitting more efficient implementations at the cost of a small increase in the size of the compressed data. Nevertheless, Golomb and Rice coding are well-known for their decompression inefficiency [7,10,23], and hence, we omit experiments using these codecs from our work. Simple family: This family of codecs, firstly described in [23], stores as many integers as possible in a single word.…”
Section: List-adaptive Codecs (mentioning)
confidence: 99%
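
The remark above that a power-of-two b lets Rice coding rely on bitwise operators can be illustrated with a minimal sketch. It is not taken from the cited papers: the function names rice_encode and rice_decode are hypothetical, the output is a string of '0'/'1' characters rather than packed machine words, and the unary convention (a run of 1s terminated by a 0) is an assumption, since implementations differ on it.

```python
def rice_encode(value: int, k: int) -> str:
    """Encode a non-negative integer with Rice parameter b = 2**k.

    Hypothetical helper for illustration; returns a '0'/'1' string
    instead of packed bits to keep the example readable.
    """
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    quotient = value >> k                # same as value // 2**k, via a shift
    remainder = value & ((1 << k) - 1)   # same as value % 2**k, via a mask
    unary = "1" * quotient + "0"         # assumed convention: 1s ended by a 0
    binary = format(remainder, f"0{k}b") if k else ""
    return unary + binary


def rice_decode(bits: str, k: int) -> int:
    """Decode a single codeword produced by rice_encode with the same k."""
    quotient = bits.index("0")           # count of leading 1s
    remainder_bits = bits[quotient + 1 : quotient + 1 + k]
    remainder = int(remainder_bits, 2) if k else 0
    return (quotient << k) | remainder   # recombine with a shift and an OR


if __name__ == "__main__":
    code = rice_encode(19, 3)
    print(code, rice_decode(code, 3))
```

With k = 3, the value 19 splits into quotient 2 and remainder 3, so the codeword is the unary part 110 followed by the three remainder bits 011. The bit-by-bit scan for the unary terminator in the decoder hints at why the citing authors describe these codecs as slow to decompress.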