2014
DOI: 10.1007/978-3-319-06028-6_30

On Inverted Index Compression for Search Engine Efficiency

Abstract: Efficient access to the inverted index data structure is a key aspect for a search engine to achieve fast response times to users' queries. While the performance of an information retrieval (IR) system can be enhanced through the compression of its posting lists, there is little recent work in the literature that thoroughly compares and analyses the performance of modern integer compression schemes across different types of posting information (document ids, frequencies, positions). In this paper, we…

Cited by 22 publications (17 citation statements)
References 24 publications

“…A format inspired by FASTPFOR (where bits from exceptions are bit packed) is a part of the search engine Apache Lucene as of version 4.5. In contrast, Catena et al found that FASTPFOR provided no response time benefit compared with NewPFD and OptPFD when compressing document identifiers. Possibly, the discrepancy can be explained by the fact that they divided their arrays into small chunks of 1024 integers prior to compression.…”
Section: Integer Compression (mentioning)
confidence: 93%
“…The experiments of Catena et al [3], discussed in Section 1, use the Hadoop implementation, which can also store negative numbers. Positive integers less than 128 are stored in one byte with a leading (high bit) 0.…”
Section: Variable Byte Encoding: Algorithms (mentioning)
confidence: 99%
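
The one-byte case described in the statement above can be illustrated with a generic variable-byte codec. This is a minimal sketch under stated assumptions, not the Hadoop implementation the statement refers to (which also handles negative numbers): it accepts non-negative integers only, emits the low-order 7-bit group first, and sets the high bit on every byte except the last. The function names `vbyte_encode` and `vbyte_decode` are hypothetical.

```python
def vbyte_encode(n: int) -> bytes:
    """Encode one non-negative integer, 7 payload bits per byte.

    Assumed convention (illustrative only): the high bit is 1 on every
    byte except the last, so values below 128 fit in a single byte
    whose high bit is 0.
    """
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while n >= 128:
        out.append((n & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        n >>= 7
    out.append(n)  # final byte, high bit clear
    return bytes(out)


def vbyte_decode(data: bytes) -> list:
    """Decode a concatenation of vbyte_encode outputs back into integers."""
    values = []
    current = 0
    shift = 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:      # more bytes follow for this integer
            shift += 7
        else:                # last byte of this integer
            values.append(current)
            current = 0
            shift = 0
    return values
```

Because small values dominate when posting lists are stored as gaps between consecutive document ids, most integers cost a single byte under a scheme of this kind.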
“…Recently Catena et al [3] re-examined the result of Scholer et al in light of hardware, operating system, compiler, language, and search engine algorithmic improvements over the last 10 years. They show that, of the schemes they tested, the PForDelta family of schemes resulted in the highest throughput.…”
Section: Introduction (mentioning)
confidence: 96%
“…A well-known indexing data structure in information retrieval is the inverted index. For every word that appears in the document collection, the inverted index contains a posting list, that is, a list of the documents containing that word, together with information about the word's occurrences in each document (which may be the term frequency or the positions) [7].…”
Section: A Inverted Index (unclassified)
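
The structure described in the statement above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not code from any of the cited works: documents are tokenised by lowercasing and splitting on whitespace, document ids are list positions, and each posting is a `(doc_id, frequency, positions)` tuple. The name `build_inverted_index` is hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to its posting list.

    Each posting is (doc_id, frequency, positions), sorted by doc_id,
    covering the three kinds of posting information named in the
    abstract: document ids, frequencies, positions.
    """
    positions_by_term = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        # Naive tokenisation (an assumption of this sketch).
        for position, term in enumerate(text.lower().split()):
            positions_by_term[term].setdefault(doc_id, []).append(position)

    index = {}
    for term, per_doc in positions_by_term.items():
        index[term] = [
            (doc_id, len(positions), positions)
            for doc_id, positions in sorted(per_doc.items())
        ]
    return index


docs = ["to be or not to be", "to do"]
index = build_inverted_index(docs)
```

Keeping each posting list sorted by document id is what allows the ids to be stored as small gaps, which is where integer compression schemes such as those compared in the paper apply.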