2015
DOI: 10.1007/978-3-319-19929-0_18

Compact Indexes for Flexible Top-k Retrieval

Abstract: We engineer a self-index based retrieval system capable of rank-safe evaluation of top-k queries. The framework generalizes the GREEDY approach of Culpepper et al. (ESA 2010) to handle multiterm queries, including over phrases. We propose two techniques which significantly reduce the ranking time for a wide range of popular Information Retrieval (IR) relevance measures, such as TF×IDF and BM25. First, we reorder elements in the document array according to document weight. Second, we introduce the repetition a…
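
For orientation, the following is a minimal sketch of what a top-k query under BM25 computes. It is not the self-index or GREEDY method the abstract describes: it scores every document exhaustively (which is trivially rank-safe) and then keeps the k best. The function name `bm25_top_k`, the parameter defaults `k1=1.2` and `b=0.75`, the particular IDF variant, and the toy tokenized collection are all assumptions made for illustration.

```python
import heapq
import math
from collections import Counter


def bm25_top_k(docs, query, k, k1=1.2, b=0.75):
    """Return up to k (score, doc_id) pairs, best first.

    Assumes `docs` is a non-empty list of token lists and `query` is a
    list of tokens. Exhaustive scoring; illustrative only.
    """
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    term_freqs = [Counter(d) for d in docs]

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tf in term_freqs:
        doc_freq.update(tf.keys())

    scored = []
    for doc_id, tf in enumerate(term_freqs):
        # Length normalization component shared by all terms of this document.
        length_norm = k1 * (1 - b + b * len(docs[doc_id]) / avg_len)
        score = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Assumed IDF variant: the "+1 inside the log" form, which stays positive.
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * f * (k1 + 1) / (f + length_norm)
        if score > 0:
            scored.append((score, doc_id))

    # Keep only the k highest-scoring documents.
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    collection = [
        ["compact", "index"],
        ["top", "k", "retrieval", "index"],
        ["top", "top", "k"],
    ]
    for score, doc_id in bm25_top_k(collection, ["top", "k"], k=2):
        print(doc_id, round(score, 3))
```

An index-based system avoids this full scan; the sketch only fixes what "the top-k documents for a multiterm query" means so that the ranking-time improvements mentioned in the abstract have a baseline to be read against.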

Cited by 4 publications (5 citation statements)
References 25 publications
“…All programs were compiled with optimizations using g++ version 5.2.0. We are using test collections from the natural language domain, as character and as word sequence: two Wikipedia dumps of different size, a subset of publicly available Reddit comments 4 and all revisions of 100 Finnish Wikipedia articles (each revision is a single document). Additionally we use a word parsing of the TREC gov2 collection [7]. Table 1 tion and benchmarks are publicly available 5 and contain all parameters left out here due to space constraints.…”
Section: Methods (mentioning)
confidence: 99%
“…E.g. v_13 is mapped to [7,11]. The repetitions of interest are restricted to (−∞, depth(v_P) − 1] in the y range; so (−∞, 1] in our example.…”
Section: The Basic Framework and Data Structures (mentioning)
confidence: 99%
“…All programs were compiled with optimizations using g++ version 4.9.0. We are using test collections from the natural language domain: two Wikipedia dumps of different size, parsed as character and as word sequences [9], and a word parsing of the TREC GOV2 collection [10]. Table 1 summarizes their properties.…”
Section: Methods (mentioning)
confidence: 99%
“…Table 1 summarizes their properties. Our implementation and benchmarks are publicly available 3 as part of the SUccinct Retrieval Framework (SURF), which was introduced in [10]. The experiments can be easily reproduced by running the provided scripts.…”
Section: Methods (mentioning)
confidence: 99%
“…We also demonstrate the efficacy of our implementation for handling strings on large alphabets (with many millions of distinct symbols), which is important, e.g., for applications in natural language processing [21] and information retrieval [10]. Efficiency for large alphabets has been to date unaddressed by previous studies on EM suffix sorting [6,7,4,12,18,13], in all of which a byte alphabet is assumed.…”
Section: Introduction (mentioning)
confidence: 97%