2017
DOI: 10.1137/140998949

Time-Optimal Top-$k$ Document Retrieval

Abstract: Let $\mathcal{D}$ be a collection of $D$ documents, which are strings over an alphabet of size $\sigma$, of total length $n$. We describe a data structure that uses linear space and reports the $k$ most relevant documents that contain a query pattern $P$, which is a string of length $p$ packed in $p/\log_\sigma n$ words, in time $O(p/\log_\sigma n + k)$. This is optimal in the RAM model in the general case where $\log D = \Theta(\log n)$, and involves a novel RAM-optimal suffix tree search. Our construction supports an ample set of important relevance measures…
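To make the query in the abstract concrete, here is a brute-force baseline for top-$k$ document retrieval. It is only a sketch of the problem statement, not the paper's data structure: it scans every document, so it takes time proportional to the total length $n$ rather than $O(p/\log_\sigma n + k)$. Using term frequency as the relevance measure is an assumption made for illustration, and the names `count_occurrences` and `top_k_documents` are hypothetical.

```python
import heapq


def count_occurrences(doc, pattern):
    """Count (possibly overlapping) occurrences of pattern in doc."""
    count = 0
    start = doc.find(pattern)
    while start != -1:
        count += 1
        start = doc.find(pattern, start + 1)
    return count


def top_k_documents(docs, pattern, k):
    """Return up to k (doc_id, frequency) pairs, most frequent first.

    Assumption: relevance = term frequency of the pattern in the document.
    Ties are broken in favour of the smaller document id.
    """
    scored = []
    for doc_id, doc in enumerate(docs):
        tf = count_occurrences(doc, pattern)
        if tf > 0:  # only documents that contain the pattern qualify
            scored.append((doc_id, tf))
    return heapq.nlargest(k, scored, key=lambda item: (item[1], -item[0]))
```

Calling `top_k_documents(["abracadabra", "banana", "cabbage", "abab"], "ab", 2)` ranks the two documents in which `"ab"` occurs twice ahead of the one in which it occurs once.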

Citations: cited by 21 publications (26 citation statements)
References: 72 publications (101 reference statements)
“…By using perfect hashing to store the first characters of the edge labels descending from each node $v$, we reach the locus in optimal time $O(m)$ and the space is still $O(n)$. If $P$ comes packed using $w/\log\sigma$ symbols per computer word, we can descend in time $O(m \log(\sigma)/w)$ [91], which is optimal in the packed model. In the suffix array, all the suffixes starting with $P$ form a range $SA[sp..ep]$, which can be binary searched in time $O(m \log n)$, or $O(m + \log n)$ with additional structures [81].…”
Section: Suffix Trees and Arrays
Citation type: mentioning
confidence: 99%
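The suffix array search mentioned in this statement can be sketched as two binary searches, each comparing $P$ against the first $m$ symbols of a suffix, which gives the $O(m \log n)$ bound. This is a minimal illustration under stated assumptions: the suffix array is built by naive sorting (not a linear-time construction), indices are 0-based, and the function names are hypothetical.

```python
def build_suffix_array(text):
    """Naive suffix array: sort suffix start positions lexicographically.

    For illustration only; this is far slower than linear-time construction.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_range(text, sa, pattern):
    """Return the inclusive range (sp, ep) of suffixes starting with pattern.

    Returns None if the pattern does not occur. Each of the O(log n)
    comparisons inspects at most m = len(pattern) symbols.
    """
    m = len(pattern)

    # First binary search: leftmost suffix whose m-symbol prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    sp = lo

    # Second binary search: leftmost suffix whose m-symbol prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    ep = lo - 1

    return (sp, ep) if sp <= ep else None
```

For the text `"banana"`, the entries of `sa` inside the range returned for `"ana"` are the starting positions 1 and 3 of its two occurrences.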
“…We then replace our trie by a more sophisticated structure, which is described by Navarro and Nekrich [91, Sec. 2], built on the $O(rs)$ distinct strings of length $s$. Let $d = w/\log\sigma$.…”
Section: RAM-optimal Counting and Locating
Citation type: mentioning
confidence: 99%
“…In the RAM model with word size $\Theta(\log n)$, and if the consecutive symbols of $P$ come packed into $|P|/\log_\sigma n$ words, the optimal time is instead $O(|P|/\log_\sigma n)$. This optimal time was recently reached by Navarro and Nekrich [31] (note that their time is not optimal if $w = \omega(\log n)$), with a simple application of weak-prefix search, already hinted in the original article [2]. However, even the randomized construction time of the weak-prefix search structure is $O(n \log^{\varepsilon} n)$, for any constant $\varepsilon > 0$.…”
Section: Compact Uncompressed
Citation type: mentioning
confidence: 97%
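The packed-pattern assumption in this statement can be illustrated with a small sketch: each symbol takes $\lceil \log_2 \sigma \rceil$ bits, so a $w$-bit word holds $\lfloor w / \lceil \log_2 \sigma \rceil \rfloor$ symbols, and with $w = \Theta(\log n)$ the pattern occupies $O(|P|/\log_\sigma n)$ words. The function name, the integer encoding of symbols, and the choice to leave the last partial word unpadded are assumptions made for illustration.

```python
def pack_pattern(symbols, sigma, w):
    """Pack integer symbols from [0, sigma) into w-bit words.

    Returns (words, symbols_per_word). Earlier symbols occupy the more
    significant bits within each word; the last word may hold fewer symbols.
    """
    bits = max(1, (sigma - 1).bit_length())  # bits per symbol: ceil(log2 sigma)
    per_word = w // bits                     # symbols that fit in one word
    if per_word == 0:
        raise ValueError("word size too small for one symbol")

    words = []
    for start in range(0, len(symbols), per_word):
        word = 0
        for s in symbols[start:start + per_word]:
            if not 0 <= s < sigma:
                raise ValueError("symbol outside alphabet")
            word = (word << bits) | s
        words.append(word)
    return words, per_word
```

With `sigma = 4` and `w = 8`, four symbols fit in each word, so a pattern of nine symbols occupies three words.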
“…Compared with previous work, other indexes may be faster at counting, but either they are not built in linear deterministic time [5,19,31] or they are not compressed [31,7]. Our index outperforms all the previous compressed [13,1,6], as well as some uncompressed [15], indexes that can be built deterministically.…”
Section: Introduction
Citation type: mentioning
confidence: 93%