2014
DOI: 10.1016/j.tcs.2014.05.005

New space/time tradeoffs for top-k document retrieval on sequences

Abstract: We address the problem of indexing a collection D = {T_1, T_2, ..., T_D} of D string documents of total length n, so that we can efficiently answer top-k queries: retrieve k documents most relevant to a pattern P of length p given at query time. There exist linear-space data structures, that is, using O(n) words, that answer such queries in optimal O(p + k) time for an ample set of notions of relevance. However, using linear space is not sufficiently good for large text collections. In this paper we explore ho…
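To make the query in the abstract concrete, here is a minimal brute-force sketch of a top-k query. It is not the paper's data structure: it assumes term frequency (the number of occurrences of P in a document) as the relevance measure, scans every document at query time, and uses the hypothetical names `count_occurrences` and `top_k_naive`. Its cost grows with the total length n, which is exactly what the indexed O(p + k) solutions avoid.

```python
from typing import List


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are counted.
        start = text.find(pattern, start + 1)
    return count


def top_k_naive(documents: List[str], pattern: str, k: int) -> List[int]:
    """Return indices of up to k documents with the highest term frequency.

    Assumption: relevance is term frequency; ties are broken by the
    smaller document index. Documents with no occurrence are skipped.
    """
    scored = []
    for index, doc in enumerate(documents):
        tf = count_occurrences(doc, pattern)
        if tf > 0:
            scored.append((tf, index))
    # Sort by decreasing frequency, then increasing document index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:k]]


if __name__ == "__main__":
    docs = ["abracadabra", "banana", "bandana", "cabana"]
    print(top_k_naive(docs, "ana", 2))
```

Other relevance measures mentioned in the literature would only change the scoring line; the scan-everything structure is what an index replaces.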

citations
Cited by 17 publications
(11 citation statements)
references
References 34 publications
“…Another trend adds to the space the so-called document array [51], which uses n lg D + o(n log D) bits and enables faster solutions. Currently the fastest one achieves time O(p + k log* k) [56]. This is very close to optimal, but not yet our O(p / log_σ n + k) time.…”
Section: Discussion (mentioning)
confidence: 72%
“…We have shown that our structure can use, instead, O(n(log σ + log D)) bits for the tf measure (and slightly more for others), but the constants are still large. There is a whole trend of reduced-space representations for general document retrieval problems with the tf measure [64,70,41,21,32,38,12,31,68,39,56]. The current situation is as follows [52]: One trend aims at the least space usage.…”
Section: Discussion (mentioning)
confidence: 99%
“…A well studied line of work within document indexing is document indexing for top-k queries [12,23,24,25,26,33,34,37,39,42,43]. The goal is to efficiently report the top-k documents of smallest weight, where the weight is a function of the query.…”
Section: Related Work (mentioning)
confidence: 99%
“…Hon et al. achieved O(t_s(p) + k t_SA log^(3+ε) n) query time, using O(n / log^ε n) bits. Subsequent work (see [19,25]) improved the initial result up to O(t_s(p) + k t_SA log^2 k log^ε n) [23], and also considered compact indexes, which may use o(n log n) bits on top of the CSA. For example, these achieve O(t_s(p) + k t_SA log k log^ε n) query time using n log σ + o(n) further bits [12], or O(t_s(p) + k log* k) query time using n log D + o(n log n) further bits [24].…”
Section: Introduction (mentioning)
confidence: 99%