Position heaps: A simple and dynamic text indexing data structure

Ehrenfeucht, Andrzej; McConnell, Ross M.; Osheim, Nissa; Woo, Sung-Whan

doi:10.1016/j.jda.2010.12.001

Cited by 39 publications (65 citation statements)

References 9 publications (15 reference statements)


“…On the other side in LSTs the two pointers (length and position in the text) associated with any arc (where the total number of arcs is bounded by 2n − 2) are replaced with one symbol. Anyhow, the LSTs cannot compete for instance, in terms of space efficiency, with some relatively new data structures, such as compressed suffix trees [24] and position heaps [10].…”

Section: Lemma 1 Let σ Be Any Alphabet and W Be A String Over σ Of L (mentioning)

confidence: 99%

“…Patricias [22] posed this issue but required to know the length of the substring to be skipped during searching. Some related questions have been addressed for other variants of suffix trees, such as position heaps [10,23], where the ideas proposed are completely different from those in this paper as the underlying structure is a heap rather than a trie. Also, the trick of encoding the input string as a path and its substrings as subpaths¹ does not answer the question as conceptually the arcs of the suffix tree are still (implicitly) labeled with substrings.…”

Section: Introduction (mentioning)

confidence: 96%


Linear-size suffix tries
Crochemore, Epifanio, Grossi, et al. 2016, Theoretical Computer Science


Please cite this article in press as: M. Crochemore et al., Linear-size suffix tries, Theoret. Comput. Sci. (2016), http://dx.doi.org/10.1016/j.tcs.2016.04.002

Suffix trees are highly regarded data structures for text indexing and string algorithms [McCreight 76, Weiner 73]. For any given string $w$ of length $n = |w|$, a suffix tree for $w$ takes $O(n)$ nodes and links. It is often presented as a compacted version of a suffix trie for $w$, where the latter is the trie (or digital search tree) built on the suffixes of $w$. Here the compaction process replaces each maximal chain of unary nodes with a single arc. For this, the suffix tree requires that the labels of its arcs are substrings encoded as pointers to $w$ (or equivalent information). By contrast, the arcs of the suffix trie are labeled by single symbols, but a suffix trie can have $\Theta(n^2)$ nodes and links in the worst case because of its unary nodes. It is an interesting question whether the suffix trie can be stored using $O(n)$ nodes. We present the linear-size suffix trie, which guarantees $O(n)$ nodes. We use a new technique for reducing the number of unary nodes to $O(n)$, which stems from some results on antidictionaries. For instance, by using the linear-size suffix trie, we are able to check whether a pattern $p$ of length $m = |p|$ occurs in $w$ in $O(m \log |\Sigma|)$ time, and we can find the longest common substring of two strings $w_1$ and $w_2$ in $O((|w_1| + |w_2|) \log |\Sigma|)$ time for an alphabet $\Sigma$.
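The contrast between the two structures can be made concrete with a minimal Python sketch of the plain (uncompacted) suffix trie: every suffix is inserted one symbol per arc into nested dictionaries. This is not the linear-size suffix trie of the paper, and the helper names `build_suffix_trie`, `count_nodes` and `occurs` are hypothetical. For $w = a^k b^k$ the trie has one node per distinct substring plus the root, i.e. $(k+1)^2$ nodes for $n = 2k$, which illustrates the $\Theta(n^2)$ worst case.

```python
from typing import Dict

Trie = Dict[str, "Trie"]  # each node maps a single symbol to a child node


def build_suffix_trie(w: str) -> Trie:
    """Naive suffix trie: insert every suffix of w, one symbol per arc."""
    root: Trie = {}
    for i in range(len(w)):
        node = root
        for ch in w[i:]:
            node = node.setdefault(ch, {})  # reuse the arc if it exists, else create it
    return root


def count_nodes(node: Trie) -> int:
    """Count the nodes of the trie, root included."""
    return 1 + sum(count_nodes(child) for child in node.values())


def occurs(trie: Trie, p: str) -> bool:
    """p occurs in w iff p spells a path from the root (one lookup per symbol)."""
    node = trie
    for ch in p:
        if ch not in node:
            return False
        node = node[ch]
    return True


if __name__ == "__main__":
    # Prints "20 121", "40 441", "80 1681": (k + 1) ** 2 nodes for n = 2k.
    for k in (10, 20, 40):
        w = "a" * k + "b" * k
        print(len(w), count_nodes(build_suffix_trie(w)))
```

Pattern lookup costs one dictionary step per symbol of $p$; the quadratic cost lies entirely in building and storing the unary chains.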


“…Finally, in Section 7 we show that an entirely different data structure, the position heap of Ehrenfeucht et al [8], yields a completely different tradeoff for indexing a sparse set of positions. Position heaps are in a sense "easier" to compute than suffix trees or suffix arrays, since it is not necessary to sort the entire suffixes.…”

Section: Our Results (mentioning)

confidence: 99%

“…The position heap $H_T$ over a text $T_{1,n}$ is a blend of a trie over $T$'s suffixes and a heap over its indices [8]:…”

Section: Position Heaps (mentioning)

confidence: 99%
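A minimal Python sketch of the trie/heap blend described in the quote above, under stated assumptions: positions are 0-based, suffixes are inserted from the shortest to the longest, and each insertion adds exactly one node, for the shortest prefix of the suffix not yet in the trie. Every new node is attached below a node created earlier for a larger position, which gives the heap order on positions. Published formulations differ in insertion order and indexing, so treat this as one illustrative variant; `build_position_heap` and `find_occurrences` are hypothetical names. The query is deliberately simple: positions on the search path are only candidates and are checked against the text, while every position in the subtree reached after spelling the whole pattern is an occurrence.

```python
from typing import Dict, List, Optional


class Node:
    """Position-heap node; every arc is labeled by a single character."""

    def __init__(self, pos: Optional[int]) -> None:
        self.pos = pos  # text position stored at this node (None only for the root)
        self.children: Dict[str, "Node"] = {}


def build_position_heap(text: str) -> Node:
    """Insert suffixes shortest-first; each adds one node for its shortest new prefix."""
    root = Node(None)
    for i in range(len(text) - 1, -1, -1):
        node, depth = root, 0
        # Existing nodes come from shorter suffixes, so text[i + depth] stays in range.
        while text[i + depth] in node.children:
            node = node.children[text[i + depth]]
            depth += 1
        # New leaf; its parent was created for a larger position (heap order).
        node.children[text[i + depth]] = Node(i)
    return root


def find_occurrences(root: Node, text: str, pattern: str) -> List[int]:
    """Return the sorted starting positions of a non-empty pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    hits: List[int] = []
    node = root
    for depth, ch in enumerate(pattern):
        child = node.children.get(ch)
        if child is None:
            return sorted(hits)  # pattern leaves the heap: only path nodes were candidates
        node = child
        # This node spells a proper prefix of the pattern: verify against the text.
        if depth + 1 < len(pattern) and text.startswith(pattern, node.pos):
            hits.append(node.pos)
    # The whole pattern is spelled out: every position in this subtree is an occurrence.
    stack = [node]
    while stack:
        current = stack.pop()
        hits.append(current.pos)
        stack.extend(current.children.values())
    return sorted(hits)
```

For example, `find_occurrences(build_position_heap("abaababa"), "abaababa", "aba")` returns `[0, 3, 5]`. Construction needs no suffix sorting, in line with the remark quoted above; the price in this naive query is that each path candidate is verified separately, which is quadratic in the pattern length in the worst case.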


Sparse Text Indexing in Small Space
Bille, Fischer, Gørtz, et al. 2016, ACM Trans. Algorithms


In this work we present efficient algorithms for constructing sparse suffix trees, sparse suffix arrays and sparse position heaps for $b$ arbitrary positions of a text $T$ of length $n$ while using only $O(b)$ words of space during the construction.

Attempts at breaking the naive bound of $\Omega(nb)$ time for constructing sparse suffix trees in $O(b)$ space can be traced back to the origins of string indexing in 1968. First results were only obtained in 1996, but only for the case where the $b$ suffixes were evenly spaced in $T$. In this paper there is no constraint on the locations of the suffixes.

Our main contribution is to show that the sparse suffix tree (and array) can be constructed in $O(n \log^2 b)$ time. To achieve this we develop a technique that efficiently answers $b$ longest common prefix queries on suffixes of $T$ using only $O(b)$ space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Our first solution is Monte Carlo and outputs the correct tree with high probability. We then give a Las Vegas algorithm which also uses $O(b)$ space and runs in the same time bounds with high probability when $b = O(\sqrt{n})$.

Furthermore, additional tradeoffs between the space usage and the construction time for the Monte Carlo algorithm are given. Finally, we show that at the expense of slower pattern queries, it is possible to construct sparse position heaps in $O(n + b \log b)$ time and $O(b)$ space.
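For contrast with the $O(n \log^2 b)$ bound above, the following Python sketch is a naive baseline, not the authors' technique: it sorts the $b$ sampled positions by comparing their suffixes character by character, and computes the LCP of adjacent entries by direct scanning. No suffix is copied, so the extra space is $O(b)$ words, but a single comparison can scan $\Theta(n)$ characters, giving $O(nb \log b)$ worst-case time. The names `lcp`, `compare_suffixes`, `sparse_suffix_array` and `sparse_lcp_array` are hypothetical.

```python
from functools import cmp_to_key
from typing import List


def lcp(text: str, i: int, j: int) -> int:
    """Length of the longest common prefix of text[i:] and text[j:], without slicing."""
    k = 0
    while i + k < len(text) and j + k < len(text) and text[i + k] == text[j + k]:
        k += 1
    return k


def compare_suffixes(text: str, i: int, j: int) -> int:
    """Three-way lexicographic comparison of the suffixes text[i:] and text[j:]."""
    k = lcp(text, i, j)
    end_i, end_j = i + k == len(text), j + k == len(text)
    if end_i and end_j:
        return 0  # same suffix (i == j)
    if end_i:
        return -1  # text[i:] is a proper prefix of text[j:], so it sorts first
    if end_j:
        return 1
    return -1 if text[i + k] < text[j + k] else 1


def sparse_suffix_array(text: str, positions: List[int]) -> List[int]:
    """Sort only the sampled positions by their suffixes (naive comparison sort)."""
    return sorted(positions, key=cmp_to_key(lambda a, b: compare_suffixes(text, a, b)))


def sparse_lcp_array(text: str, ssa: List[int]) -> List[int]:
    """LCP of each adjacent pair in a sparse suffix array (one direct scan per pair)."""
    return [lcp(text, a, b) for a, b in zip(ssa, ssa[1:])]
```

For example, with `text = "banana"` and sampled positions `[0, 2, 4, 5]`, `sparse_suffix_array` returns `[5, 0, 4, 2]` (the suffixes a, banana, na, nana), and `sparse_lcp_array` on that result returns `[0, 0, 2]`.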


Searchable encryption: A survey on privacy‐preserving search schemes on encrypted outsourced data
Handa, Krishna, Aggarwal 2019, Concurrency and Computation


Outsourcing confidential data to cloud storage leads to privacy challenges that can be reduced using encryption. However, with encryption in place, the utility of the data is reduced, which degrades the users' quality of experience. To overcome this, searchable encryption (SE) schemes are used, which allow end users to retrieve relevant documents from the cloud; various researchers have developed such schemes using different techniques. Despite the popularity of searchable encryption schemes, most surveys either do not provide a taxonomy of SE schemes or present an incomplete one. Hence, in this paper, we attempt to present a complete taxonomy/classification of searchable encryption schemes in terms of the type of search, type of index, results retrieved, implementation type, multiplicity of users, and the technique used. From the literature, it is observed that inner product similarity is widely adopted by researchers to compute the similarity between the query and the document index, as it supports both conjunctive and disjunctive searching (i.e., has better search capability), but it requires high search time (i.e., has lower search efficiency). On the other hand, schemes based on binary comparisons require less search time (i.e., have better search efficiency) but support only conjunctive searching (i.e., have limited search capability). Thus, a major conclusion drawn from our work is that there is an imbalance between search capability and search efficiency: in existing schemes, search capability can be improved only at the cost of search time. Therefore, we suggest that one direction researchers should pursue is balancing search capability and search efficiency.

