Shortest Unique Substring Query Revisited

İleri, Atalay Mert; Külekçi, M. Oğuzhan; Xu, Bojian

doi:10.1007/978-3-319-07566-2_18

Cited by 21 publications

(28 citation statements)

References 5 publications

(11 reference statements)

“…In addition to the related work discussed in Section I, there were recently a sequence of work on finding shortest unique substrings (SUS) [15], [16], [17], [18], [1], of which Hu et al [1] studied the generalized version of SUS finding: Given a string position interval [x..y], 1 ≤ x ≤ y ≤ n, find $\mathrm{SUS}_x^y$, the shortest unique substring that covers the string position interval [x..y], or the fact that such $\mathrm{SUS}_x^y$ does not exist. To the best of our knowledge, no efficient reduction from LR finding to SUS finding is known as of now.…”

Section: Prior Work and Our Contribution (mentioning)

confidence: 99%
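To make the interval query in the statement above concrete, here is a minimal brute-force sketch in Python. It is not the method of any of the cited papers (those build linear-size indexes with constant-time queries); it simply enumerates candidates by increasing length. The names `count_occurrences` and `sus_covering` are hypothetical, positions are 1-based and inclusive as in the quoted text, and ties are broken by returning the leftmost candidate, which is an assumption of this sketch.

```python
def count_occurrences(s, sub):
    """Count possibly overlapping occurrences of sub in s."""
    count = 0
    start = s.find(sub)
    while start != -1:
        count += 1
        start = s.find(sub, start + 1)
    return count


def sus_covering(s, x, y):
    """Return (i, j), 1-based inclusive, of a shortest unique substring
    s[i..j] with i <= x and j >= y (leftmost among the shortest).

    Brute force for illustration only; far slower than an indexed solution.
    """
    n = len(s)
    if not (1 <= x <= y <= n):
        raise ValueError("need 1 <= x <= y <= len(s)")
    for length in range(y - x + 1, n + 1):
        # Valid starts i satisfy i <= x, i + length - 1 >= y, and fit in s.
        first = max(1, y - length + 1)
        last = min(x, n - length + 1)
        for i in range(first, last + 1):
            j = i + length - 1
            if count_occurrences(s, s[i - 1:j]) == 1:
                return (i, j)
    return None  # not reached for non-empty s: s itself occurs exactly once
```

For example, on the string `banana`, position 3 (`n`) is covered by the unique substring `ban`, so `sus_covering("banana", 3, 3)` returns `(1, 3)`.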

“…Given the lcp and rank arrays of the string S, we can compute its useful LLRs in O(n) time and space. Proof: By Lemma 2, we know if $\mathrm{LLR}_{i-1}$ exists, the right boundary of $\mathrm{LLR}_i$ is on or after the right boundary of $\mathrm{LLR}_{i-1}$, for any i ≥ 2, so we can construct the array of useful LLRs in one pass as follows: we calculate each $\mathrm{LLR}_i$ using Lemma 1, … , 8), (7,13), (10,14), (11,17)}, where each useful LLR is a (start, end) tuple, representing the start and ending position of the LLR. By viewing the start and end positions as the x and y coordinates, all the useful LLRs of the example string can be visualized as the dark dots in the figure. (B) Queries for $\mathrm{LR}_{11}^{12}$, $\mathrm{LR}_{11}^{14}$, $\mathrm{LR}_{6}^{12}$ and $\mathrm{LR}_{5}^{5}$ are visualized by the red, blue, green, and black polylines, numbered A-D, respectively.…”

Section: A Geometric Perspective Of the Useful LLRs And The LR Qu… (mentioning)

confidence: 99%

“…Search A is for $\mathrm{LR}_{11}^{12}$. That is to find all heaviest dots in $S_{11,12}$, which include dot (7,13) and dot (11,17). Suppose the 2d DMQ launched by search A returns dot (7,13), which has a weight of 7 and is one choice for $\mathrm{LR}_{11}^{12}$.…”

Section: A Find All Choices Of Any LR (mentioning)

confidence: 99%
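The geometric view in the two statements above can be sketched as follows. Each useful LLR is a dot (start, end) whose weight is taken here to be its length, end − start + 1, which matches the quoted weight of 7 for dot (7,13). A query for the interval [x..y] asks for the heaviest dots with start ≤ x and end ≥ y. The Python below replaces the paper's 2d dominance max query (DMQ) structure with a plain linear scan, so it illustrates the query semantics only. The function name `heaviest_covering_dots` is hypothetical, and the dot list contains only the three dots that are fully legible in the quoted excerpt; the excerpt's first dots are truncated and are not reconstructed here.

```python
def heaviest_covering_dots(dots, x, y):
    """Return all dots (start, end) with start <= x and end >= y that have
    maximum weight end - start + 1. Linear scan standing in for a 2d DMQ."""
    covering = [(s, e) for (s, e) in dots if s <= x and e >= y]
    if not covering:
        return []
    best = max(e - s + 1 for (s, e) in covering)
    return [(s, e) for (s, e) in covering if e - s + 1 == best]


# Only the dots that survive intact in the quoted excerpt (assumption:
# the truncated leading dots are deliberately omitted).
legible_dots = [(7, 13), (10, 14), (11, 17)]

print(heaviest_covering_dots(legible_dots, 11, 12))  # [(7, 13), (11, 17)]
```

With these three dots, the query for [11..12] returns both (7,13) and (11,17), in agreement with the quoted description of search A.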

On stabbing queries for generalized longest repeat

2015

2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Self Cite

A longest repeat query on a string, motivated by its applications in many subfields including computational biology, asks for the longest repetitive substring(s) covering a particular string position (point query). In this paper, we extend the longest repeat query from point query to interval query, allowing the search for longest repeat(s) covering any position interval, and thus significantly improve the usability of the solution. Our method for interval query takes a different approach using the insight from a recent work on shortest unique substrings [1], as the prior work's approach for point query becomes infeasible in the setting of interval query. Using the critical insight from [1], we propose an indexing structure, which can be constructed in the optimal O(n) time and space for a string of size n, such that any future interval query can be answered in O(1) time. Further, our solution can find all longest repeats covering any given interval using optimal O(occ) time, where occ is the number of longest repeats covering that given interval, whereas the prior O(n)-time and space work can find only one candidate for each point query. Experiments with real-world biological data show that our proposal is competitive with prior works in both time and space, while providing the new functionality of interval queries as opposed to the point queries provided by prior works.

“…It thus remains to give a structure for finding Candidate 4. [3,6], [6,7], [7,10] We want to store I in a data structure to answer such queries efficiently.…”

Section: A 4-candidate Lemma (mentioning)

confidence: 99%

“…In their initial study [9], Pei et al showed how to construct in $O(n^2)$ time an index of O(n) size that answers a query in O(1) time. Soon after that, Ileri et al [6] and Tsuruta et al [10] independently improved the construction time to O(n). It is worth mentioning that O(n) size is considered optimal in the sense that D itself requires Ω(n) words to store when the alphabet is large.…”

Section: Introduction (mentioning)

confidence: 99%

Shortest Unique Queries on Strings

Pei, Tao

2014

String Processing and Information Retrieval

Abstract. Let D be a long input string of n characters (from an alphabet of size up to $2^w$, where w is the number of bits in a machine word). Given a substring q of D, a shortest unique query returns a shortest unique substring of D that contains q. We present an optimal structure that consumes O(n) space, can be built in O(n) time, and answers a query in O(1) time. We also extend our techniques to solve several variants of the problem optimally.

Shortest Unique Palindromic Substring Queries on Run-Length Encoded Strings

Watanabe, Nakashima, Inenaga et al.

2019

Lecture Notes in Computer Science

For a string S, a palindromic substring S[i..j] is said to be a shortest unique palindromic substring (SUPS) for an interval [s, t] in S, if S[i..j] occurs exactly once in S, the interval [i, j] contains [s, t], and every palindromic substring containing [s, t] which is shorter than S[i..j] occurs at least twice in S. In this paper, we study the problem of answering SUPS queries on run-length encoded strings. We show how to preprocess a given run-length encoded string $RLE_S$ of size m in O(m) space and $O(m \log \sigma_{RLE_S} + m \sqrt{\log m / \log\log m})$ time so that all SUPSs for any subsequent query interval can be answered in $O(\sqrt{\log m / \log\log m} + \alpha)$ time, where α is the number of outputs, and $\sigma_{RLE_S}$ is the number of distinct runs of $RLE_S$.
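As an illustration of the SUPS definition, the following brute-force Python sketch assumes the usual reading: a SUPS for [s, t] is a palindromic substring S[i..j] that occurs exactly once in S, satisfies i ≤ s and t ≤ j, and is as short as possible among such substrings. It works on the plain string rather than on a run-length encoding, so it does not reflect the cited paper's data structure or its time bounds. The function names `occurrences` and `all_sups` are hypothetical, and positions are 1-based and inclusive.

```python
def occurrences(text, sub):
    """Count possibly overlapping occurrences of sub in text."""
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        start = text.find(sub, start + 1)
    return count


def all_sups(text, s, t):
    """Return every (i, j), 1-based inclusive, such that text[i..j] is a
    shortest unique palindromic substring covering [s, t]; [] if none."""
    n = len(text)
    if not (1 <= s <= t <= n):
        raise ValueError("need 1 <= s <= t <= len(text)")
    for length in range(t - s + 1, n + 1):
        found = []
        first = max(1, t - length + 1)
        last = min(s, n - length + 1)
        for i in range(first, last + 1):
            j = i + length - 1
            cand = text[i - 1:j]
            # Keep candidates that are palindromes and occur exactly once.
            if cand == cand[::-1] and occurrences(text, cand) == 1:
                found.append((i, j))
        if found:
            return found  # all answers share this minimal length
    return []
```

On `abcba`, the query [2, 2] yields `[(2, 4)]` (the palindrome `bcb`), while on `ab` the query [1, 2] yields `[]` because no palindromic substring covers both positions.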
