General rightsThis document is made available in accordance with publisher policies. Please cite only the published version using the reference above. AbstractWe revisit the complexity of one of the most basic problems in pattern matching. In the k-mismatch problem we must compute the Hamming distance between a pattern of length m and every m-length substring of a text of length n, as long as that Hamming distance is at most k. Where the Hamming distance is greater than k at some alignment of the pattern and text, we simply output "No". We study this problem in both the standard offline setting and also as a streaming problem. In the streaming k-mismatch problem the text arrives one symbol at a time and we must give an output before processing any future symbols. Our main results are as follows:• Our first result is a deterministic O(nk 2 log k/m + n polylog m) time offline algorithm for k-mismatch on a text of length n. This is a factor of k improvement over the fastest previous result of this form from SODA 2000 [9, 10].• We then give a randomised and online algorithm which runs in the same time complexity but requires only O(k 2 polylog m) space in total.• Next we give a randomised (1 + )-approximation algorithm for the streaming k-mismatch problem which uses O(k 2 polylog m/ 2 ) space and runs in O(polylog m/ 2 ) worst-case time per arriving symbol.• Finally we combine our new results to derive a randomised O(k 2 polylog m) space algorithm for the streaming k-mismatch problem which runs in O( √ k log k + polylog m) worst-case time per arriving symbol. This improves the best previous space complexity for streaming k-mismatch from FOCS 2009 [26] by a factor of k. We also improve the time complexity of this previous result by an even greater factor to match the fastest known offline algorithm (up to logarithmic factors).
In this work we present efficient algorithms for constructing sparse suffix trees, sparse suffix arrays and sparse positions heaps for b arbitrary positions of a text T of length n while using only O(b) words of space during the construction.Attempts at breaking the naive bound of Ω(nb) time for constructing sparse suffix trees in O(b) space can be traced back to the origins of string indexing in 1968. First results were only obtained in 1996, but only for the case where the b suffixes were evenly spaced in T . In this paper there is no constraint on the locations of the suffixes.Our main contribution is to show that the sparse suffix tree (and array) can be constructed in O(n log 2 b) time. To achieve this we develop a technique, that allows to efficiently answer b longest common prefix queries on suffixes of T , using only O(b) space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Our first solution is Monte-Carlo and outputs the correct tree with high probability. We then give a Las-Vegas algorithm which also uses O(b) space and runs in the same time bounds with high probability when b = O( √ n).Furthermore, additional tradeoffs between the space usage and the construction time for the Monte-Carlo algorithm are given. Finally, we show that at the expense of slower pattern queries, it is possible to construct sparse position heaps in O(n + b log b) time and O(b) space.
We revisit the longest common extension (LCE) problem, that is, preprocess a string T into a compact data structure that supports fast LCE queries. An LCE query takes a pair (i, j) of indices in T and returns the length of the longest common prefix of the suffixes of T starting at positions i and j. We study the time-space trade-offs for the problem, that is, the space used for the data structure vs. the worst-case time for answering an LCE query. Let n be the length of T . Given a parameter τ , 1 ≤ τ ≤ n, we show how to achieve either O(n/ √ τ ) space and O(τ ) query time, or O(n/τ ) space and O(τ log(|LCE(i, j)|/τ )) query time, where |LCE(i, j)| denotes the length of the LCE returned by the query. These bounds provide the first smooth trade-offs for the LCE problem and almost match the previously known bounds at the extremes when τ = 1 or τ = n. We apply the result to obtain improved bounds for several applications where the LCE problem is the computational bottleneck, including approximate string matching and computing palindromes. We also present an efficient technique to reduce LCE queries on two strings to one string. Finally, we give a lower bound on the time-space product for LCE data structures in the non-uniform cell probe model showing that our second trade-off is nearly optimal.
We study the complexity of the popular one player combinatorial game known as Flood-It. In this game the player is given an n×n board of tiles where each tile is allocated one of c colours. The goal is to make the colours of all tiles equal via the shortest possible sequence of flooding operations. In the standard version, a flooding operation consists of the player choosing a colour k, which then changes the colour of all the tiles in the monochromatic region connected to the top left tile to k. After this operation has been performed, neighbouring regions which are already of the chosen colour k will then also become connected, thereby extending the monochromatic region of the board. We show that finding the minimum number of flooding operations is NP-hard for c 3 and that this even holds when the player can perform flooding operations from any position on the board. However, we show that this 'free' variant is in P for c = 2. We also prove that for an unbounded number of colours, Flood-It remains NP-hard for boards of height at least 3, but is in P for boards of height 2. Next we show how a (c − 1) approximation and a randomised 2c/3 approximation algorithm can be derived, and that no polynomial time constant factor, independent of c, approximation algorithm exists unless P=NP. We then investigate how many moves are required for the 'most demanding' n×n boards (those requiring the most moves) and show that the number grows as fast as Θ( √ c n). Finally, we consider boards where the colours of the tiles are chosen at random and show that for c 2, the number of moves required to flood the whole board is Ω(n) with high probability.
Abstract. We consider the problem of dictionary matching in a stream. Given a set of strings, known as a dictionary, and a stream of characters arriving one at a time, the task is to report each time some string in our dictionary occurs in the stream. We present a randomised algorithm which takes O(log log(k + m)) time per arriving character and uses O(k log m) words of space, where k is the number of strings in the dictionary and m is the length of the longest string in the dictionary.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.