Longest Common Prefixes with k-Errors and Applications

Ayad, Lorraine A. K.; Barton, Carl; Charalampopoulos, Panagiotis; Iliopoulos, Costas S.; Pissis, Solon P.

doi:10.1007/978-3-030-00479-8_3

Cited by 9 publications

(10 citation statements)

References 39 publications

(48 reference statements)

“…In [1], the authors introduced an efficient construction of a genome mappability array B_k in which B_k[μ] is the smallest length m such that at least μ of the length-m substrings of T do not occur elsewhere in T with at most k mismatches. This construction was further improved in [6].…”

Section: Solution Time Complexity (mentioning)

confidence: 99%

Efficient Computation of Sequence Mappability

et al. 2022

Self Cite

Sequence mappability is an important task in genome resequencing. In the (k, m)-mappability problem, for a given sequence T of length n, the goal is to compute a table whose ith entry is the number of indices $$j \ne i$$ such that the length-m substrings of T starting at positions i and j have at most k mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of $$k=1$$. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for $$k=O(1)$$, works in $$O(n)$$ space and, with high probability, in $$O(n \cdot \min \{m^k,\log ^k n\})$$ time. Our algorithm requires a careful adaptation of the k-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop $$O(n^2)$$-time algorithms to compute all (k, m)-mappability tables for a fixed m and all $$k\in \{0,\ldots ,m\}$$ or a fixed k and all $$m\in \{k,\ldots ,n\}$$. Finally, we show that, for $$k,m = \Theta (\log n)$$, the (k, m)-mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails. This is an improved and extended version of a paper presented at SPIRE 2018.
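To make the problem statement above concrete, here is a minimal brute-force sketch of the (k, m)-mappability table in Python. It is only a reference baseline that follows the definition directly, taking roughly O(n² · m) time; it is not the k-errata-tree algorithm described in the abstract. The function names `hamming_at_most` and `mappability_table` are hypothetical and chosen for illustration.

```python
def hamming_at_most(a, b, k):
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False  # stop early once the budget is exceeded
    return True


def mappability_table(T, m, k):
    """Brute-force (k, m)-mappability table of T (0-based positions).

    Entry i counts the start positions j != i whose length-m substring
    is within Hamming distance k of the length-m substring starting at i.
    """
    count = len(T) - m + 1
    if m <= 0 or count <= 0:
        return []  # no length-m substrings exist
    table = [0] * count
    for i in range(count):
        for j in range(i + 1, count):
            # Hamming distance is symmetric, so each unordered pair is
            # compared once and credited to both positions.
            if hamming_at_most(T[i:i + m], T[j:j + m], k):
                table[i] += 1
                table[j] += 1
    return table


if __name__ == "__main__":
    print(mappability_table("aaba", 2, 1))
```

A table like this can serve as a correctness oracle on short inputs when testing a faster implementation.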

“…In this model, a decision needs to be made whether sufficiently similar factors only of length exactly m or of all lengths between m − k and m + k should be counted. The techniques presented recently in [23,9] may prove useful for counting. We also leave this problem for future investigation.…”

Section: Final Remarks (mentioning)

confidence: 99%

“…In [8] the authors introduced an efficient construction of a genome mappability array B_k in which B_k[μ] is the smallest length m such that at least μ of the length-m factors of x do not occur elsewhere in x with at most k mismatches. The construction algorithm was later improved in [9].…”

Section: Introduction (mentioning)

confidence: 99%

Faster algorithms for 1-mappability of a sequence

Alzamel, Charalampopoulos, Iliopoulos, et al. 2020

Theoretical Computer Science

Self Cite

In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k = 1. There exists an algorithm to solve this problem for k = 1 requiring time O(mn log n / log log n) using space O(n). Here we present two new algorithms that require worst-case time O(mn) and O(n log n log log n), respectively, and space O(n), thus greatly improving the previous result. Moreover, we present another algorithm that requires average-case time and space O(n) for integer alphabets of size σ if m = Ω(log_σ n). Notably, we show that this algorithm is generalizable for arbitrary k, requiring average-case time O(kn) and space O(n) if m = Ω(k log_σ n), assuming that the letters are independent and uniformly distributed random variables. Finally, we provide an experimental evaluation of our average-case algorithm demonstrating its competitiveness to the state-of-the-art implementation.

“…The LCS problem has also been studied under Hamming and edit distance. We refer the interested reader to [2,17,28,72,75,76] and references therein.…”

Section: Introduction (mentioning)

confidence: 99%

Dynamic and Internal Longest Common Substring

et al. 2020

Self Cite

Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an $$\mathcal {O}(n)$$-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to the fully dynamic LCS problem requiring sublinear time in n per edit operation. In particular, we show how to find an LCS after each edit operation in $$\tilde{\mathcal {O}}(n^{2/3})$$ time, after $$\tilde{\mathcal {O}}(n)$$-time and space preprocessing. This line of research has been recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, the authors presented an $$\tilde{\mathcal {O}}(n)$$-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in $$\tilde{\mathcal {O}}(1)$$ time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings; specifically, computing the longest palindrome and the Lyndon factorization of a string after a single edit operation. We develop dynamic sublinear-time algorithms for both of these problems as well. We also consider internal LCS queries, that is, queries in which we are to return an LCS of a pair of substrings of S and T. We show that answering such queries is hard in general and propose efficient data structures for several restricted cases.
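For readers unfamiliar with the static LCS problem that the abstract starts from, the following is a small Python sketch using the textbook dynamic-programming recurrence. It runs in O(|S| · |T|) time, so it is neither the linear-time suffix-tree solution mentioned above nor any of the dynamic data structures from the paper; the function name `longest_common_substring` is a hypothetical choice for this illustration.

```python
def longest_common_substring(S, T):
    """Return one longest substring common to S and T (quadratic-time DP)."""
    best_len = 0   # length of the best common substring found so far
    best_end = 0   # end index (exclusive) of that substring in S
    # prev[j] = length of the longest common suffix of S[:i-1] and T[:j]
    prev = [0] * (len(T) + 1)
    for i in range(1, len(S) + 1):
        cur = [0] * (len(T) + 1)
        for j in range(1, len(T) + 1):
            if S[i - 1] == T[j - 1]:
                # Extend the common suffix that ended one character earlier.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_end = i
        prev = cur
    return S[best_end - best_len:best_end]


if __name__ == "__main__":
    print(longest_common_substring("banana", "ananas"))
```

Recomputing this table from scratch after every edit costs quadratic time per update, which is the kind of cost the sublinear-time dynamic algorithms in the abstract are designed to avoid.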
