Sequence mappability is an important task in genome resequencing. In the (k, m)-mappability problem, for a given sequence T of length n, the goal is to compute a table whose ith entry is the number of indices $j \ne i$ such that the length-m substrings of T starting at positions i and j have at most k mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of $k=1$. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for $k=O(1)$, works in $O(n)$ space and, with high probability, in $O(n \cdot \min\{m^k, \log^k n\})$ time. Our algorithm requires a careful adaptation of the k-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop $O(n^2)$-time algorithms to compute all (k, m)-mappability tables for a fixed m and all $k \in \{0,\ldots,m\}$, or for a fixed k and all $m \in \{k,\ldots,n\}$. Finally, we show that, for $k, m = \Theta(\log n)$, the (k, m)-mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails. This is an improved and extended version of a paper presented at SPIRE 2018.
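To make the problem definition concrete, here is a minimal brute-force sketch that computes the (k, m)-mappability table directly from the definition in $O(n^2 m)$ time. It is only a reference for what is being counted, not the paper's algorithm. The function name `mappability_naive` is hypothetical, positions are assumed to be 0-based, and the table is assumed to have one entry per length-m substring (that is, $n - m + 1$ entries).

```python
def mappability_naive(t: str, k: int, m: int) -> list[int]:
    """Brute-force (k, m)-mappability table (hypothetical helper).

    Entry i (0-based) counts indices j != i such that the length-m
    substrings t[i:i+m] and t[j:j+m] differ in at most k positions.
    Runs in O(n^2 * m) time; intended only to illustrate the definition.
    """
    starts = len(t) - m + 1  # number of length-m substrings
    table = [0] * max(starts, 0)
    for i in range(starts):
        for j in range(starts):
            if i == j:
                continue  # the definition excludes j == i
            # Hamming distance between the two length-m windows
            mismatches = sum(1 for a, b in zip(t[i:i + m], t[j:j + m]) if a != b)
            if mismatches <= k:
                table[i] += 1
    return table
```

For example, in `"aabab"` with m = 2 the windows are `aa`, `ab`, `ba`, `ab`, so `mappability_naive("aabab", 1, 2)` gives `[3, 2, 1, 2]`, while with k = 0 only the two equal `ab` windows match each other, giving `[0, 1, 0, 1]`.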
Sequence mappability is an important task in genome re-sequencing. In the (k, m)-mappability problem, for a given sequence T of length n, our goal is to compute a table whose ith entry is the number of indices $j \ne i$ such that the length-m substrings of T starting at positions i and j have at most k mismatches. Previous works on this problem focused on heuristic approaches to compute a rough approximation of the result or on the case of $k = 1$. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that works in $O(n \cdot \min\{m^k, \log^{k+1} n\})$ time and $O(n)$ space for $k = O(1)$. It requires a careful adaptation of the technique of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. We also show $O(n^2)$-time algorithms to compute all results for a fixed m and all $k \in \{0,\ldots,m\}$, or for a fixed k and all $m \in \{k,\ldots,n-1\}$. Finally, we show that the (k, m)-mappability problem cannot be solved in strongly subquadratic time for $k, m = \Theta(\log n)$ unless the Strong Exponential Time Hypothesis fails.