2009
DOI: 10.1186/1748-7188-4-3

Lossless filter for multiple repeats with bounded edit distance

Abstract: Background: Identifying local similarity between two or more sequences, or identifying repeats occurring at least twice in a sequence, is an essential part of the analysis of biological sequences and of their phylogenetic relationship. Finding such fragments while allowing for a certain number of insertions, deletions, and substitutions is, however, known to be a computationally expensive task; consequently, exact methods usually cannot be applied in practice.

Cited by 16 publications (20 citation statements)
References 22 publications
“…For future work, we will explore the possibility of optimising our algorithms and the corresponding library implementation for the approximate case by using lossless filters for eliminating a possibly large fraction of the input that is guaranteed not to contain any approximate occurrence, such as [ 31 ] for the Hamming distance model or [ 32 ] for the edit distance model. In addition, we will try to improve our algorithms for the approximate case in order to achieve average-case optimality.…”
Section: Discussion
confidence: 99%
“…The literature of algorithmic approaches and software tools for finding motifs and repetitions is vast, as the variability of the problem formulations leads to a variability of algorithmic strategies, and often to combinations of them. For finding long repetitions [39], for example, a preprocessing with an efficient and effective filtering [24,23,28] turns out to be the only possible combinatorial approach. For short motifs there are several enumerative pattern-driven algorithms [19,30,31,34].…”
Section: Related Work
confidence: 99%
“…An exception to this are the q-gram filtering techniques [32] that have successfully been used for string matching under the edit distance model (e.g. [7,30,26]), as well as for multiple local alignments both under the Hamming [27] and edit [26] distance model.…”
Section: Introduction
confidence: 99%
“…We introduce the β-blockwise q-gram distance between two strings x and y, that is, a more powerful generalization of the q-gram distance introduced as a string distance measure in [32]. Intuitively, and similarly to [7,30,26], this generalization comprises partitioning x and y in β blocks each, as evenly as possible, computing the q-gram distance between the corresponding block pairs, and then summing up the distances computed blockwise. 2.…”
Section: Introduction
confidence: 99%
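The β-blockwise q-gram distance quoted above can be sketched directly from its description: partition both strings into β blocks as evenly as possible, compute the classic q-gram distance (the summed absolute difference of q-gram occurrence counts) between corresponding block pairs, and sum. This is an illustrative reconstruction of the quoted definition, not the citing authors' implementation; the function names are my own.

```python
from collections import Counter

def qgram_distance(x, y, q):
    """Classic q-gram distance: sum, over all q-grams, of the absolute
    difference of their occurrence counts in x and y."""
    cx = Counter(x[i:i + q] for i in range(len(x) - q + 1))
    cy = Counter(y[i:i + q] for i in range(len(y) - q + 1))
    return sum(abs(cx[g] - cy[g]) for g in cx.keys() | cy.keys())

def blockwise_qgram_distance(x, y, q, beta):
    """beta-blockwise q-gram distance as described in the quoted passage:
    split x and y into beta blocks each, as evenly as possible, take the
    q-gram distance between corresponding blocks, and sum blockwise."""
    def blocks(s):
        base, rem = divmod(len(s), beta)
        out, start = [], 0
        for i in range(beta):
            end = start + base + (1 if i < rem else 0)
            out.append(s[start:end])
            start = end
        return out
    return sum(qgram_distance(bx, by, q)
               for bx, by in zip(blocks(x), blocks(y)))
```

With β = 1 this reduces to the plain q-gram distance; larger β makes the measure more position-sensitive, since matching q-grams in non-corresponding blocks no longer cancel.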