2005
DOI: 10.1007/11575832_20

Lossless Filter for Finding Long Multiple Approximate Repetitions Using a New Data Structure, the Bi-factor Array

Abstract: Similarity search in texts, notably biological sequences, has received substantial attention in the last few years. Numerous filtration and indexing techniques have been created in order to speed up the resolution of the problem. However, previous filters were made for speeding up pattern matching, or for finding repetitions between two sequences or occurring twice in the same sequence. In this paper, we present an algorithm called NIMBUS for filtering sequences prior to finding repetitions occurring…

Cited by 14 publications (12 citation statements)
References 20 publications
“…Recently, the same problem has been extensively studied under distance metrics; that is, the sought factors, one from x and one from y, must be at distance at most k and have maximal length. We refer the interested reader to [6][7][8][9][10][11] and to references therein.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…We recently learned that Peterlongo et al. [25] and Crochemore and Tischler [11] independently defined SSAs, under the names "bi-factor arrays" and "gapped suffix arrays", for the special case in which the spaced seed has the form 1^a 0^b 1^c. Russo and Tischler [26] showed how to represent such an SSA in asymptotically succinct space such that we can support random access to it in time logarithmic in the length of the text.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
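
To make the spaced-seed form 1^a 0^b 1^c in the statement above concrete, here is a minimal sketch that sorts text positions by the characters the seed keeps: the first a characters, then c characters after a gap of b ignored ones. The function names `bifactor_key` and `bifactor_array` are hypothetical, and the naive sort is an assumption made for illustration only; it is not the construction used in any of the cited papers.

```python
def bifactor_key(text, i, a, b, c):
    """Characters kept by the mask 1^a 0^b 1^c when it is placed at position i."""
    # Keep a characters, skip b characters, keep c characters.
    return text[i:i + a] + text[i + a + b:i + a + b + c]


def bifactor_array(text, a, b, c):
    """Positions of text sorted by their masked key (naive illustration).

    Only positions where the whole mask fits inside the text are included.
    Ties are broken by position so the result is deterministic.
    """
    span = a + b + c
    positions = range(len(text) - span + 1)
    return sorted(positions, key=lambda i: (bifactor_key(text, i, a, b, c), i))


if __name__ == "__main__":
    text = "ACGTACCTAC"
    for i in bifactor_array(text, 2, 1, 2):
        print(i, bifactor_key(text, i, 2, 1, 2))
```

In this toy input, positions 0 and 4 end up adjacent because both have the masked key ACTA, even though the skipped character differs (G at position 2, C at position 6). Grouping positions that agree outside the gap is what makes such an array useful for seeds with a "don't care" region.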
“…Indeed, many algorithms for efficiently computing string matches [4,5,6] or alignments [7,8,9,10,11,12,13] use k-factors. In particular, filtration algorithms that have been created for quickly discarding large portions of the input before applying a more expensive algorithm on the remaining data are often based on the identification of such short repeated words [14,15,16,17,18,19,20].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Among the exact filtration algorithms (exact in the sense that they discard only portions of the text that can not be part of the final solution sought), some consider motifs composed of non consecutive letters [15,16,17,19], or sets of k-factors [14,18,20]. Both present advantages for filtering purposes in comparison with single k-factors with no letters skipped as shown in [15,21,17].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
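
The last two statements refer to filters built on short repeated words (k-factors). As a rough illustration of that building block only, the sketch below indexes every k-factor of a text and keeps those occurring more than once. The names `kfactor_positions` and `repeated_kfactors` are hypothetical, and this is not the NIMBUS filter or any of the cited algorithms; a real filter would add a condition on how many shared k-factors a region must contain before it is kept.

```python
from collections import defaultdict


def kfactor_positions(text, k):
    """Map each length-k substring (k-factor) to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index


def repeated_kfactors(text, k):
    """Keep only the k-factors that occur at least twice in text."""
    return {w: pos for w, pos in kfactor_positions(text, k).items() if len(pos) > 1}


if __name__ == "__main__":
    print(repeated_kfactors("ACGTACCTAC", 2))
```

Positions not covered by any repeated k-factor are natural candidates for discarding, which is the intuition behind using such words as a first, cheap pass.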