2010
DOI: 10.1007/s11786-010-0033-6

Fast, Practical Algorithms for Computing All the Repeats in a String

Abstract: Given a string x = x[1..n] on an alphabet of size α, and a threshold p_min ≥ 1, we describe four variants of an algorithm PSY1 that, using a suffix array, computes all the complete nonextendible repeats in x of length p ≥ p_min. The basic algorithm PSY1-1 and its simple extension PSY1-2 are fast on strings that occur in biological, natural language and other applications (not highly periodic strings), while PSY1-3 guarantees Θ(n) worst-case execution time. The final variant, PSY1-4, also achieves Θ(n) processin…
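
To make the abstract's terms concrete, here is a minimal Python sketch of finding nonextendible repeats from a suffix array and its longest-common-prefix (LCP) array. It is not PSY1 or any of its variants: it is a naive, quadratic-time illustration. It assumes that "nonextendible" means the occurrences cannot all be extended by the same letter to the left or to the right, and that "complete" means every occurrence is reported. The function names are hypothetical, and positions are 0-based, unlike the abstract's x[1..n].

```python
def suffix_array(x):
    # Naive construction: sort suffix start positions by the suffix text.
    return sorted(range(len(x)), key=lambda i: x[i:])


def lcp_array(x, sa):
    # lcp[k] = length of the longest common prefix of suffixes sa[k-1] and sa[k].
    def lcp(i, j):
        k = 0
        while i + k < len(x) and j + k < len(x) and x[i + k] == x[j + k]:
            k += 1
        return k

    return [0] + [lcp(sa[k - 1], sa[k]) for k in range(1, len(sa))]


def nonextendible_repeats(x, p_min=1):
    """Map each nonextendible repeat of length >= p_min to all its positions."""
    sa = suffix_array(x)
    lcp = lcp_array(x, sa)
    n = len(x)
    result = {}
    for k in range(1, n):
        p = lcp[k]
        if p < p_min:
            continue
        # Widen to the maximal run of suffixes sharing the same prefix of length p.
        lo = k - 1
        while lo > 0 and lcp[lo] >= p:
            lo -= 1
        hi = k
        while hi + 1 < n and lcp[hi + 1] >= p:
            hi += 1
        positions = sorted(sa[lo:hi + 1])
        # Right-nonextendible by construction: suffixes sa[k-1] and sa[k]
        # agree on exactly p letters. Check the left side explicitly;
        # None stands for "no letter to the left" (start of string).
        left_letters = {x[i - 1] if i > 0 else None for i in positions}
        if len(left_letters) > 1:
            result[x[positions[0]:positions[0] + p]] = positions
    return result
```

For example, under these assumptions `nonextendible_repeats("banana")` reports "a" at positions 1, 3, 5 and "ana" at positions 1, 3, while "na" is dropped because both of its occurrences are preceded by "a".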

Cited by 13 publications (5 citation statements)
References 16 publications
“…It turns out that RSF can be used to compute all the non-extendible repeating substrings in w. These data structures are important in bioinformatics applications; algorithms to compute them were described in [14,15,16] using suffix trees or suffix arrays. We introduce the inverse RSF array IRSF to compute all non-extendible repeating substrings in w.…”
Section: Computing Non-extendible Repeating Substrings In Strings Usi… (mentioning)
confidence: 99%
“…Consequently, we get Theorems 5.4 and 5.5. Note that there are other linear time algorithms proposed to compute non-extendible repeating substrings in a string that are more space efficient [16]. We present this algorithm to show the usefulness of the RSF data structure.…”
Section: Proof (mentioning)
confidence: 99%
“…We are using an algorithm by Puglisi et al. based on suffix arrays to identify all repeats with at least two tokens in a cloned fragment. On the basis of the repeats, we use the fractions of tokens of a fragment that are not covered by any repeat NR as a metric for repetitiveness.…”
Section: Our Approach (mentioning)
confidence: 99%
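
The NR metric quoted above can be sketched as follows. This is an illustration only, not the citing authors' implementation: it assumes repeat occurrences are given as hypothetical (start, length) pairs over 0-based token indices, and the function name is made up for this example.

```python
def uncovered_fraction(n_tokens, occurrences):
    """Fraction of tokens not covered by any repeat occurrence.

    occurrences: iterable of (start, length) pairs, 0-based token indices
    (an assumed input format, not taken from the cited paper).
    """
    if n_tokens == 0:
        return 0.0  # convention chosen for this sketch only
    covered = [False] * n_tokens
    for start, length in occurrences:
        # Mark every token inside this occurrence, clipped to the fragment.
        for i in range(max(start, 0), min(start + length, n_tokens)):
            covered[i] = True
    return covered.count(False) / n_tokens
```

A fragment of 10 tokens with two-token repeats at positions 0 and 5 leaves 6 tokens uncovered, so the sketch returns 0.6; a fragment fully tiled by repeats returns 0.0.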
“…There are well-known algorithms for computing maximal repeats in linear time, using a data structure from the suffix family (like suffix tree or suffix array) [12] and Gusfield [7] outlines an algorithm to compute largest-maximal repeat. For our experiments we used in all cases a linear implementation using the suffix array which processes roughly 500K- …”
Section: Of Course ωLmr(n) Is Upper-bounded By O(n²) It Is However… (mentioning)
confidence: 99%