Lossless Filter for Finding Long Multiple Approximate Repetitions Using a New Data Structure, the Bi-factor Array

Peterlongo, Pierre; Pisanti, Nadia; Boyer, Frédéric; Sagot, Marie‐France

doi:10.1007/11575832_20

Cited by 14 publications

(12 citation statements)

References 20 publications


“…Recently, the same problem has been extensively studied under distance metrics; that is, the sought factors, one from x and one from y, must be at distance at most k and have maximal length. We refer the interested reader to [6][7][8][9][10][11] and to references therein.…”

Section: Introduction (mentioning; confidence: 99%)

Longest property-preserved common factor: A new string-processing framework

Ayad; Bernardini; Grossi; et al. (2020)

Theoretical Computer Science

Self Cite


We introduce a new family of string processing problems. Given two or more strings, we are asked to compute a factor common to all strings that preserves a specific property and has maximal length. We consider three fundamental string properties: square-free factors, periodic factors, and palindromic factors under three different settings, one per property. In the first setting, we are given a string x and we are asked to construct a data structure over x answering the following type of online queries: given a string y, find a longest square-free factor common to x and y. In the second setting, we are given k strings and an integer 1 < k′ ≤ k and we are asked to find a longest periodic factor common to at least k′ strings. In the third one, we are given two strings and we are asked to find a longest palindromic factor common to the two strings. We present linear-time solutions for all settings. This is a full and extended version of a paper from SPIRE 2018.
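To make the third setting concrete, here is a minimal brute-force sketch in Python of the longest common palindromic factor problem. It is not the linear-time solution the abstract refers to: it enumerates every palindromic factor of each string (cubic time) and intersects the two sets. The function names are illustrative only.

```python
def palindromic_factors(s: str) -> set[str]:
    """Every distinct non-empty palindromic factor of s (naive, cubic time)."""
    found = set()
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            w = s[i:j]
            if w == w[::-1]:
                found.add(w)
    return found


def longest_common_palindromic_factor(x: str, y: str) -> str:
    """A longest palindrome that is a factor of both x and y ('' if there is none)."""
    common = palindromic_factors(x) & palindromic_factors(y)
    # Among equally long candidates, pick the lexicographically largest so the
    # result is deterministic.
    return max(common, key=lambda w: (len(w), w), default="")
```

For example, "abacabad" and "zzbacabzz" share the palindromes a, b, c, aca and bacab, so the function returns "bacab".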


“…We recently learned that Peterlongo et al [25] and Crochemore and Tischler [11] independently defined SSAs, under the names "bi-factor arrays" and "gapped suffix arrays", for the special case in which the spaced seed has the form $1^a 0^b 1^c$. Russo and Tischler [26] showed how to represent such an SSA in asymptotically succinct space such that we can support random access to it in time logarithmic in the length of the text.…”

Section: Introduction (mentioning; confidence: 99%)

Compressed Spaced Suffix Arrays

2017


Abstract. Spaced seeds are important tools for similarity search in bioinformatics, and using several seeds together often significantly improves their performance. With existing approaches, however, for each seed we keep a separate linear-size data structure, either a hash table or a spaced suffix array (SSA). In this paper we show how to compress SSAs relative to normal suffix arrays (SAs) and still support fast random access to them. We first prove a theoretical upper bound on the space needed to store an SSA when we already have the SA. We then present experiments indicating that our approach works even better in practice.
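A spaced suffix array can be illustrated with a naive construction. The sketch below assumes a simplified definition that is hypothetical and not taken from the cited papers: only positions where the whole seed fits inside the text are indexed, each position is keyed by the letters under the seed's 1s, and ties are broken by position. The helper `bi_factor_seed` builds the special seed $1^a0^b1^c$ named in the citation statement above.

```python
def bi_factor_seed(a: int, b: int, c: int) -> str:
    """Seed 1^a 0^b 1^c: keep a letters, skip b letters, keep c letters."""
    return "1" * a + "0" * b + "1" * c


def spaced_factor(text: str, i: int, seed: str) -> str:
    """Letters of text[i : i + len(seed)] at the positions where seed has a '1'."""
    return "".join(text[i + k] for k, bit in enumerate(seed) if bit == "1")


def spaced_suffix_array(text: str, seed: str) -> list[int]:
    """Positions where the whole seed fits, sorted by spaced factor, then by position.

    Naive construction by comparison sorting. Positions too close to the end of
    the text are simply left out (a simplifying assumption of this sketch).
    """
    starts = range(len(text) - len(seed) + 1)
    return sorted(starts, key=lambda i: (spaced_factor(text, i, seed), i))
```

For example, with the seed "101" (a = b = c = 1), the positions 0 to 3 of "abcabd" are keyed by "ac", "ba", "cb" and "ad", so `spaced_suffix_array("abcabd", bi_factor_seed(1, 1, 1))` returns `[0, 3, 1, 2]`. Positions sharing a spaced factor end up adjacent, which is what makes such an array usable for seed-based lookups.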


“…Indeed, many algorithms for efficiently computing string matches [4,5,6] or alignments [7,8,9,10,11,12,13] use k-factors. In particular, filtration algorithms that have been created for quickly discarding large portions of the input before applying a more expensive algorithm on the remaining data are often based on the identification of such short repeated words [14,15,16,17,18,19,20].…”

Section: Introduction (mentioning; confidence: 99%)

“…Among the exact filtration algorithms (exact in the sense that they discard only portions of the text that can not be part of the final solution sought), some consider motifs composed of non consecutive letters [15,16,17,19], or sets of k-factors [14,18,20]. Both present advantages for filtering purposes in comparison with single k-factors with no letters skipped as shown in [15,21,17].…”

Section: Introduction (mentioning; confidence: 99%)
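The idea of an exact filter built on repeated k-factors can be sketched for the simplest case: exact repeats within a single text. This is a simplified illustration, not the filter of the paper above (which targets approximate multiple repetitions), and the function name is hypothetical. If a word of length at least k occurs at two different positions, each of its k-factors also occurs twice, so every position of both occurrences is covered by a repeated k-factor; uncovered positions can therefore be discarded without losing any such repeat.

```python
from collections import Counter


def repeated_kfactor_filter(text: str, k: int) -> list[bool]:
    """Mark each position of text covered by some k-factor occurring at least twice.

    Unmarked positions cannot belong to an exact repeat of length >= k,
    so discarding them is lossless for that (simplified) problem.
    """
    n = len(text)
    # Count every k-factor (substring of length k) of the text.
    counts = Counter(text[i:i + k] for i in range(n - k + 1))
    keep = [False] * n
    for i in range(n - k + 1):
        if counts[text[i:i + k]] >= 2:
            # Mark all k positions covered by this repeated k-factor.
            for j in range(i, i + k):
                keep[j] = True
    return keep
```

In "xabcyabcz" with k = 3, only "abc" repeats, so the filter keeps positions 1 to 3 and 5 to 7 and discards x, y and z.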

Indexing Gapped-Factors Using a Tree

Peterlongo; Allali; Sagot (2008)

Int. J. Found. Comput. Sci.

Self Cite


We present a data structure to index a specific kind of factors, that is, of substrings, called gapped-factors. A gapped-factor is a factor containing a gap that is ignored during the indexation. The data structure presented is based on the suffix tree and indexes all the gapped-factors of a text with a fixed size of gap, and only those. The construction of this data structure is done online in linear time and space. Such a data structure may play an important role in various pattern matching and motif inference problems, for instance in text filtration.
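What "a gap that is ignored during the indexation" means can be illustrated with a plain hash table instead of the suffix-tree-based structure the abstract describes. In this minimal sketch (names and parameters hypothetical), a gapped-factor consists of `left` letters, a gap of `gap` ignored letters, and `right` letters; occurrences that differ only inside the gap map to the same key.

```python
from collections import defaultdict


def gapped_factor(text: str, i: int, left: int, gap: int, right: int) -> str:
    """Gapped-factor at i: `left` letters, skip `gap` letters, then `right` letters."""
    j = i + left + gap
    return text[i:i + left] + text[j:j + right]


def index_gapped_factors(text: str, left: int, gap: int, right: int) -> dict[str, list[int]]:
    """Map each gapped-factor (gap letters dropped) to its start positions in text."""
    index: dict[str, list[int]] = defaultdict(list)
    # Only windows that fit entirely inside the text are indexed.
    for i in range(len(text) - (left + gap + right) + 1):
        index[gapped_factor(text, i, left, gap, right)].append(i)
    return dict(index)
```

For instance, in "acgtaactta" with `left=2, gap=1, right=2`, the windows "acgta" and "actta" differ only in the gap letter, and both are stored under the key "acta" at positions `[0, 5]`.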
