k-Abelian Pattern Matching

Ehlers, Thorsten; Manea, Florin; Mercaş, Robert; Nowotka, Dirk

doi:10.1007/978-3-319-09698-8_16

Cited by 8 publications

(11 citation statements)

References 16 publications


“…We observe that when identifying the q-gram distance between two blocks, we can apply the idea in [13], with the only difference that we should also maintain a Parikh vector that stores the differences between the number of occurrences of q-grams in the current block of xx and y (in fact the new letters given by the ranks). Moreover, at the time of the construction of y, we also construct a Parikh vector P(y), storing, for each letter of y, the number of its occurrences in y.…”

Section: Algorithm saCSC: An Exact Suffix-array-based Algorithm

mentioning

confidence: 99%

“…Thus, Step 3 cannot guarantee that i_best, the local minimum obtained by shifting the window m/β positions to the right and left of j_best, is minimal for all 0 ≤ i < m. In this section, we give a fast and exact algorithm, denoted by saCSC, to find i such that δ_i = D_{β,q}(x^i, y) is minimal, based on the suffix array (see Section 2). We partially follow the idea from [13]. This work investigates the string matching problem in the setting of k-abelian equivalences: two strings are considered k-abelian equivalent for some positive integer k, if they have the same length and share the same factors of length at most k, including multiplicities.…”

Section: Algorithm saCSC: An Exact Suffix-array-based Algorithm

mentioning

confidence: 99%
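The statement above describes choosing the rotation index i of x whose distance to y is smallest. A minimal brute-force sketch of that selection in Python follows. It is only an illustration under stated assumptions: it substitutes the classic q-gram distance (the sum of absolute differences between q-gram counts) for the measure used in the citing paper, it does not use a suffix array, and the names `qgram_profile`, `qgram_distance`, and `best_rotation` are hypothetical.

```python
from collections import Counter


def qgram_profile(s, q):
    """Count every length-q factor of s (assumption: plain Python strings)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(x, y, q):
    """Classic q-gram distance: L1 distance between the two q-gram profiles.

    Stand-in only; this is not the distance defined in the citing paper.
    """
    px, py = qgram_profile(x, q), qgram_profile(y, q)
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())


def best_rotation(x, y, q, dist=qgram_distance):
    """Return the smallest i such that the rotation x[i:] + x[:i] minimises
    dist(rotation, y, q). Assumes x is non-empty. Tries all len(x) rotations,
    so it is far slower than a suffix-array-based method."""
    return min(range(len(x)), key=lambda i: (dist(x[i:] + x[:i], y, q), i))
```

For instance, `best_rotation("cdab", "abcd", 2)` returns 2, because rotating "cdab" by two positions gives "abcd" itself. Passing another function as `dist` lets the same loop be reused with a different distance.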

“…In [13], the authors propose a linear-time algorithm to solve the string matching problem when looking at q-abelian equivalent strings: given a string x of length m, a string y of length n ≥ m, and a positive integer q < m, all factors of y that are q-abelian equivalent to x can be found in time and space O(m + n). The idea of the algorithm in [13] consists in constructing the suffix array of the string xy, and ranking sets of identical q-length prefixes of suffixes in the suffix array in the order of their appearance. Then it constructs new strings based on this ranking, and solves the problem as in the jumbled matching case [6], i.e.…”

Section: Algorithm saCSC: An Exact Suffix-array-based Algorithm

mentioning

confidence: 99%
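The matching problem quoted above (find all factors of y that are q-abelian equivalent to x, with q < m ≤ n) can be sketched with a sliding window over q-gram counts. This is a hash-based stand-in, not the suffix-array ranking described in [13], and it is not linear-time in general. It relies on the known characterisation that two words of length at least q−1 are q-abelian equivalent exactly when they have the same counts of length-q factors and the same prefix and suffix of length q−1. The function names are hypothetical.

```python
from collections import Counter


def qgrams(s, q):
    """All length-q factors of s, in order of appearance."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def q_abelian_occurrences(x, y, q):
    """Start positions i where y[i:i+len(x)] is q-abelian equivalent to x.

    Assumes 1 <= q < len(x) <= len(y), as in the quoted problem statement.
    """
    m, n = len(x), len(y)
    if not (1 <= q < m <= n):
        raise ValueError("expected 1 <= q < len(x) <= len(y)")

    # diff[g] = (occurrences of g in the current window) - (occurrences in x)
    diff = Counter()
    for g in qgrams(x, q):
        diff[g] -= 1
    for g in qgrams(y[:m], q):
        diff[g] += 1
    nonzero = sum(1 for v in diff.values() if v != 0)

    def bump(g, delta):
        """Change diff[g] by +/-1 and keep the count of nonzero entries."""
        nonlocal nonzero
        before = diff[g]
        diff[g] = before + delta
        nonzero += (before == 0) - (diff[g] == 0)

    hits = []
    for i in range(n - m + 1):
        if i > 0:
            bump(y[i - 1:i - 1 + q], -1)   # q-gram leaving on the left
            bump(y[i + m - q:i + m], +1)   # q-gram entering on the right
        same_prefix = y[i:i + q - 1] == x[:q - 1]
        same_suffix = y[i + m - q + 1:i + m] == x[m - q + 1:]
        if nonzero == 0 and same_prefix and same_suffix:
            hits.append(i)
    return hits
```

With q = 1 this reduces to jumbled (abelian) matching: `q_abelian_occurrences("ab", "abba", 1)` returns `[0, 2]`.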


Circular Sequence Comparison with q-grams

Grossi, Iliopoulos, Mercaş, et al. 2015

Lecture Notes in Computer Science

Self Cite


Sequence comparison is a fundamental step in many important tasks in bioinformatics. Traditional algorithms for measuring approximation in sequence comparison are based on the notions of distance or similarity, and are generally computed through sequence alignment techniques. As circular genome structure is a common phenomenon in nature, a caveat of specialized alignment techniques for circular sequence comparison is that they are computationally expensive, requiring from super-quadratic to cubic time in the length of the sequences. In this paper, we introduce a new distance measure based on q-grams, and show how it can be computed efficiently for circular sequence comparison. Experimental results, using real and synthetic data, demonstrate orders-of-magnitude superiority of our approach in terms of efficiency, while maintaining an accuracy very competitive to the state of the art.


“…This work represents an extended version of a paper presented at the 18th International Conference on Developments in Language Theory, DLT 2014 [5].…”

mentioning

confidence: 98%

k-Abelian pattern matching

Ehlers, Manea, Mercaş, et al. 2015

Journal of Discrete Algorithms

Self Cite


Two words are called k-abelian equivalent if they share the same multiplicities for all factors of length at most k. We present an optimal linear-time algorithm for identifying all occurrences of factors in a text that are k-abelian equivalent to some pattern P. Moreover, an optimal algorithm for finding the largest k for which two words are k-abelian equivalent is given. Solutions for online versions of the k-abelian pattern matching problem are also proposed.
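The definition in this abstract translates directly into a small check. The Python sketch below compares factor counts for every length up to k, and also finds the largest k by trying lengths one at a time. It is a naive illustration of the definition only, not the optimal algorithms the abstract refers to, and the names `factor_counts`, `k_abelian_equivalent`, and `largest_k` are hypothetical.

```python
from collections import Counter


def factor_counts(w, k):
    """Multiplicities of all length-k factors of w."""
    return Counter(w[i:i + k] for i in range(len(w) - k + 1))


def k_abelian_equivalent(u, v, k):
    """True if u and v have the same multiplicities for all factors of
    length at most k. Equal length is checked first, although it already
    follows from equal counts of length-1 factors."""
    if len(u) != len(v):
        return False
    return all(factor_counts(u, j) == factor_counts(v, j)
               for j in range(1, k + 1))


def largest_k(u, v):
    """Largest k with u and v k-abelian equivalent, capped at len(u).

    Returns 0 if the words are not even abelian equivalent. Since
    k-abelian equivalence implies (k-1)-abelian equivalence, it is
    enough to stop at the first length whose factor counts differ.
    """
    if len(u) != len(v):
        return 0
    k = 0
    while k < len(u) and factor_counts(u, k + 1) == factor_counts(v, k + 1):
        k += 1
    return k
```

For example, "aabab" and "abaab" share the same letters and the same length-2 factors (aa once, ab twice, ba once) but differ in their length-3 factors, so `largest_k("aabab", "abaab")` returns 2.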


“…In k-abelian pattern matching, two words are considered equivalent if the subsequences or factors of length k occur in both words in the same multiplicity. Some algorithms for this problem, together with experimental results, were presented in [91] and [92]. My contribution here was mostly the implementation of the developed algorithms and the experimental section.…”

Section: George Clooney

mentioning

confidence: 99%

SAT and CP - Parallelisation and Applications

Ehlers

Self Cite


In this thesis, we consider the parallelisation and application of SAT and CP solvers. In the first chapter, we consider SAT, the decision problem of propositional logic. We discuss details of the implementations of SAT solvers, and show how to improve upon existing sequential solvers. Furthermore, we discuss the parallelisation of these solvers with a focus on the communication of intermediate results within a parallel solver. The second chapter is concerned with Constraint Programming (CP) with learning. Contrary to classical Constraint Programming techniques, this incorporates learning mechanisms as they are used in the field of SAT solving. We present results from parallelising CHUFFED, a learning CP solver. In the final chapter, we discuss Sorting Networks, which are data-oblivious sorting algorithms. Their independence of the input data lends them to parallel implementation. We consider the question of how many parallel sorting steps are needed to sort some inputs, and present both lower and upper bounds for several cases.
