Computation of Repetitions and Regularities of Biologically Weighted Sequences

Christodoulakis, Manolis; Iliopoulos, Costas S.; Mouchard, Laurent; Perdikuri, Katerina; Tsakalidis, Athanasios K.; Tsichlas, Kostas

doi:10.1089/cmb.2006.13.1214

Cited by 25 publications

(15 citation statements)

References 17 publications

“…There have been published works in the scientific literature [19,5,6,54] concerning the processing of string sequences; we will refer to these works giving more emphasis to the structure presented in [54]. In [19], a set of efficient algorithms were presented for string problems developing in the computational biology area.…”

Section: Index Structures For Weighted Strings (mentioning)

confidence: 99%

“…In [19], a set of efficient algorithms were presented for string problems developing in the computational biology area. In particular, assume that we deal with a weighted sequence X of length n and with a pattern p of length m, then (i) the occurrences of p in X can be located in O((n + m) log m) time and linear space; the solution works for both the multiplicative and the average model of probability estimation, although it can be extended also to handle the appearance of gaps; (ii) the set of repetitions and the set of covers (of length m) in the weighted sequence can be computed in O(n log m) time.…”

Section: Index Structures For Weighted Strings (mentioning)

confidence: 99%
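To make the multiplicative model in the statement above concrete, the following is a minimal sketch that assumes a weighted sequence is stored as a list of letter-to-probability dictionaries and that a pattern "occurs" at a position when the product of its per-letter probabilities reaches a chosen threshold. The function name `weighted_occurrences`, the data layout, and the threshold value are illustrative assumptions. This naive scan takes O(nm) time; it is not the O((n + m) log m) algorithm attributed to [19].

```python
from typing import Dict, List

# Assumed representation: one dict per position, mapping letter -> probability.
WeightedSeq = List[Dict[str, float]]


def weighted_occurrences(X: WeightedSeq, p: str, threshold: float) -> List[int]:
    """Return 0-based start positions where p occurs in X with
    probability >= threshold under the multiplicative model."""
    n, m = len(X), len(p)
    hits = []
    for i in range(n - m + 1):
        prob = 1.0
        for j, ch in enumerate(p):
            # A letter absent from a position has probability 0 there.
            prob *= X[i + j].get(ch, 0.0)
            if prob < threshold:
                break  # the product can only shrink, so stop early
        else:
            hits.append(i)
    return hits


# Hypothetical four-position weighted sequence.
X = [
    {"A": 1.0},
    {"A": 0.5, "C": 0.5},
    {"C": 1.0},
    {"A": 0.25, "C": 0.75},
]
print(weighted_occurrences(X, "AC", 0.5))  # [0, 1]
```

The early exit relies on every factor being at most 1, so a product that falls below the threshold cannot recover.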

String Data Structures for Computational Molecular Biology

Makris

Theodoridis

2010

Algorithms in Computational Molecular Biology

“…A great deal of research has been conducted on weighted strings for pattern matching [3,4], for computing various types of regularities [5,6,7,8], for indexing [3,9], and for alignments [10,11]. The efficiency of most of the proposed algorithms relies on the assumption of a given constant cumulative weight threshold defining the minimal probability of occurrence of factors in the weighted string.…”

Section: Introduction (mentioning)

confidence: 99%

Linear-time computation of prefix table for weighted strings & applications

Barton

Liu

Pissis

2016

Theoretical Computer Science

The prefix table of a string is one of the most fundamental data structures of algorithms on strings: it determines the longest factor at each position of the string that matches a prefix of the string. It can be computed in time linear with respect to the size of the string, and hence it can be used efficiently for locating patterns or for regularity searching in strings. A weighted string is a string in which a set of letters may occur at each position with respective occurrence probabilities. Weighted strings, also known as position weight matrices or uncertain strings, naturally arise in many biological contexts; for example, they provide a method to realise approximation among occurrences of the same DNA segment. In this article, given a weighted string x of length n and a constant cumulative weight threshold 1/z, defined as the minimal probability of occurrence of factors in x, we present an O(n)-time algorithm for computing the prefix table of x. Furthermore, we outline a number of applications of this result for solving various problems on non-standard strings, and present some preliminary experimental results.
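As an illustration of the definition in the abstract above, here is a minimal sketch of the linear-time prefix table for an ordinary (non-weighted) string, using the standard Z-algorithm window technique. The function name `prefix_table` is an assumption for this example. The weighted-string algorithm described in the abstract is not reproduced here.

```python
from typing import List


def prefix_table(s: str) -> List[int]:
    """pref[i] = length of the longest factor starting at i
    that matches a prefix of s (pref[0] = len(s))."""
    n = len(s)
    if n == 0:
        return []
    pref = [0] * n
    pref[0] = n
    l = r = 0  # s[l:r] is the rightmost factor known to match a prefix
    for i in range(1, n):
        if i < r:
            # Reuse the value computed for the mirrored position i - l.
            pref[i] = min(r - i, pref[i - l])
        # Extend the match by explicit letter comparisons.
        while i + pref[i] < n and s[pref[i]] == s[i + pref[i]]:
            pref[i] += 1
        if i + pref[i] > r:
            l, r = i, i + pref[i]
    return pref


print(prefix_table("aabaab"))  # [6, 1, 0, 3, 1, 0]
```

Each explicit comparison that succeeds moves `r` to the right, which is why the total work stays linear in the length of the string.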

“…Weighted sequences are also used to represent relatively short sequences such as binding sites, as well as long sequences such as protein families profiles [3]. Additionally they have been used to represent complete chromosome sequences that were obtained using the traditional method of whole-genome shotgun strategy [3].…”

Section: Introduction (mentioning)

confidence: 99%

Practical and Efficient Algorithms for Degenerate and Weighted Sequences Derived from High Throughput Sequencing Technologies

Antoniou

Iliopoulos

Mouchard

et al. 2009

2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing

High-throughput (or next-generation) sequencing technologies have opened new and exciting opportunities in the use of DNA sequences. These emerging technologies mark the beginning of a new era of high-throughput short-read sequencing: they have the potential to assemble a bacterial genome during a single experiment and at a moderate cost. In this paper, we address the problem of efficiently mapping millions of degenerate and weighted sequences to a reference genome with respect to whether they occur exactly once in the genome or not, and by taking probability scores into consideration. In particular, we define and solve the Massive Exact and Approximate Unique Pattern Matching problem for degenerate and weighted sequences derived from high-throughput sequencing technologies.
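The "exactly once" criterion in the abstract above can be sketched for the exact, degenerate case as follows. This example assumes a degenerate read is a list of sets of admissible letters and that the reference is a plain string. The function name `unique_occurrence` and the sample data are hypothetical, and this naive scan is not the method of the cited paper, which also handles approximate matching and probability scores.

```python
from typing import List, Optional, Set


def unique_occurrence(reference: str, read: List[Set[str]]) -> Optional[int]:
    """Return the 0-based position of the read if it matches the
    reference exactly once, otherwise None (zero or several matches)."""
    m = len(read)
    found = None
    for i in range(len(reference) - m + 1):
        # A position matches when every reference letter is admissible.
        if all(reference[i + j] in read[j] for j in range(m)):
            if found is not None:
                return None  # second match: the read is not unique
            found = i
    return found


reference = "ACGTACGA"  # hypothetical reference
print(unique_occurrence(reference, [{"G"}, {"T"}]))       # 2
print(unique_occurrence(reference, [{"G"}, {"T", "A"}]))  # None
```

The second read matches both "GT" at position 2 and "GA" at position 6, so it is rejected as non-unique.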
