2018
DOI: 10.1016/j.ic.2018.06.002

Alignment-free sequence comparison using absent words

Abstract: Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the compleme…
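
A minimal sketch of the q-gram distance mentioned in the abstract, assuming the usual definition: the L1 distance between the q-gram count vectors of the two sequences. The function names are illustrative. For a fixed q, building both profiles with a hash map runs in expected time linear in the total length of the sequences.

```python
from collections import Counter

def qgram_profile(s, q):
    """Count every length-q factor (q-gram) of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))

def qgram_distance(x, y, q):
    """Sum over all q-grams of |occurrences in x - occurrences in y|."""
    px, py = qgram_profile(x, q), qgram_profile(y, q)
    # Counter returns 0 for missing keys, so q-grams seen in only one string count fully.
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())

print(qgram_distance("aab", "abb", 2))  # 2: "aa" only in x, "bb" only in y
```

Note that distinct sequences can share the same profile (for example, "ab" and "ba" for q = 1), so the q-gram distance is only a pseudometric.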

Cited by 27 publications (22 citation statements)
References 37 publications (57 reference statements)

“…A tight upper bound on the number of MAWs of a word $y$ of length $n$ over an alphabet of size $\sigma$ is known to be $O(\sigma n)$ [13,22,7]. It was also shown that the set of all MAWs of $y$ is sufficient to uniquely reconstruct $y$ [13,15].…”
Section: Introduction (mentioning; confidence: 99%)
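
To make the reconstruction claim concrete: a word w is a minimal absent word (MAW) of y if w does not occur in y while both w minus its last letter and w minus its first letter do. The brute-force sketch below uses hypothetical function names and runs in cubic time. It is for illustration only and is not the linear-time construction the cited works rely on. It enumerates the MAWs over a given alphabet and rebuilds y from them. The words containing no MAW are exactly the factors of y, so y is the unique longest such word.

```python
def minimal_absent_words(y, alphabet):
    """Naive MAW enumeration straight from the definition."""
    # Every factor of y, including the empty word.
    factors = {y[i:j] for i in range(len(y) + 1) for j in range(i, len(y) + 1)}
    # w = p + c is minimal absent iff w is absent while w[:-1] (= p) and w[1:] occur.
    return {p + c for p in factors for c in alphabet
            if p + c not in factors and (p + c)[1:] in factors}

def reconstruct(maws, alphabet):
    """Rebuild y as the longest word that contains no minimal absent word."""
    level = [""]  # all words of the current length that avoid every MAW
    while True:
        # w already avoids the MAWs, so a new occurrence can only be a suffix of w + c.
        nxt = [w + c for w in level for c in alphabet
               if not any((w + c).endswith(m) for m in maws)]
        if not nxt:
            return level[0]  # only y itself survives at length |y|
        level = nxt

m = minimal_absent_words("abaab", "ab")
print(sorted(m))             # ['aaa', 'aaba', 'bab', 'bb']
print(reconstruct(m, "ab"))  # abaab
```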
“…This problem can be viewed as a variant of the classic approximate pattern-matching problem in which the distance of the pattern of length m to a factor of length m of the text is the LWI distance. Note that LWI verifies metric conditions [7]. The problem of approximate pattern matching admits many different formulations and has been the subject of many works (see [18,11,24]).…”
Section: Introduction (mentioning; confidence: 99%)
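
A brute-force sketch of the pattern-matching variant described above. It assumes that the length-weighted index takes its usual form from the MAW literature: LWI(x, z) is the sum of 1/|w|² over all words w in the symmetric difference of the MAW sets of x and z. The function names and the threshold parameter k are hypothetical. This is not the algorithm of the citing paper: it simply recomputes the MAWs of every length-m factor of the text.

```python
from fractions import Fraction

def minimal_absent_words(y, alphabet):
    # All factors of y, including the empty word.
    factors = {y[i:j] for i in range(len(y) + 1) for j in range(i, len(y) + 1)}
    # w = p + c is a MAW iff w is absent while w[:-1] (= p) and w[1:] both occur.
    return {p + c for p in factors for c in alphabet
            if p + c not in factors and (p + c)[1:] in factors}

def lwi(x, z, alphabet):
    """Assumed LWI: sum of 1/|w|^2 over the symmetric difference of the MAW sets."""
    diff = minimal_absent_words(x, alphabet) ^ minimal_absent_words(z, alphabet)
    return sum((Fraction(1, len(w) ** 2) for w in diff), Fraction(0))

def lwi_matches(pattern, text, k, alphabet):
    """Start positions i with LWI(pattern, text[i:i+m]) <= k, by brute force."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if lwi(pattern, text[i:i + m], alphabet) <= k]

print(lwi("ab", "ba", "ab"))               # 1/2
print(lwi_matches("ab", "abab", 0, "ab"))  # [0, 2]
```

Exact rationals (`Fraction`) are used so that comparisons against the threshold k are not affected by floating-point rounding.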
“…The set of all minimal absent words of length at most $\ell$ of a word $y$ is denoted by $M^{\ell}_{y}$. For example, if y = abaab, then $M_y$ = {aaa, aaba, bab, bb} and $M^{3}_{y}$ = {aaa, bab, bb}. The upper bound on the number of minimal absent words is $O(\sigma n)$ [2], where $\sigma$ is the size of the alphabet and $n$ is the length of $y$, and this bound is tight for integer alphabets [3]; in fact, for large alphabets, such as when $\sigma \ge \sqrt{n}$, this bound is tight even for minimal absent words having the same length [4,5].…”
Section: Introduction (mentioning; confidence: 99%)
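
A word of length at most $\ell$ is minimal absent exactly when it does not occur in y while the two words obtained by deleting its first or its last letter do. Hence $M^{\ell}_{y}$ depends only on the factors of y of length at most $\ell$. The naive sketch below uses a hypothetical function name and never materialises longer factors. It checks the example from the quoted passage and is not one of the space-efficient constructions cited there.

```python
def minimal_absent_words_upto(y, ell, alphabet):
    """Naive M_y^ell: minimal absent words of y of length at most ell."""
    # Factors of y of length 0..ell; longer factors are never needed.
    factors = {y[i:i + k] for k in range(ell + 1) for i in range(len(y) - k + 1)}
    # Extend each factor p shorter than ell by one letter c: w = p + c is minimal
    # absent iff w does not occur while its suffix w[1:] does (p itself occurs).
    return {p + c for p in factors if len(p) < ell for c in alphabet
            if p + c not in factors and (p + c)[1:] in factors}

print(sorted(minimal_absent_words_upto("abaab", 3, "ab")))  # ['aaa', 'bab', 'bb']
```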
“…There also exist space-efficient data structures based on the Burrows-Wheeler transform of $y$ that can be applied for this computation [10,11]. In many real-world applications of minimal absent words, such as in data compression [12][13][14][15], in sequence comparison [3,9], in on-line pattern matching [16], or in identifying pathogen-specific signatures [17], only a subset of minimal absent words may be considered, and, in particular, the minimal absent words of length (at most) $\ell$. Since, in the worst case, the number of minimal absent words of $y$ is $\Theta(\sigma n)$, $\Omega(\sigma n)$ space is required to represent them explicitly.…”
Section: Introduction (mentioning; confidence: 99%)
“…For example, if y = abaab, then $M_y$ = {aaa, aaba, bab, bb} and $M^{3}_{y}$ = {aaa, bab, bb}. The upper bound on the number of minimal absent words is $O(\sigma n)$ [10], where $\sigma$ is the size of the alphabet and $n$ is the length of $y$, and this is tight for integer alphabets [6]; in fact, for large alphabets, such as when $\sigma \ge \sqrt{n}$, this bound is also tight even for minimal absent words having the same length [1].…”
Section: Introduction (mentioning; confidence: 99%)