2015
DOI: 10.1186/s13015-015-0032-x

Estimating evolutionary distances between genomic sequences from spaced-word matches

Abstract: Alignment-free methods are increasingly used to calculate evolutionary distances between DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods, however, use heuristic distance functions that are not based on any explicit model of molecular evolution. Herein, we propose a simple estimator d_N of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them. We show that this distance function is more accurate than …
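
The sketch below illustrates one way to compute a distance from the number N of spaced-word matches. It inverts the expected value of N under a deliberately simplified model, then applies the Jukes-Cantor correction. The model is an assumption made for illustration:

- two gap-free sequences of equal length L;
- independent positions;
- homologous positions match with probability p;
- unrelated positions match with background probability q;
- overlaps between neighbouring spaced words are ignored.

The function names are hypothetical. This is not claimed to be the exact estimator d_N from the paper.

```python
import math

# Assumed simplified model (hypothetical; not the paper's exact d_N):
# gap-free sequences of length L, pattern of length ell and weight k,
# homologous positions match with prob. p, random positions with prob. q.

def expected_matches(L, ell, k, p, q):
    """Approximate expected number N of spaced-word matches under the model."""
    n = L - ell + 1                       # spaced words per sequence
    return n * p**k + n * (n - 1) * q**k  # homologous pairs + background pairs

def estimate_distance(N, L, ell, k, q=0.25):
    """Moment estimator: solve expected_matches(...) == N for p, then apply
    the Jukes-Cantor correction to the estimated mismatch rate 1 - p."""
    n = L - ell + 1
    homologous = N - n * (n - 1) * q**k   # subtract expected background matches
    if homologous <= 0:
        raise ValueError("N does not exceed the expected background")
    p_hat = min(1.0, (homologous / n) ** (1.0 / k))
    mismatch = 1.0 - p_hat
    if mismatch >= 0.75:
        raise ValueError("sequences too divergent for Jukes-Cantor")
    return -0.75 * math.log(1.0 - 4.0 * mismatch / 3.0)
```

Take L = 100,000, a pattern of length 15 and weight 10, p = 0.9 and q = 0.25. Passing the model's expected count to `estimate_distance` recovers the Jukes-Cantor distance for a 10% mismatch rate, about 0.1073. Subtracting the background term is what lets the estimate remain meaningful when many matches are random.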

Cited by 54 publications (81 citation statements)
References 53 publications

“…To describe our algorithm, we are using the terminology from our previous papers (Leimeister and Morgenstern, 2014; Morgenstern et al., 2015). For an alphabet Σ, a sequence S of length L and 0 < i ≤ L, S[i] denotes the i-th symbol of S.…”
Section: Algorithm (mentioning, confidence: 99%)

“…Spaced-word matches or spaced seeds have been introduced in database searching as an alternative to exact k-mer matches [8,34,31]. The main advantage of spaced words compared to contiguous k-mers is the fact that results based on spaced words are statistically more stable than results based on k-mers [24,13,10,9,36,38]. Quite obviously, approximations (4) and (5) remain valid if we define X_k to be the number of spaced-word matches for a given pattern P of weight k, and we can generalize the definition of F(k) accordingly: if we consider a maximum pattern weight K and a given set of patterns {P_k, 1 ≤ k ≤ K} where k is the weight of pattern P_k, then we can define N_k as the empirical number of spaced-word matches with respect to pattern P_k between two observed DNA sequences.…”
Section: The Number of Spaced-word Matches (mentioning, confidence: 99%)

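
The quoted definition of N_k can be made concrete with a short sketch. For each pattern P_k in a supplied set, it counts all pairs of positions whose spaced words agree. Patterns are written as strings of '1' (match position) and '0' (don't-care position). That encoding, the function names and the example patterns are assumptions for illustration, not taken from the citing paper's software.

```python
from collections import Counter

def spaced_words(seq, pattern):
    """Yield the spaced word of seq at every start position: the symbols at
    the pattern's match positions ('1'); '0' marks a don't-care position."""
    match_pos = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield "".join(seq[start + i] for i in match_pos)

def count_spaced_word_matches(s1, s2, pattern):
    """Number N of position pairs (i, j) whose spaced words in s1 and s2 agree."""
    c1 = Counter(spaced_words(s1, pattern))
    c2 = Counter(spaced_words(s2, pattern))   # missing words count as 0
    return sum(n * c2[w] for w, n in c1.items())

def match_counts_by_weight(s1, s2, patterns):
    """Return {k: N_k} for hypothetical patterns {k: P_k} with weight(P_k) == k."""
    counts = {}
    for k, p in patterns.items():
        if p.count("1") != k:
            raise ValueError(f"pattern {p!r} does not have weight {k}")
        counts[k] = count_spaced_word_matches(s1, s2, p)
    return counts
```

Take the hypothetical patterns {1: "1", 2: "101"} applied to "ACGT" against itself. `match_counts_by_weight` returns {1: 4, 2: 2}. Hashing the spaced words avoids comparing every pair of positions explicitly.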
“…Skmer [47] is a further improvement of this approach. In a previous paper, we proposed another way to infer evolutionary distances between DNA sequences based on the number of word matches between them, and we generalized this to so-called spaced-word matches [36]. Here, a spaced-word match is a pair of words from two sequences that are identical at certain positions, specified by a pre-defined binary pattern of match and don't-care positions.…”
Section: Introduction (mentioning, confidence: 99%)
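
The definition in the last quotation translates almost directly into a predicate. The sketch below makes two assumptions for illustration:

- 0-based indexing, whereas the first quotation indexes S[i] from 1;
- the '1'/'0' string encoding for patterns, where '1' is a match position and '0' a don't-care position.

The function name is hypothetical.

```python
def is_spaced_word_match(s1, i, s2, j, pattern):
    """True if the words of length len(pattern) starting at 0-based positions
    i in s1 and j in s2 agree at every match position ('1') of the pattern;
    symbols at don't-care positions ('0') may differ."""
    ell = len(pattern)
    if i < 0 or j < 0 or i + ell > len(s1) or j + ell > len(s2):
        return False  # both words must fit entirely inside their sequences
    return all(s1[i + m] == s2[j + m]
               for m, c in enumerate(pattern) if c == "1")
```

With the pattern "1101", the words starting at position 0 of "ACGTA" and "ACTTA" form a spaced-word match: they differ only at the don't-care position. With "1111" they do not.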