2009
DOI: 10.1089/cmb.2009.0198

Alignment-Free Sequence Comparison (I): Statistics and Power

Abstract: Large-scale comparison of the similarities between two biological sequences is a major issue in computational biology; a fast method, the D2 statistic, relies on the comparison of the k-tuple content for both sequences. Although it has been known for some years that the D2 statistic is not suitable for this task, as it tends to be dominated by single-sequence noise, to date no suitable adjustments have been proposed. In this article, we suggest two new variants of the D2 word count statistic, which we call D2…
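As a rough illustration of the k-tuple comparison the abstract describes, the plain D2 statistic can be sketched as the sum, over all k-tuples, of the product of that tuple's counts in the two sequences. The function names `kmer_counts` and `d2` below are hypothetical, chosen for this sketch only; it covers the unadjusted D2 statistic and not the variants proposed in the article, whose definitions are not given in the truncated abstract.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-tuple (k-mer) in seq."""
    # A sequence shorter than k yields an empty range and an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Plain D2: sum over k-tuples of count-in-A times count-in-B."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Only k-tuples present in both sequences contribute a nonzero term.
    return sum(n * counts_b[w] for w, n in counts_a.items() if w in counts_b)


if __name__ == "__main__":
    # Shared 2-tuples: AC (2 x 1), CG (2 x 1), GT (2 x 1), so D2 = 6.
    print(d2("ACGTACGT", "ACGTTT", 2))
```

Because the score is built from raw counts, it grows with sequence length and composition, which is consistent with the abstract's remark that D2 tends to be dominated by single-sequence noise.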

Cited by 197 publications (280 citation statements)
References 18 publications
“…Such techniques were proposed as a fast alternative to much more time-consuming alignment methods, but at the expense of accuracy. Some detailed reviews of k-mer algorithms for sequence comparison (as well as others approaches based on information theory) were presented by Vinga et al [17], Reinert et al [16] and Wan et al [18]. The main idea of using k-mers in sequences comparison usually boils down to two stages.…”
Section: The Use Of K-mer In Biological Sequence Comparison
Citation type: mentioning
confidence: 99%
“…To this end, theoretical aspects of k-mer statistics for biological sequence comparison have been studied in detail before. [7][8][9][10] Previous work has shown a similar approach of studying optimal length of peptides shared among close homologues for ultra-fast protein searches. [11] As also shown previously, simple oligonucleotide or peptide overlap measures between two genomes can be indicative of their phylogenetic distance.…”
Section: Determination Of Distance Measure
Citation type: mentioning
confidence: 99%
“…In all calculations, q_ab was taken from the background probabilities of the blosum62 matrix at the reblosum web page [7]. Although other generalisations of the D2 statistic exist, such as D2* and D2S proposed by Reinert et al [8], as far as we are aware, D …”
Section: Definitions
Citation type: mentioning
confidence: 99%