2011
DOI: 10.1080/07391102.2011.10508594

Weighted Relative Entropy for Alignment-free Sequence Comparison Based on Markov Model

Abstract: In this paper, we introduce a probabilistic measure for computing the similarity between two biological sequences without alignment. The similarity measure is computed from the Kullback-Leibler divergence of two constructed Markov models. We first validate the method by clustering nine chromosomes from three species. Second, we give the results of a similarity search based on our new method. Finally, we apply the measure to the construction of a phylogenetic tree of 48 HEV genome sequences. Our resu…
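The abstract does not give the exact weighting scheme, so the following is only a minimal sketch of the general idea: fit a Markov model to each sequence and compare the two models with a Kullback-Leibler divergence. The function names, the pseudocount smoothing, the choice to weight each context by its frequency in the first sequence, and the symmetrization are all assumptions made for illustration, not the authors' published measure.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"  # assumption: DNA sequences over a four-letter alphabet


def markov_model(seq, order=1, pseudocount=1.0):
    """Estimate context weights and transition probabilities of an order-k Markov model.

    Pseudocounts (an assumption of this sketch) keep every probability positive,
    so the logarithms in the divergence below are always defined.
    """
    counts = Counter()
    for i in range(len(seq) - order):
        ctx, nxt = seq[i:i + order], seq[i + order]
        # Skip windows containing symbols outside the alphabet (e.g. N).
        if set(ctx) <= set(ALPHABET) and nxt in ALPHABET:
            counts[(ctx, nxt)] += 1

    contexts = ["".join(p) for p in product(ALPHABET, repeat=order)]
    transitions, weights = {}, {}
    for ctx in contexts:
        row_total = sum(counts[(ctx, a)] for a in ALPHABET) + pseudocount * len(ALPHABET)
        transitions[ctx] = {a: (counts[(ctx, a)] + pseudocount) / row_total for a in ALPHABET}
        weights[ctx] = row_total
    total = sum(weights.values())
    weights = {ctx: w / total for ctx, w in weights.items()}
    return weights, transitions


def context_weighted_kl(seq_p, seq_q, order=1):
    """KL divergence between transition rows, weighted by context frequency in seq_p."""
    w_p, t_p = markov_model(seq_p, order)
    _, t_q = markov_model(seq_q, order)
    return sum(
        w_p[ctx] * t_p[ctx][a] * math.log(t_p[ctx][a] / t_q[ctx][a])
        for ctx in t_p
        for a in ALPHABET
    )


def symmetric_distance(seq_a, seq_b, order=1):
    """Average of the two directed divergences, so the result does not depend on argument order."""
    return 0.5 * (context_weighted_kl(seq_a, seq_b, order) + context_weighted_kl(seq_b, seq_a, order))
```

A pairwise matrix of `symmetric_distance` values could then be passed to any standard clustering or tree-building routine; that step is omitted here.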

Cited by 10 publications (4 citation statements)
References 52 publications
“…These models were then used for grouping NGS reads into different bins, followed by extracting the k-tuples and calculating their expectation in each bin. Markov models have been used extensively for genome modeling (Narlikar et al, 2013), motif discovery (D’haeseleer, 2006), computational gene search (Lomsadze et al, 2005), classification of metagenomic sequences (Brady and Salzberg, 2009) and alignment-free sequence comparison (Chang and Wang, 2011). Next, we extended the definition of our previous alignment-free measures, d2S and d2*, to make them more compatible with a scheme of analysis that uses the proposed reads binning datasets.…”
Section: Introduction
confidence: 99%
“…A multitude of alignment-free sequence comparison algorithms have been developed in recent years, such as conditional Lempel-Ziv [4] and Kolmogorov complexity [5], measure representation [6], Markov model comparisons and frequent substring lengths [7,8], which divides the genome into regions that represent a system that is evolving over time with hidden states. Base-base correlation [9], spectral distortion [10], primitive discrimination substrings [11], Burrows-Wheeler similarity [12], normalized central moments, nearest-neighbor interactions [13], subword composition [14], prefix codes [15], information correlation [16], the context-object model [17], and spaced word frequencies [18].…”
Section: Literature Review
confidence: 99%
“…Many neutral mutations may remain and play a role of random background. One should subtract the random background from the simple counting result in order to highlight the contribution of selective evolution (Chang and Wang, 2011; Ding et al, 2010; Gao et al, 2006). In this work, we propose a new conditional multinomial distribution representation which reveals the relative difference of biological sequence from sequence generated by an independent random process to remove the random background.…”
Section: Complete Multinomial Composition Vector
confidence: 99%
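
The background subtraction described in the last statement can be illustrated with a small sketch. It is not the conditional multinomial representation proposed by the citing paper, whose details are not given here. It assumes the simplest possible background, an independent random process in which each k-mer's expected frequency is the product of its single-letter frequencies, and reports the relative difference between observed and expected frequency. The function name and the choice of k are hypothetical.

```python
from collections import Counter
from math import prod


def background_subtracted_vector(seq, k=3):
    """Relative difference (observed - expected) / expected for each k-mer seen in seq.

    Assumption of this sketch: the expected frequency comes from an independent
    (zero-order) model built from the single-letter frequencies of seq itself.
    """
    n = len(seq)
    base_freq = {b: c / n for b, c in Counter(seq).items()}

    windows = n - k + 1
    kmer_counts = Counter(seq[i:i + k] for i in range(windows))

    vector = {}
    for kmer, count in kmer_counts.items():
        observed = count / windows
        expected = prod(base_freq[b] for b in kmer)
        vector[kmer] = (observed - expected) / expected
    return vector
```

Components near zero correspond to k-mers that occur about as often as the independent background predicts; large positive or negative components mark over- or under-represented k-mers.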