2005
DOI: 10.1093/bioinformatics/bti658

Optimal word sizes for dissimilarity measures and estimation of the degree of dissimilarity between DNA sequences

Abstract: The SK-LD algorithm, the estimate of beta, and the simulation software are implemented in MATLAB and are available at http://www.stat.ncku.edu.tw/tjwu

Citations: cited by 54 publications (43 citation statements)
References: 23 publications
“…A pairwise complete genome alignment routinely takes 2 h; thus, a 20-s MUMi calculation to preselect appropriate genomes should prove to be a convenient tool. A similar preselection strategy is used in large-scale BLAST alignments of proteins or genes and is based on an estimation of word dissimilarity (32,33). MUMi will also be valuable for fine tuning the parameters of software used for such alignments, e.g., MGA (17), MAUVE (6), or M-GCAT (30).…”
Section: Discussion (mentioning)
confidence: 99%
“…Note, for comparison of large chromosomes we have used a simplified 2-letter alphabet. The block method is similar to that described by Wu et al (17). When sequences a and b are compared, each sequence is divided into m length blocks.…”
Section: Removal Of High Frequency and Low Complexity Features (mentioning)
confidence: 99%
“…Furthermore, let γ(s, q) be the required CPU In this section, the D2 value between the encoded library sequence s with the length 50 ≤ N ≤ 50,000 and query with the length 300 and word size k = 3 are computed to examine the efficiency of the proposed approach by capturing total execution time ∆. One common approach to find the dissimilarity between k-tuples in D2 statistics is to take the minimum of all window distances for each pair (W(L), W(Q)), where W(L) and W(Q) are the k-tuples in library sequence and query sequence respectively [19]. In Fig.…”
Section: A D2 String Comparison Vs Proposed Approach B Efficiency (mentioning)
confidence: 99%
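
The last citation statement describes a word-based comparison in which the dissimilarity is the minimum of the distances over all window pairs from the library and query sequences. A minimal Python sketch of that idea follows. It assumes the squared Euclidean distance between k-word count vectors as the per-window distance and a fixed window length; the function names (`kmer_counts`, `sq_euclidean`, `min_window_distance`) and the `window` parameter are hypothetical. It is not the SK-LD algorithm of the cited paper, nor the citing authors' implementation.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def sq_euclidean(c1, c2):
    """Squared Euclidean distance between two word-count vectors.

    Counter returns 0 for words that are absent, so words seen in
    only one of the two sequences still contribute to the sum.
    """
    return sum((c1[w] - c2[w]) ** 2 for w in set(c1) | set(c2))


def min_window_distance(library, query, k=3, window=None):
    """Minimum distance over all pairs of equal-length windows.

    Assumption: both sequences are scanned with the same window
    length, which defaults to the length of the shorter sequence.
    """
    if window is None:
        window = min(len(library), len(query))
    if window < k or window > min(len(library), len(query)):
        raise ValueError("window must satisfy k <= window <= shorter length")

    # Precompute the count vectors of the query windows once.
    query_windows = [
        kmer_counts(query[j:j + window], k)
        for j in range(len(query) - window + 1)
    ]

    best = None
    for i in range(len(library) - window + 1):
        lib_counts = kmer_counts(library[i:i + window], k)
        for q_counts in query_windows:
            d = sq_euclidean(lib_counts, q_counts)
            if best is None or d < best:
                best = d
    return best
```

For example, `min_window_distance("TTTTACGTTTTT", "ACGT", k=3)` scans the library with a window of length 4 and finds the exact occurrence of the query, so the distance is 0. The brute-force double loop recounts words for every library window; a sliding update of the counts would be the usual optimization for long sequences.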