1994
DOI: 10.1089/cmb.1994.1.199

Biological Evaluation of d2, an Algorithm for High-Performance Sequence Comparison

Abstract: A number of algorithms exist for searching sequence databases for biologically significant similarities based on the primary sequence similarity of aligned sequences. We have determined the biological sensitivity and selectivity of d2, a high-performance comparison algorithm that rapidly determines the relative dissimilarity of large datasets of genetic sequences. d2 uses sequence-word multiplicity as a simple measure of dissimilarity. It is not constrained by the comparison of direct sequence alignments and s…
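
The idea of using word multiplicity as a dissimilarity measure can be illustrated with a minimal sketch. It assumes a single fixed word length `k` and compares whole sequences; the function names, the default `k=6`, and the absence of windowing or weighting are all assumptions made for illustration, not the parameters or implementation evaluated in the paper.

```python
from collections import Counter


def word_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_sketch(a, b, k=6):
    """Sum of squared differences in word multiplicity between a and b.

    Hypothetical simplification: one word length, no windows, no weights.
    """
    ca, cb = word_counts(a, k), word_counts(b, k)
    # A Counter returns 0 for words it has not seen, so words present in
    # only one of the two sequences contribute their full squared count.
    return sum((ca[w] - cb[w]) ** 2 for w in ca.keys() | cb.keys())
```

Identical sequences score 0, and the score grows as the word composition of the two sequences diverges. No alignment is computed at any point, which is what lets a word-based measure avoid the constraint of direct alignment comparison.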

Cited by 62 publications (43 citation statements)
References 9 publications
“…Vector artifacts and repetitive sequences were masked using CrossMatch (http://www.phrap.org). Masked sequences were clustered based on their relative similarity (>96% identity over a window of 150 nucleotides) using a word-based greedy clustering algorithm (Hide et al, 1994). Loose but related clusters were further aligned and assembled using PHRAP (http://www.…”
Section: Sequence Clustering (mentioning)
confidence: 99%
“…The only criterion for clustering is sequence overlap and source or annotation information is not used. To detect the overlap criterion, we use the d2 algorithm and set parameters and threshold values as described in previous work (Torney et al 1990; Hide et al 1994; Wu et al 1997). The initial and final state of the algorithm is a partition of the input sequences in which each sequence is in a cluster and no sequence appears in more than one cluster.…”
Section: Description Of the D2_cluster Methods (D20) (mentioning)
confidence: 99%
“…The notation d2(A,B) is conveniently used, but, of course, d2(,) is not a function of only A and B but also of various parameters (specified in Torney et al 1990; Hide et al 1994; Wu et al 1997). The MERGE operation can be expressed in terms of convention 4 above: For all sequences, Sr, such that Cr = j, Cr is reset to be Cr = i.…”
Section: Description Of the D2_cluster Methods (D20) (mentioning)
confidence: 99%
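
The MERGE operation quoted above, together with the partition invariant from the preceding statement, can be sketched as follows. This is an illustrative reading only: the list `labels` stands in for the cluster indices Cr, the starting partition of singletons is an assumption, and `overlaps` is a hypothetical predicate standing in for the d2-based overlap test, whose parameters and thresholds are not given in the quoted text.

```python
def merge(labels, i, j):
    """MERGE: every sequence r with labels[r] == j is reset to cluster i."""
    for r, c in enumerate(labels):
        if c == j:
            labels[r] = i


def cluster(n, overlaps):
    """Partition sequences 0..n-1 using only a pairwise overlap test.

    Assumption: each sequence starts in its own cluster.
    `overlaps(r, s)` is a hypothetical stand-in for the d2 criterion.
    """
    labels = list(range(n))
    for r in range(n):
        for s in range(r + 1, n):
            # Skip pairs already in the same cluster; otherwise join
            # the two clusters when the sequences overlap.
            if labels[r] != labels[s] and overlaps(r, s):
                merge(labels, labels[r], labels[s])
    return labels
```

Because every sequence always carries exactly one label, the state is a partition before and after each MERGE. Clusters grow transitively: if sequence 0 overlaps 1 and 1 overlaps 2, all three end up together even when 0 and 2 do not overlap directly.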
“…1). Each UniGene cluster was subjected to a further clustering by d2 cluster (Hide et al 1994; J. Burke, D. Davison, and W. Hide, in prep.).…”
Section: Methods (mentioning)
confidence: 99%
“…The sequence annotations were used to extract the 3′ UTRs, and known repetitive elements were filtered out of the data set of UTRs. For this analysis the UniGene clustering was not used; we reclustered the data set using d2 cluster (Hide et al 1994; J. Burke, D. Davison, and W. Hide, in prep.…”
Section: Figure (mentioning)
confidence: 99%