2002
DOI: 10.1198/004017002317375064

A Modification of the Jaccard–Tanimoto Similarity Index for Diverse Selection of Chemical Compounds Using Binary Strings

Cited by 123 publications (129 citation statements)
References 8 publications

“…Although the Tanimoto coefficient is widely used, Flower has noted that it typically yields low similarity values when the reference molecule in a similarity search has just a few bits set in its fingerprint [52]. This marked size dependency was confirmed in later studies [53][54][55], and it has also been shown that the coefficient has an inherent bias towards certain similarity values [37]. These observations were the starting point for our comparative studies of similarity coefficients, as described below.…”
Section: Comparison of Similarity Coefficients | Citation type: mentioning
confidence: 96%
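One way to see the size effect described in this statement is from the coefficient's definition, T(A, B) = c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number set in both. The minimal Python sketch below holds fingerprints as plain integers (an illustrative choice, not taken from the source) and compares a sparse and a dense query whose targets each differ by the same single extra bit; the convention that two empty fingerprints score 1.0 is also an assumption made here.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer fingerprint."""
    return bin(x).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto (Jaccard) similarity c / (a + b - c) of two bit fingerprints."""
    a = popcount(fp_a)
    b = popcount(fp_b)
    c = popcount(fp_a & fp_b)
    union = a + b - c
    # Assumed convention: two all-zero fingerprints count as identical.
    return 1.0 if union == 0 else c / union


# A sparse query (2 bits) and a dense query (20 bits); each target adds one extra bit.
sparse_query = 0b11
dense_query = (1 << 20) - 1
print(round(tanimoto(sparse_query, sparse_query | (1 << 2)), 3))   # 0.667
print(round(tanimoto(dense_query, dense_query | (1 << 20)), 3))    # 0.952
```

The same one-bit difference costs the sparse query about a third of its similarity but the dense query less than a twentieth, which is consistent with sparse reference fingerprints producing low similarity values.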
“…However, the Tanimoto coefficient has a known bias toward the retrieval of simplistic structures in diversity-based selection procedures [39]. This is because for any binary fingerprint there may exist more than one (specifically, 2^(n-a)) distinct, and maximally dissimilar, fingerprint. This effect can be mitigated by considering, in addition to the common presence of features, f, the common absence thereof.…”
Section: Article | Citation type: mentioning
confidence: 99%
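The 2^(n-a) count in this statement can be checked by brute force for short strings, reading n as the fingerprint length and a as the number of bits set in the reference (the quote does not define them, so this reading is an assumption). Any string whose set bits all fall among the other n - a positions shares no set bit with the reference and therefore has Tanimoto similarity 0; there are 2^(n-a) such strings, including the empty one. The Python sketch below enumerates every n-bit string for the arbitrary choice n = 8, a = 3.

```python
from itertools import product


def tanimoto(x: tuple, y: tuple) -> float:
    """Tanimoto similarity of two equal-length 0/1 tuples (1.0 if both are all-zero)."""
    common = sum(1 for u, v in zip(x, y) if u and v)
    either = sum(1 for u, v in zip(x, y) if u or v)
    return 1.0 if either == 0 else common / either


def count_maximally_dissimilar(reference: tuple) -> int:
    """Count all strings of the same length whose Tanimoto similarity to `reference` is 0."""
    n = len(reference)
    return sum(1 for s in product((0, 1), repeat=n) if tanimoto(reference, s) == 0.0)


n, a = 8, 3
reference = tuple([1] * a + [0] * (n - a))
print(count_maximally_dissimilar(reference), 2 ** (n - a))   # 32 32
```

Because all 32 of these strings tie at similarity 0, a max-min diversity selection driven by the Tanimoto coefficient alone has no basis for preferring one over another.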
“…This effect can be mitigated by considering, in addition to the common presence of features, f, the common absence thereof [39]. The modified Tanimoto system with this functionality devised by Fligner et al. [39] is referred to here as the binary modified Tanimoto with weighting, MTW_bin:…”
Section: Article | Citation type: mentioning
confidence: 99%
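The idea of scoring common absences as well as common presences can be sketched as a weighted average of two Tanimoto scores: one over the 1-bits, T1 = c11 / (c11 + c10 + c01), and one over the 0-bits, T0 = c00 / (c00 + c10 + c01), where c11 counts positions set in both strings, c00 positions set in neither, and c10, c01 the mismatches. The Python sketch below uses weights (2 - p)/3 for T1 and (1 + p)/3 for T0, with p the mean fraction of set bits in the two strings. This is the form in which Fligner et al.'s index is often quoted, but the weights and the per-pair estimate of p are assumptions here and should be checked against the original paper; the code is not a reproduction of the MTW_bin variant named in the citing article.

```python
def bit_counts(x: list, y: list) -> tuple:
    """Return (c11, c10, c01, c00): matches and mismatches between two 0/1 lists."""
    c11 = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    c10 = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c01 = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    c00 = len(x) - c11 - c10 - c01
    return c11, c10, c01, c00


def modified_tanimoto(x: list, y: list) -> float:
    """Weighted mix of Tanimoto on 1-bits (T1) and on 0-bits (T0).

    Assumed weights: (2 - p)/3 for T1 and (1 + p)/3 for T0, where p is the
    mean fraction of 1-bits in x and y (an illustrative per-pair estimate).
    """
    if len(x) != len(y) or not x:
        raise ValueError("fingerprints must be non-empty and of equal length")
    c11, c10, c01, c00 = bit_counts(x, y)
    mismatches = c10 + c01
    t1 = 1.0 if c11 + mismatches == 0 else c11 / (c11 + mismatches)
    t0 = 1.0 if c00 + mismatches == 0 else c00 / (c00 + mismatches)
    p = (sum(x) + sum(y)) / (2 * len(x))
    return (2 - p) / 3 * t1 + (1 + p) / 3 * t0


reference = [1, 0, 0, 0, 0, 0, 0, 0]
sparse_target = [0, 1, 0, 0, 0, 0, 0, 0]   # no shared 1-bit, six shared 0-bits
dense_target = [0, 1, 1, 1, 1, 1, 1, 1]    # no shared 1-bit and no shared 0-bit
print(modified_tanimoto(reference, sparse_target))   # 0.28125
print(modified_tanimoto(reference, dense_target))    # 0.0
```

Both targets have plain Tanimoto similarity 0 to the reference, yet the combined score separates them: only the target that is dissimilar in both its present and its absent features remains at 0.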
“…The machine-derived, or spectral, descriptors are typically obtained in a combinatorial way, by indexing all of the possible labeled subgraphs of a given kind (e.g., paths, trees) and size (e.g., depth up to 3) of the molecular graphs, yielding the well-known fingerprint vector representations [3][4][5][6][7][8][9]. The somewhat parallel expansions in data and representations create new opportunities and challenges and require the continued development of methods to store and search molecules in chemical databases in ways that can scale up with these expansions. One strategy [10] to speed up database searches based on molecular similarity is essentially a pruning strategy with two basic components.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
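The quoted passage is cut off before it names the two components of that strategy, so the following illustrates only the general pruning idea and is not the method of reference [10]. It relies on one standard bound: the overlap of two fingerprints with a and b set bits cannot exceed min(a, b), and their union cannot be smaller than max(a, b), so their Tanimoto similarity is at most min(a, b) / max(a, b). In the Python sketch below, a hypothetical `threshold_search` helper checks this bound from bit counts and computes the exact coefficient only for molecules that pass it; integer fingerprints and the small database are illustrative choices.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer fingerprint."""
    return bin(x).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Exact Tanimoto similarity of two integer fingerprints (1.0 if both are empty)."""
    union = popcount(fp_a | fp_b)
    return 1.0 if union == 0 else popcount(fp_a & fp_b) / union


def upper_bound(a: int, b: int) -> float:
    """Upper bound min(a, b) / max(a, b) on Tanimoto, from bit counts alone."""
    return 1.0 if max(a, b) == 0 else min(a, b) / max(a, b)


def threshold_search(query: int, database: list, threshold: float) -> list:
    """Return (index, similarity) for database fingerprints with Tanimoto >= threshold.

    Hypothetical helper: the exact coefficient is evaluated only where the
    cheap bit-count bound cannot rule a molecule out.
    """
    q_bits = popcount(query)
    counts = [popcount(fp) for fp in database]   # in practice stored with the database
    hits = []
    for i, (fp, bits) in enumerate(zip(database, counts)):
        if upper_bound(q_bits, bits) < threshold:
            continue  # pruned: this molecule cannot reach the threshold
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((i, sim))
    return hits


db = [0b1111, 0b1, 0b11110000, 0b1110, 0b111111111111]
print(threshold_search(0b1111, db, 0.7))   # [(0, 1.0), (3, 0.75)]
```

Entries 1 and 4 are discarded on bit counts alone. Keeping the database sorted by bit count would let a real implementation skip the pruned range without visiting it; the loop here visits every entry for clarity.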