2003
DOI: 10.1021/ci025591m

Evaluation of Similarity Measures for Searching the Dictionary of Natural Products Database

Abstract: Similarity searches using combinations of seven different similarity coefficients and six different representations have been carried out on the Dictionary of Natural Products database. The objective was to discover if any special methods of searching apply to this database, which is very different in nature from the many synthetic databases that have been the subject of previous studies of similarity searching. Search effectiveness was assessed by a recall analysis of the search outputs from sets of pharmacol…

Cited by 59 publications (76 citation statements)
References 26 publications (58 reference statements)
“…The appropriate form of the Dice coefficient would be (A.3) which returns a value of 8/6, i.e., a spurious result greater than perfect similarity while equation (1) gives a value of 1. Although a working program can be achieved using the Dice coefficient in this context (earlier versions of our routine used it), previous experience has shown that it can be unwise to use a coefficient that returns similarity values outside of the prescribed range (Whittle, Willett, Klaffke, & van-Noort, 2003). It is in any case mathematically uncomfortable, and we therefore use the alternative formulation as described.…”
Section: Discussion (mentioning)
confidence: 99%
“…Various molecular representations in combination with different similarity coefficients that are usually used for similarity searching in synthetic compounds databases have been assessed for their effectiveness in searching databases of natural products [254]. In that study, the Russell-Rao coefficient and Unity fingerprints appeared to be the best combination for large molecules.…”
Section: Natural Products and Diversity-oriented Synthesis (mentioning)
confidence: 99%
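To illustrate why the choice of coefficient can matter for large molecules, here is a minimal sketch of three similarity coefficients on binary fingerprints, using their standard textbook definitions. The fingerprints, the 16-bit length, and the function names are assumptions made for illustration; they are not Unity fingerprints and are not taken from the cited study.

```python
# Fingerprints are modelled as sets of "on" bit positions (a toy assumption).

def tanimoto(a, b):
    """Shared bits divided by bits set in either fingerprint."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0

def dice(a, b):
    """Twice the shared bits divided by the total bits set in both."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def russell_rao(a, b, n_bits):
    """Shared bits divided by the full fingerprint length."""
    return len(a & b) / n_bits

# Hypothetical 16-bit fingerprints for a query and a database molecule.
N_BITS = 16
query = {1, 2, 3, 4}
target = {3, 4, 5, 6}

t = tanimoto(query, target)
d = dice(query, target)
rr = russell_rao(query, target, N_BITS)
print(t, d, rr)
```

Russell-Rao divides by the fingerprint length rather than by the bits set in the two molecules, so molecules with many bits set can accumulate more shared bits and tend to score higher. This size dependence is a general property of the coefficient and may be relevant to the large-molecule finding quoted above, although the source does not state the reason.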
“…In addition, useful data fusion approaches could potentially result in a merged result that is superior to any of the individual input method results and thus is able to extract useful information from all input lists, including inferior methods. Among the published fusion methods, the Sumrank method has been shown to be among the most successful [145][146][147][148][149].…”
Section: Virtual Hit-lists: Data-fusion Vs Data-aggregation (mentioning)
confidence: 99%
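The general idea behind rank-sum fusion can be sketched as follows. This is a generic sum-of-ranks illustration, not necessarily the exact Sumrank variant used in the cited references: it assumes every input list ranks the same compounds, breaks ties alphabetically, and uses hypothetical compound identifiers.

```python
def sum_rank_fusion(ranked_lists):
    """Fuse rankings by summing each compound's rank position across lists.

    Each input list is ordered best-first. A lower summed rank is better.
    Ties are broken by compound identifier (an arbitrary choice made here).
    """
    totals = {}
    for ranking in ranked_lists:
        for position, compound in enumerate(ranking, start=1):
            totals[compound] = totals.get(compound, 0) + position
    return sorted(totals, key=lambda c: (totals[c], c))

# Three hypothetical hit lists from different search methods.
hit_lists = [
    ["m1", "m2", "m3"],
    ["m2", "m1", "m3"],
    ["m2", "m3", "m1"],
]
fused = sum_rank_fusion(hit_lists)
print(fused)  # ['m2', 'm1', 'm3']
```

Here m2 has rank sum 4, m1 has 6, and m3 has 8, so m2 leads the fused list even though one method ranked it second.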