1996
DOI: 10.1002/qsar.19960150402

Parameter Based Methods for Compound Selection from Chemical Databases

Abstract: Two algorithms for the selection of subsets of compounds from chemical databases are presented and discussed. The first is designed to select representative subsets whilst the second is intended to select compounds which cover the available property space. Both make use of calculated physicochemical parameters in contrast to more common methods based on molecular fingerprints. This is an approach to molecular similarity which has proved successful in the past. The methods are illustrated with examples and disc…

Cited by 111 publications (77 citation statements)
References 10 publications
“…WOMBAT (Word of Molecular BioAcTivity) 16 20 where subsets of 5% and 10% representative compounds were extracted using two alternative algorithms: MDC (most descriptive compound) 21 and LMD (longest minimum distance). 22 In both cases, the selection was carried out in the PCA t-scores space using 2 PC.…”
Section: Data (mentioning)
confidence: 99%
“…We have already noted that the identification of the n most diverse molecules in a dataset containing N molecules is generally infeasible for non-trivial values of n and N (but see Section 4 below for an exception to this general rule), and practicable approaches to dissimilarity-based compound selection hence involve approximate methods that are not guaranteed to result in the identification of the most dissimilar possible subset (see, e.g., Bawden, 1993; Clark, 1997; Hudson et al, 1996; Lajiness, 1990; Marengo and Todeschini, 1992; Nilakantan et al, 1997; Pickett et al, 1998; Polinsky et al, 1996); that said, there is evidence to suggest that the subsets identified are only marginally sub-optimal (Gillet et al, 1997). Thus far, two major classes of algorithm have been described: maximum-dissimilarity algorithms and sphere-exclusion algorithms (Snarey et al, 1998). The basic maximum-dissimilarity algorithm for selecting a size-n Subset from a size-N Dataset is shown in Figure 1.…”
Section: Selection Of Compounds From a Database (mentioning)
confidence: 99%
“…The inclusion of such a threshold results in a maximum dissimilarity algorithm that is not too far removed from the basic sphere-exclusion approach described by Hudson et al (1996). Here, a threshold t is set, which can be thought of as the radius of a hypersphere in multi-dimensional chemistry space.…”
Section: Insert Figure 1 About Here (mentioning)
confidence: 99%
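
The threshold idea in the statement above can be illustrated with a minimal sketch. This is not the procedure from Hudson et al (1996), whose details are not given here; it assumes compounds are represented as numeric property vectors, that Euclidean distance stands in for dissimilarity, and that the next centre is simply the first non-excluded compound in input order. The function name `sphere_exclusion` and the sample data are hypothetical.

```python
import math

def sphere_exclusion(points, t):
    """Sketch of threshold-based exclusion (assumed details, see lead-in).

    points: list of equal-length numeric tuples (hypothetical property vectors)
    t: non-negative exclusion radius in the same units as the vectors
    Returns the indices of the selected points.
    """
    selected = []
    remaining = list(range(len(points)))
    while remaining:
        # Assumption: take the first compound that has not been excluded yet.
        centre = remaining[0]
        selected.append(centre)
        # Drop the centre and everything inside its hypersphere of radius t.
        remaining = [
            i for i in remaining
            if i != centre and math.dist(points[i], points[centre]) > t
        ]
    return selected

# Hypothetical 2-D property vectors forming two tight pairs and one outlier.
demo_points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.05, 0.0), (3.0, 3.0)]
print(sphere_exclusion(demo_points, t=0.5))
```

Unlike a fixed-size selection, the number of compounds returned here depends on the threshold: a larger t excludes more neighbours per pick and yields a smaller subset.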
“…An alternative, sphere-exclusion approach involves selecting an initial molecule and then excluding from further consideration all molecules that have a similarity greater than some threshold with the chosen molecule. In subsequent stages, that non-excluded molecule is chosen for inclusion in the subset that has the largest dissimilarity to those molecules that have already been selected, and further molecules excluded if they are nearest neighbours of the one that has been chosen [85] (other approaches have also been described [86]). These approaches involve the identification of the most dissimilar molecule at each stage, and different results can be obtained depending on how 'most dissimilar' is defined: the MaxMin approach is widely used, and involves selecting that molecule for inclusion that has the maximum dissimilarity to its nearest neighbour in the current subset of selected molecules [87].…”
Section: Molecular Diversity Analysis (mentioning)
confidence: 99%
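
The MaxMin criterion described in the statement above can be sketched as follows. This is an illustrative greedy version, not code from any of the cited papers; it assumes numeric property vectors, Euclidean distance as the dissimilarity measure, and an arbitrary starting compound. The name `maxmin_select`, the `seed_index` parameter and the sample data are hypothetical.

```python
import math

def maxmin_select(points, n, seed_index=0):
    """Greedy MaxMin sketch: repeatedly add the point whose distance to its
    nearest already-selected point is largest (assumed details, see lead-in)."""
    if not points or n <= 0:
        return []
    selected = [seed_index]
    chosen = {seed_index}
    # nearest[i] = distance from point i to its nearest selected point so far.
    nearest = [math.dist(p, points[seed_index]) for p in points]
    while len(selected) < min(n, len(points)):
        candidates = [i for i in range(len(points)) if i not in chosen]
        best = max(candidates, key=nearest.__getitem__)
        selected.append(best)
        chosen.add(best)
        # Update nearest-selected distances with the newly added point.
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[best]))
    return selected

# Hypothetical 2-D property vectors forming two tight pairs and one outlier.
sample_points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.05, 0.0), (3.0, 3.0)]
print(maxmin_select(sample_points, n=3))
```

As the quoted statements note, a greedy selection of this kind is an approximation: it is not guaranteed to find the most dissimilar possible subset, and the result depends on the starting compound.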