2004
DOI: 10.1007/s10822-004-4060-8

An automated PLS search for biologically relevant QSAR descriptors

Abstract: An automated PLS engine, WB-PLS, was applied to 1632 QSAR series with at least 25 compounds per series extracted from WOMBAT (WOrld of Molecular BioAcTivity). WB-PLS extracts a single Y variable per series, as well as pre-computed X variables from a table. The table contained 2D descriptors, the drug-like MDL 320 keys as implemented in the Mesa A&C Fingerprint module, and in-house generated topological-pharmacophore SMARTS counts and fingerprints. Each descriptor type was treated as a block, with or without sc…
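The abstract's setup (one Y variable per series, X variables grouped into descriptor blocks) can be illustrated with a minimal sketch. It assumes a textbook NIPALS PLS1 routine and synthetic data; it is not the WB-PLS implementation, and the names `pls1_fit`, `r_squared`, `block_a`, and `block_b` are hypothetical. The reported value is a fitted R², not a cross-validated statistic.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Textbook NIPALS PLS1 for a single Y variable.

    Returns (coef, x_mean, y_mean) so that y is approximated by
    (X - x_mean) @ coef + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yc = X - x_mean, y - y_mean          # mean-centre X and y
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yc                        # weight vector
        w /= np.linalg.norm(w)
        t = Xr @ w                           # scores
        tt = t @ t
        p = Xr.T @ t / tt                    # X loadings
        W.append(w)
        P.append(p)
        q.append(t @ yc / tt)                # y loading
        Xr = Xr - np.outer(t, p)             # deflate X
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression coefficients
    return coef, x_mean, y_mean

def r_squared(X, y, model):
    """Fitted R² of a PLS1 model on the data it was trained on."""
    coef, x_mean, y_mean = model
    pred = (X - x_mean) @ coef + y_mean
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in for one QSAR series: 40 compounds, two descriptor blocks.
rng = np.random.default_rng(0)
n = 40
blocks = {
    "block_a": rng.normal(size=(n, 3)),      # hypothetical informative block
    "block_b": rng.normal(size=(n, 3)),      # hypothetical unrelated block
}
y = blocks["block_a"] @ np.array([1.0, -2.0, 0.5])  # activity depends on block_a only

# Fit one model per block and compare how well each block explains y.
r2 = {name: r_squared(X, y, pls1_fit(X, y, n_components=3))
      for name, X in blocks.items()}
for name, value in r2.items():
    print(name, round(value, 3))
```

Fitting each block separately is only one way to read "treated as a block"; the truncated abstract does not say how the blocks were combined or compared.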

citations
Cited by 85 publications
(63 citation statements)
references
References 42 publications
“…They concluded that occurrence-based representations were slightly, but significantly, superior to incidence-based representations; however, the experiments were on a very small scale with the datasets only containing 20-129 structures. Property prediction experiments using small QSAR and QSPR datasets were also reported by Olah et al [26] and by Azencott et al [27], both of whom again found that occurrence-based representations performed better than the corresponding incidence-based representations. A preference for occurrence-based representations was observed by Chen and Reynolds in simulated virtual screening experiments using the NCI AIDS and MDL Drug Data Report (MDDR) databases [28], although they noted that the highly specific fragment definitions that were employed (atom-pairs and atom-sequences) meant that there was often little difference between the two types of representation.…”
Section: Introduction
Citation type: supporting
confidence: 64%
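The distinction drawn in this statement between occurrence-based and incidence-based representations can be sketched as follows. This is a minimal illustration, assuming a molecule has already been reduced to a list of fragment labels; the labels, the vocabulary, and the function names are hypothetical and not taken from any of the cited studies.

```python
from collections import Counter

def occurrence_vector(fragments, vocabulary):
    """Occurrence-based: how many times each fragment appears."""
    counts = Counter(fragments)
    return [counts[f] for f in vocabulary]

def incidence_vector(fragments, vocabulary):
    """Incidence-based: 1 if the fragment appears at all, else 0."""
    present = set(fragments)
    return [1 if f in present else 0 for f in vocabulary]

# Hypothetical fragment vocabulary and two hypothetical molecules.
vocabulary = ["C-C", "C-N", "C=O", "C-O"]
mol_a = ["C-C", "C-C", "C=O"]
mol_b = ["C-C", "C=O"]

occ_a = occurrence_vector(mol_a, vocabulary)
occ_b = occurrence_vector(mol_b, vocabulary)
inc_a = incidence_vector(mol_a, vocabulary)
inc_b = incidence_vector(mol_b, vocabulary)
```

Here the two molecules share an incidence vector but differ in their occurrence vectors, which is the extra information a count-based representation retains. When fragments are so specific that each rarely occurs more than once, the two vectors coincide, consistent with the remark about atom-pairs and atom-sequences.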
“…Specifically, they have been inspired by the MDL 320 keys [48] and the CATS (chemically advanced template search) concept [49]: they hence combine chemical substructure recognition (MDL-style) with topologically-relevant pharmacophore patterns based on atom-pairs (CATS-style), in an effort to bridge the gap between substructural and pharmacophore descriptors. The fingerprints are thus more general in nature than the two previous ones; they have been studied previously in an extended evaluation of descriptors for mapping chemistry-biology relationships, this validation involving over a thousand QSAR series, each containing 25 or more compounds and spanning 2 log units in activity, using automated multivariate statistics [26,50]. The Sunset key-set contains 560 keys encoded by SMARTS: our experiments used 559 of these since one SMARTS (although correctly formed) could not be processed by Pipeline Pilot.…”
Section: Structural Representations
Citation type: mentioning
confidence: 99%
“…We extracted ligand sets from databases that annotate molecules by therapeutic or biological category (Keiser et al, 2009 (Olah et al, 2004; Oprea et al, 2007), and the MDDR 2006.1 (MDL, now provided by Symyx) databases. For the ChEMBL and WOMBAT data sets, we organized ligands by their affinities, representing each protein target by three sets at 1 and 10 µM cutoffs (as well as an additional 100 µM cutoff for WOMBAT).…”
Section: Ligand Sets For SEA
Citation type: mentioning
confidence: 99%
“…The concept of using SMILES and SMARTS patterns has been reported for applications in the atmospheric chemistry community (Barley et al, 2011; COBRA, Fooshee et al, 2012). While some sets of SMARTS patterns for substructure matching can additionally be found in the literature (Hann et al, 1999; Walters and Murcko, 2002; Olah et al, 2004; Enoch et al, 2008; Barley et al, 2011; Kenny et al, 2013) or on web databases - e.g., DAYLIGHT Chemical Information Systems, Inc. (DAYLIGHT Chemical Information Systems, Inc.) - knowledge regarding the extent of specificity and validation of the defined patterns is not available.…”
Section: Introduction
Citation type: mentioning
confidence: 99%