Impact of Benchmark Data Set Topology on the Validation of Virtual Screening Methods: Exploration and Quantification by Spatial Statistics

Rohrer, Sebastian; Baumann, Knut

doi:10.1021/ci700099u

Cited by 32 publications

(50 citation statements)

References 53 publications


“…This choice is critical since the performances of a virtual screening method can vary considerably with the benchmarking dataset used for the study [74,75]. The first element that can guide the selection of a benchmarking dataset is the nature of the virtual screening method that will be evaluated.…”

Section: Selection Of the Optimal Benchmarking Dataset According To T (mentioning)

confidence: 99%

Benchmarking Data Sets for the Evaluation of Virtual Ligand Screening Methods: Review and Perspectives

Lagarde, Zagury, Montès. 2015. J. Chem. Inf. Model.


Virtual screening methods are commonly used nowadays in drug discovery processes. However, to ensure their reliability, they have to be carefully evaluated. The evaluation of these methods is often realized in a retrospective way, notably by studying the enrichment of benchmarking data sets. To this purpose, numerous benchmarking data sets were developed over the years, and the resulting improvements led to the availability of high quality benchmarking data sets. However, some points still have to be considered in the selection of the active compounds, decoys, and protein structures to obtain optimal benchmarking data sets.


“…The method to build MUV was the refined nearest neighbor analysis in spatial statistics [105]. First, 17 physicochemical properties were used for calculating pairwise Euclidean distances.…”

Section: Currently Available Benchmarking Sets (mentioning)

confidence: 99%

Benchmarking methods and data sets for ligand enrichment assessment in virtual screening

Xia, Tilahun, Reid, et al. 2015. Methods


Retrospective small-scale virtual screening (VS) based on benchmarking data sets has been widely used to estimate ligand enrichments of VS approaches in the prospective (i.e. real-world) efforts. However, the intrinsic differences of benchmarking sets to the real screening chemical libraries can cause biased assessment. Herein, we summarize the history of benchmarking methods as well as data sets and highlight three main types of biases found in benchmarking sets, i.e. “analogue bias”, “artificial enrichment” and “false negative”. In addition, we introduced our recent algorithm to build maximum-unbiased benchmarking sets applicable to both ligand-based and structure-based VS approaches, and its implementations to three important human histone deacetylase (HDAC) isoforms, i.e. HDAC1, HDAC6 and HDAC8. The Leave-One-Out Cross-Validation (LOO CV) demonstrates that the benchmarking sets built by our algorithm are maximum-unbiased in terms of property matching, ROC curves and AUCs.
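The citation statement above says MUV was built with a nearest neighbor analysis on pairwise Euclidean distances computed from 17 physicochemical properties. The sketch below only illustrates that kind of calculation in general terms. The function names, the three-property toy vectors, and the cumulative "fraction within t" summary are assumptions made for illustration, not the published MUV procedure; real descriptors would normally be scaled before distances are computed.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length property vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor_distances(points):
    """Distance from each point to its closest *other* point in the set."""
    result = []
    for i, p in enumerate(points):
        result.append(
            min(euclidean(p, q) for j, q in enumerate(points) if j != i)
        )
    return result

def nn_fraction_within(points, t):
    """Fraction of points whose nearest neighbor lies within distance t."""
    distances = nearest_neighbor_distances(points)
    return sum(1 for d in distances if d <= t) / len(distances)

# Hypothetical toy descriptors: 3 properties per compound,
# standing in for the 17 properties mentioned in the text.
actives = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]

nn = nearest_neighbor_distances(actives)
clumped_fraction = nn_fraction_within(actives, 5.0)
```

A high fraction at small t suggests that the compounds sit close together in property space, which is the kind of clumping a spatial-statistics analysis is meant to expose.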


“…[8,12,13]. The lack of robustness that results from such bias can seriously skew retrospective analyses and mislead researchers as to which method is likely to give the best prospective performance.…”

Section: Introduction (mentioning)

confidence: 99%

The effect of structural redundancy in validation sets on virtual screening performance

Clark, Shepphird, Holliday. 2009. Journal of Chemometrics


The performance of a classification model is often assessed in terms of how well it separates a set of known observations into appropriate classes. If the validation sets used for such analyses are redundant due to bias in sampling, the relevance of the conclusions drawn to prospective work in which new kinds of positives are sought may be compromised. In the case of the various virtual screening techniques used in modern drug discovery, such bias generally appears as over‐representation of particular structural subclasses in the test set. We show how clustering by substructural similarity, followed by applying arithmetic and harmonic weighting schemes to receiver operating characteristic (ROC) curves, can be used to identify validation sets that are biased due to such redundancies. This can be accomplished qualitatively by direct examination or quantitatively by comparing the areas under the respective linear or semilog curves (AUCs or pAUCs). Copyright © 2009 John Wiley & Sons, Ltd.
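The abstract above describes weighting ROC curves by structural cluster to expose redundancy in a validation set. A minimal sketch of one plausible reading of the arithmetic scheme follows: each active is weighted by the reciprocal of its cluster size, so every cluster contributes equally to the area under the curve. This interpretation, the function names, and the toy scores and cluster labels are all assumptions for illustration; the harmonic scheme and the semilog (pAUC) variant are not shown.

```python
from collections import Counter

def cluster_weights(cluster_ids):
    """Weight each active by 1 / (size of its cluster)."""
    sizes = Counter(cluster_ids)
    return [1.0 / sizes[c] for c in cluster_ids]

def weighted_auc(active_scores, decoy_scores, weights):
    """Weighted ROC AUC: weighted share of active/decoy pairs ranked correctly.

    Ties between an active and a decoy count as half a correct pair.
    """
    total = 0.0
    for score, weight in zip(active_scores, weights):
        wins = sum(
            1.0 if score > d else 0.5 if score == d else 0.0
            for d in decoy_scores
        )
        total += weight * wins
    return total / (sum(weights) * len(decoy_scores))

# Hypothetical toy data: three well-ranked actives share cluster "A",
# one poorly ranked active is the only member of cluster "B".
active_scores = [0.9, 0.8, 0.7, 0.2]
clusters = ["A", "A", "A", "B"]
decoy_scores = [0.5, 0.4, 0.3, 0.1]

plain = weighted_auc(active_scores, decoy_scores, [1.0] * len(active_scores))
by_cluster = weighted_auc(active_scores, decoy_scores, cluster_weights(clusters))
```

On this toy set the unweighted AUC is 0.8125 and the cluster-weighted AUC is 0.625. A gap of that kind indicates that the apparent performance rests largely on one over-represented structural class.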
