2010
DOI: 10.1021/ci100253r

Applicability Domains for Classification Problems: Benchmarking of Distance to Models for Ames Mutagenicity Set

Abstract: The estimation of accuracy and applicability of QSAR and QSPR models for biological and physicochemical properties represents a critical problem. The developed parameter of "distance to model" (DM) is defined as a metric of similarity between the training and test set compounds that have been subjected to QSAR/QSPR modeling. In our previous work, we demonstrated the utility and optimal performance of DM metrics that have been based on the standard deviation within an ensemble of QSAR models. The current study …
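As a minimal sketch of a DM based on the standard deviation within an ensemble, assuming each ensemble member outputs a 0/1 mutagenicity label (the abstract does not give the exact formulation used for classification), the DM of one compound can be taken as the standard deviation of the members' predictions. The function name `ensemble_std_dm` is hypothetical.

```python
from statistics import pstdev


def ensemble_std_dm(predictions):
    """Distance to model for one compound: the population standard deviation
    of the predictions of the individual ensemble members. For 0/1 votes this
    equals sqrt(p * (1 - p)), where p is the fraction of models voting 1."""
    return pstdev(predictions)


# Hypothetical votes from eight models for two compounds.
unanimous = [1, 1, 1, 1, 1, 1, 1, 1]
split = [1, 0, 1, 0, 1, 0, 1, 0]
print(ensemble_std_dm(unanimous))  # 0.0
print(ensemble_std_dm(split))      # 0.5
```

With binary votes the DM is zero when all models agree and reaches its maximum of 0.5 when the ensemble is split evenly, which is why a large value flags compounds on which the models disagree.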

Cited by 208 publications (228 citation statements). References: 55 publications (104 reference statements).

“…In the current study, the CONSENSUS-STD (standard deviation of predictions of the ensemble of models in the consensus model) was used as a measure of DM. This DM provided the best separation of molecules with low and high accuracy of predictions in several benchmarking studies [59,60]. A threshold value of 95% of compounds from the training set was used to determine the qualitative ADs of models.…”
Section: Accepted Manuscript (mentioning; confidence: 99%)
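The quoted procedure can be sketched as follows, under stated assumptions: each model in the consensus returns a numeric prediction (for example a probability), the DMs of the training compounds are already available (for instance from cross-validated predictions), and "95% of compounds from the training set" is read as the DM value that covers 95% of the training compounds. The helper names `consensus_std`, `ad_threshold` and `in_domain` are hypothetical.

```python
import math
from statistics import pstdev


def consensus_std(model_outputs):
    """CONSENSUS-STD for one compound: the standard deviation of the
    predictions made by every model in the consensus."""
    return pstdev(model_outputs)


def ad_threshold(train_dms, coverage=0.95):
    """Smallest DM value at or below which `coverage` of the training
    compounds fall (an empirical quantile of the training DMs)."""
    ranked = sorted(train_dms)
    # round() absorbs float error such as coverage * n landing just above an integer
    count = math.ceil(round(coverage * len(ranked), 9))
    return ranked[max(count - 1, 0)]


def in_domain(dm, threshold):
    """Qualitative AD: inside if the compound's DM does not exceed the threshold."""
    return dm <= threshold


if __name__ == "__main__":
    # Hypothetical DMs for 20 training compounds and one new compound.
    train_dms = [0.01 * i for i in range(1, 21)]
    thr = ad_threshold(train_dms)
    new_dm = consensus_std([0.8, 0.6, 0.9, 0.7])
    print(thr, new_dm, in_domain(new_dm, thr))
```

Taking the threshold from the training distribution means roughly 5% of the training compounds themselves fall outside the AD, so the cut-off adapts to the spread of the ensemble rather than to a fixed absolute value.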
“…One may also generate many models from the same training set but using different subsets of features (F_var). Most of these approaches have been explored, but the most frequently used were ensembles of a fixed learning algorithm with varied features and/or varied training sets (T_fix Al_fix F_var or T_var Al_fix F_var) [35, 36, 40-48]. For example, a technique that uses varied training sets and features (T_var Al_fix F_var) is the Random Forest [49] technique, which is an ensemble of many decision trees built from variations of samples and features.…”
Section: Introduction (mentioning; confidence: 99%)
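A T_var Al_fix F_var ensemble can be sketched as below: every member applies the same fixed learning algorithm (here a 1-nearest-neighbour classifier, chosen only for brevity) to a bootstrap resample of the training set and to a random subset of the descriptors. Note that this draws one feature subset per model (bagging combined with a random subspace), whereas Random Forest resamples features at every split of each tree. All function names are hypothetical.

```python
import random
from collections import Counter


def nearest_neighbour_predict(train_X, train_y, features, x):
    """Fixed learning algorithm (Al_fix): 1-nearest neighbour restricted
    to the selected feature indices."""
    def dist(a, b):
        return sum((a[f] - b[f]) ** 2 for f in features)
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[best]


def build_ensemble(X, y, n_models, n_features, seed=0):
    """Each member gets a bootstrap resample of the training set (T_var)
    and a random subset of the descriptors (F_var)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    members = []
    for _ in range(n_models):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(d), n_features)      # feature subset
        members.append(([X[i] for i in rows], [y[i] for i in rows], feats))
    return members


def predict(members, x):
    """Majority-vote consensus label plus the individual member votes."""
    votes = [nearest_neighbour_predict(mX, my, f, x) for mX, my, f in members]
    return Counter(votes).most_common(1)[0][0], votes
```

The list of individual votes returned by `predict` is exactly what an ensemble-based DM such as the standard deviation of predictions would be computed from.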
“…Whilst this may be interpreted as a range of chemical structures for which the expected model performance is well characterized [180], the "applicability domain" is commonly interpreted as a region of chemical structure space in which the model is known to exhibit desirable predictivity [155, 156, 158, 205-208]. A distinction may be made between those approaches which simply try to categorize compounds as inside the applicability domain (AD) or outside the AD, and those which seek to directly assess the expected performance of the model for a particular compound [205]. In the context of predictive toxicology, where the "mechanism of toxic action" is understood, the former approach may be informed by mechanistic reasoning [207].…”
Section: The Applicability Domains Of In Silico Models (mentioning; confidence: 99%)
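The two kinds of approach distinguished in the quote can be contrasted in a short sketch, assuming a DM value and a correct/incorrect outcome are known for each validation compound. The binary approach only asks whether a DM exceeds a cut-off; the second estimates the expected accuracy for a compound from the observed accuracy of validation compounds with similar DM. Binning by DM is one simple choice made here for illustration, and all names are hypothetical.

```python
from bisect import bisect_right


def inside_ad(dm, threshold):
    """Binary approach: the compound is either inside or outside the AD."""
    return dm <= threshold


def accuracy_by_dm_bin(dms, correct, edges):
    """Observed accuracy per DM bin from validation compounds.
    Bin 0 holds dm < edges[0]; bin i holds edges[i-1] <= dm < edges[i];
    the last bin holds dm >= edges[-1]. Empty bins give None."""
    n_bins = len(edges) + 1
    hits, totals = [0] * n_bins, [0] * n_bins
    for dm, ok in zip(dms, correct):
        b = bisect_right(edges, dm)
        totals[b] += 1
        hits[b] += int(ok)
    return [h / t if t else None for h, t in zip(hits, totals)]


def expected_accuracy(dm, edges, bin_acc):
    """Performance-estimation approach: accuracy expected for a new compound."""
    return bin_acc[bisect_right(edges, dm)]
```

For instance, with validation DMs `[0.0, 0.1, 0.2, 0.3, 0.4, 0.5]`, outcomes `[True, True, True, False, True, False]` and a single edge at 0.25, the low-DM bin has accuracy 1.0 and the high-DM bin 1/3, so a new compound with DM 0.45 would be assigned an expected accuracy of 1/3 rather than a simple in/out label.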