2017
DOI: 10.1186/s13321-017-0230-2

Efficiency of different measures for defining the applicability domain of classification models

Abstract: The goal of defining an applicability domain for a predictive classification model is to identify the region in chemical space where the model’s predictions are reliable. The boundary of the applicability domain is defined with the help of a measure that should reflect the reliability of an individual prediction. Here, the available measures are differentiated into those that flag unusual objects and are independent of the original classifier, and those that use information from the trained classifier. The f…
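To make the distinction drawn in the abstract concrete, the following sketch (not taken from the paper; the function names ad_distance and ad_confidence are illustrative, and scikit-learn is assumed) contrasts a classifier-independent measure that flags unusual objects via their distance to the training set with a classifier-dependent measure based on the trained model's predicted class probabilities.

```python
# Minimal sketch of the two families of applicability-domain measures the abstract
# distinguishes; names and data are illustrative, not the paper's protocol.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

def ad_distance(X_train, X_query, k=5):
    """Classifier-independent measure: mean distance to the k nearest training objects.
    Larger values flag unusual objects that lie far from the training data."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    dist, _ = nn.kneighbors(X_query)
    return dist.mean(axis=1)

def ad_confidence(model, X_query):
    """Classifier-dependent measure: the highest predicted class probability.
    Values close to 1 indicate predictions the model itself regards as reliable."""
    return model.predict_proba(X_query).max(axis=1)

# Illustrative usage on random descriptor vectors.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 10)), rng.integers(0, 2, 200)
X_query = rng.normal(size=(20, 10))

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(ad_distance(X_train, X_query)[:3])
print(ad_confidence(rf, X_query)[:3])
```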

Cited by 56 publications (58 citation statements)
References 71 publications
“…Confidence estimation is defined as the probability that a model will misclassify a new entity. For random forests, confidence can be estimated based on the variance of the predictions among the single decision trees in a forest. The metric for the number of single models (trees) that agree with the combined model (random forest) has been termed concordance.…”
Section: Results (mentioning)
confidence: 99%
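The concordance measure described in this excerpt can be computed directly from the individual trees of a fitted forest. The following sketch (assuming scikit-learn; not code from the cited work) counts, for each sample, the fraction of trees whose vote agrees with the prediction of the whole forest.

```python
# Per-sample "concordance": fraction of single trees agreeing with the forest prediction.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=15, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

forest_pred = rf.predict(X)                                    # combined-model prediction
tree_preds = np.stack([t.predict(X) for t in rf.estimators_])  # votes of the single trees
concordance = (tree_preds == forest_pred).mean(axis=0)         # agreement per sample

print(concordance[:5])  # values near 1.0 indicate high agreement, i.e. high confidence
```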
“…The final result is the average over the individual trees. Random forest can handle both regression problems and classification problems (Chen et al., 2012; Da et al., 2015; Klingspohn et al., 2017). There are a handful of adjustable parameters (e.g., number of trees, maximum depth of the tree, number of features), and these parameters have a significant influence on the performance of the algorithm.…”
Section: Methods (mentioning)
confidence: 99%
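A brief sketch (assuming scikit-learn; the grid values are illustrative and not taken from any of the cited studies) of how the adjustable parameters mentioned above, i.e. the number of trees, the maximum tree depth, and the number of features per split, are typically exposed and tuned.

```python
# Tuning the main random-forest parameters with a small, illustrative grid.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {
    "n_estimators": [100, 300],    # number of trees in the forest
    "max_depth": [None, 10],       # maximum depth of each tree
    "max_features": ["sqrt", 0.5], # number of features tried at each split
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```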
“…The random forest method is one of the most widely used machine learning techniques employed in pharmaceutical research (Chen, Sheridan, Hornak, & Voigt, 2012; Da, Desaphy, Bret, & Rognan, 2015; Klingspohn, Mathea, Ter Laak, Heinrich, & Baumann, 2017; Marchese Robinson, Palczewska, Palczewski, & Kidley, 2017). It is an ensemble prediction model, which fits many decision trees trained on subsampled subsets of the original data set (Svetnik et al., 2003).…”
Section: Random Forest (mentioning)
confidence: 99%
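To illustrate the "many decision trees trained on subsampled subsets" idea from this excerpt, here is a minimal hand-rolled bagging sketch (Python with scikit-learn's DecisionTreeClassifier; it omits the per-split feature randomization that a true random forest adds).

```python
# Bootstrap-aggregated decision trees: each tree sees a resampled subset of the data,
# and the ensemble prediction is the majority vote over all trees.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(50):
    idx = rng.integers(0, len(X), size=len(X))        # bootstrap subsample (with replacement)
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

votes = np.stack([t.predict(X) for t in trees])           # one vote per tree and sample
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)   # majority vote for binary labels
print((ensemble_pred == y).mean())                        # training-set agreement, illustration only
```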
“…Researchers have developed various algorithms, such as decision tree (DT), k-nearest neighbor (kNN), support vector machine (SVM), random forest (RF), and neural networks, to achieve desirable outcomes in classification problems (Klingspohn, Mathea, Ter Laak, Heinrich, & Baumann, 2017; Podgorelec, Kokol, Stiglic, & Rozman, 2002; Sprague et al., 2014; Tian, Wang, Li, Xu, & Hou, 2012). For example, classification and regression models are widely used to discriminate actives from inactives for a given target (Hammann, Gutmann, Baumann, Helma, & Drewe, 2009; Helma, Cramer, Kramer, & De Raedt, 2004).…”
Section: Introduction (mentioning)
confidence: 99%
“…In drug discovery, RF has become a common standard for quantitative predictive modeling (Chen, Sheridan, Hornak, & Voigt, 2012; Da, Desaphy, Bret, & Rognan, 2015). Several studies (Chen et al., 2012; Da et al., 2015; Klingspohn et al., 2017) have shown that RF models perform well on problems in predictive chemical information. It possesses notable advantages in solving classification problems.…”
Section: Introduction (mentioning)
confidence: 99%