Probability Machines
2012 · DOI: 10.3414/me00-01-0052

Abstract: Background: Most machine learning approaches provide only a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. Objectives: The aim of this paper is to show how random forests and nearest neighbors can be used fo…



Cited by 204 publications (120 citation statements) · References 28 publications
“…One intuitive method to do so is through a calibration plot. These plots have been used in bioinformatics and in credit risk (Malley et al, 2012; Medema et al, 2009). They plot the class probability produced by the model (x-axis) against a non-parametric regression of the empirical proportion of defaulters with the same predicted probability (y-axis).…”
Section: Accepted Manuscript
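The calibration plot described in the quote above can be sketched numerically. The following is a minimal illustration on synthetic data (all names and parameters are hypothetical, not from the cited papers); it bins predicted probabilities instead of fitting a nonparametric regression, but the comparison is the same: predicted class probability against the empirical proportion of positives.

```python
# Minimal calibration-check sketch on synthetic data: compare predicted
# class probabilities (x-axis of a calibration plot) with the empirical
# proportion of positives among similarly scored cases (y-axis).
import numpy as np

rng = np.random.default_rng(0)
p_pred = rng.uniform(0, 1, 5000)        # hypothetical predicted probabilities
y = rng.binomial(1, p_pred)             # outcomes drawn from those probabilities

bins = np.linspace(0, 1, 11)            # 10 equal-width probability bins
idx = np.digitize(p_pred, bins[1:-1])   # bin index (0..9) for each prediction
bin_mean_pred = np.array([p_pred[idx == b].mean() for b in range(10)])
bin_frac_pos = np.array([y[idx == b].mean() for b in range(10)])

# A well-calibrated model lies near the diagonal: the mean prediction in
# each bin approximately equals the observed fraction of positives there.
for mp, fp in zip(bin_mean_pred, bin_frac_pos):
    print(f"predicted {mp:.2f}  observed {fp:.2f}")
```

Plotting `bin_frac_pos` against `bin_mean_pred` (or a smoothed regression of `y` on `p_pred`) yields the calibration plot the citing authors describe.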
“…such as bioinformatics, image recognition, as well as in financial applications such as customer attrition and credit scoring (Lessmann et al, 2015; Malley et al, 2012).…”
“…Node impurity is measured with the Gini index for classification trees and with the estimated response variance for regression trees. For probability estimation, trees are grown as regression trees; for a description of the concept, see Malley, Kruppa, Dasgupta, Malley, and Ziegler (2012). Variable importance can be determined with the decrease of node impurity or with permutation.…”
Section: Methods
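The idea quoted above — growing regression trees on a 0/1 response so that the forest's averaged prediction is a probability estimate rather than a majority vote — can be sketched with scikit-learn. This is an illustrative sketch on synthetic data, not the cited implementation; the data-generating model and all parameter values are assumptions.

```python
# Sketch of a "probability machine": fit a regression forest to a 0/1
# response so that node impurity is the estimated response variance and
# each leaf predicts the mean of the 0/1 labels, i.e. a probability.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 2))
p_true = 1 / (1 + np.exp(-2 * X[:, 0]))    # true P(y=1 | x) for synthetic data
y = rng.binomial(1, p_true).astype(float)  # 0/1 response treated as numeric

rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=20, random_state=0)
rf.fit(X, y)
p_hat = rf.predict(X)                      # estimated probabilities in [0, 1]
print("mean abs error vs true probability:", np.abs(p_hat - p_true).mean())
```

Because every leaf prediction is an average of 0/1 labels, `p_hat` is automatically bounded in [0, 1], which is what makes regression trees usable for probability estimation.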
“…Images were classified from their pools in poor- and good-response folders. The single-tree model with the Gini splitting algorithm was used to calculate the stratification accuracy of each parameter (DTREG predictive modelling software version 10.3.0, Brentwood, TN) (24). The tree size control settings were set to: minimum rows in a node, 1; minimum size node to split, 10; maximum tree levels, 10.…”
Section: Image Classification