Probability Machines
2012 · DOI: 10.3414/me00-01-0052

Abstract: Background: Most machine learning approaches provide only a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. Objectives: The aim of this paper is to show how random forests and nearest neighbors can be used fo…



Cited by 204 publications (120 citation statements) · References 28 publications
“…One intuitive method to do so is through a calibration plot. These plots have been used in bioinformatics and in credit risk (Malley et al, 2012; Medema et al, 2009). They plot the class probability produced by the model (x-axis) against a non-parametric regression of the empirical proportion of defaulters with the same predicted probability (y-axis).…”
Section: Accepted Manuscript
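The calibration plot described in the quote above can be sketched numerically. The following is a minimal illustration on synthetic data (all names and parameters are hypothetical, not from the cited papers); it bins predicted probabilities instead of fitting a nonparametric regression, but the comparison is the same: predicted class probability against the empirical proportion of positives.

```python
# Minimal calibration-check sketch on synthetic data: compare predicted
# class probabilities (x-axis of a calibration plot) with the empirical
# proportion of positives among similarly scored cases (y-axis).
import numpy as np

rng = np.random.default_rng(0)
p_pred = rng.uniform(0, 1, 5000)        # hypothetical predicted probabilities
y = rng.binomial(1, p_pred)             # outcomes drawn from those probabilities

bins = np.linspace(0, 1, 11)            # 10 equal-width probability bins
idx = np.digitize(p_pred, bins[1:-1])   # bin index (0..9) for each prediction
bin_mean_pred = np.array([p_pred[idx == b].mean() for b in range(10)])
bin_frac_pos = np.array([y[idx == b].mean() for b in range(10)])

# A well-calibrated model lies near the diagonal: the mean prediction in
# each bin approximately equals the observed fraction of positives there.
for mp, fp in zip(bin_mean_pred, bin_frac_pos):
    print(f"predicted {mp:.2f}  observed {fp:.2f}")
```

Plotting `bin_frac_pos` against `bin_mean_pred` (or a smoothed regression of `y` on `p_pred`) yields the calibration plot the citing authors describe.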
“…such as bioinformatics, image recognition, as well as in financial applications such as customer attrition and credit scoring (Lessmann et al, 2015; Malley et al, 2012).…”
“…Node impurity is measured with the Gini index for classification trees and with the estimated response variance for regression trees. For probability estimation, trees are grown as regression trees; for a description of the concept, see Malley, Kruppa, Dasgupta, Malley, and Ziegler (2012). Variable importance can be determined with the decrease of node impurity or with permutation.…”
Section: Methods
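The idea quoted above — growing regression trees on a 0/1 response so that the forest's averaged prediction is a probability estimate rather than a majority vote — can be sketched with scikit-learn. This is an illustrative sketch on synthetic data, not the cited implementation; the data-generating model and all parameter values are assumptions.

```python
# Sketch of a "probability machine": fit a regression forest to a 0/1
# response so that node impurity is the estimated response variance and
# each leaf predicts the mean of the 0/1 labels, i.e. a probability.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 2))
p_true = 1 / (1 + np.exp(-2 * X[:, 0]))    # true P(y=1 | x) for synthetic data
y = rng.binomial(1, p_true).astype(float)  # 0/1 response treated as numeric

rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=20, random_state=0)
rf.fit(X, y)
p_hat = rf.predict(X)                      # estimated probabilities in [0, 1]
print("mean abs error vs true probability:", np.abs(p_hat - p_true).mean())
```

Because every leaf prediction is an average of 0/1 labels, `p_hat` is automatically bounded in [0, 1], which is what makes regression trees usable for probability estimation.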
“…Images were classified from their pools in poor- and good-response folders. The single-tree model with the Gini splitting algorithm was used to calculate the stratification accuracy of each parameter (DTREG predictive modelling software version 10.3.0, Brentwood, TN) (24). The tree size control settings were set to: minimum rows in a node, 1; minimum size node to split, 10; maximum tree levels, 10.…”
Section: Image Classification