Peering Into the Black Box of Artificial Intelligence: Evaluation Metrics of Machine Learning Methods

Handelman, Guy; Kok, Hong Kuan; Chandra, Ronil V.; Razavi, Amir H.; Huang, Shiwei; Brooks, Mark; Lee, Michael J.; Asadi, Hamed

doi:10.2214/ajr.18.20224

Cited by 267 publications

(141 citation statements)

References 20 publications

Supporting

Mentioning

140

Contrasting

Unclassified

Order By: Relevance

“…21 A limitation of machine learning is, however, that algorithms may perform well in the sample they were trained on but rarely generalize to new data. 22,23 To address this, previous studies have applied within-sample cross-validation (CV), in which a given sample is iteratively divided into training and test data to ensure that model training and testing are conducted on different datasets. 18,24 While reducing the likelihood of overfitting, this approach leaves unaddressed the question whether the algorithm indeed generalizes to new and unseen data from independently recruited participants, 25 which is considered the gold standard of evaluating machine-learning performance.…”

Section: Introductionmentioning

confidence: 99%

Predicting sporadic Alzheimer's disease progression via inherited Alzheimer's disease‐informed machine‐learning

Franzmeier

Koutsouleris

Benzinger

et al. 2020

Alzheimer's & Dementia

View full text Add to dashboard Cite

Introduction Developing cross‐validated multi‐biomarker models for the prediction of the rate of cognitive decline in Alzheimer's disease (AD) is a critical yet unmet clinical challenge. Methods We applied support vector regression to AD biomarkers derived from cerebrospinal fluid, structural magnetic resonance imaging (MRI), amyloid‐PET and fluorodeoxyglucose positron‐emission tomography (FDG‐PET) to predict rates of cognitive decline. Prediction models were trained in autosomal‐dominant Alzheimer's disease (ADAD, n = 121) and subsequently cross‐validated in sporadic prodromal AD (n = 216). The sample size needed to detect treatment effects when using model‐based risk enrichment was estimated. Results A model combining all biomarker modalities and established in ADAD predicted the 4‐year rate of decline in global cognition (R2 = 24%) and memory (R2 = 25%) in sporadic AD. Model‐based risk‐enrichment reduced the sample size required for detecting simulated intervention effects by 50%–75%. Discussion Our independently validated machine‐learning model predicted cognitive decline in sporadic prodromal AD and may substantially reduce sample size needed in clinical trials in AD.

show abstract

Section: Introductionmentioning

confidence: 99%

Predicting sporadic Alzheimer's disease progression via inherited Alzheimer's disease‐informed machine‐learning

Franzmeier

Koutsouleris

Benzinger

et al. 2020

Alzheimer's & Dementia

View full text Add to dashboard Cite

show abstract

“…It is generally agreed that the interpretation of machine learning models is non-trivial and often referred to as a "black box". 15 While there are tools in place to aid in the interpretation of some models, they do not apply to all of the models we trained. As an example, the Supplement Materials detail some basic interpretation information referred to as "feature importances", denoting which features are most influential in the models.…”

Section: The Model Results Inmentioning

confidence: 99%

“…The process included splitting the dataset into training and testing sets, cross-validation, hyperparameter tuning, and a final evaluation on the testing set. 14,15 We developed a set of clinical criteria required to pass a model, and developed a tie-breaking scheme when multiple models for a single dataset were acceptable. Subsequently, we performed a retrospective analysis on a collection of variants that had been orthogonally confirmed.…”

Section: Overviewmentioning

confidence: 99%

“…The exact set of flags was notably different for single nucleotide variants (SNVs) and insertions/deletions (indels). The authors establish a 100% capture rate (confidence interval 98.5%-100% for SNVs, 99.1%-100% for indels) for false positive calls while maintaining relatively low rates of calls that were incorrectly labeled as false positive calls (13.2% for SNVs, 15.4% for indels). 10 Despite the success of the Lincoln et al approach, there are some drawbacks to a broad application of their method.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Reducing Sanger Confirmation Testing through False Positive Prediction Algorithms

Holt

Wilk

Sundlof

et al. 2020

Preprint

View full text Add to dashboard Cite

Purpose : Clinical genome sequencing (cGS) followed by orthogonal confirmatory testing is standard practice. While orthogonal testing significantly improves specificity it also results in increased turn-around-time and cost of testing. The purpose of this study is to evaluate machine learning models trained to identify false positive variants in cGS data to reduce the need for orthogonal testing.Methods : We sequenced five reference human genome samples characterized by the Genome in a Bottle Consortium (GIAB) and compared the results to an established set of variants for each genome referred to as a 'truth-set'. We then trained machine learning models to identify variants that were labeled as false positives.Results : After training, the models identified 99.5% of the false positive heterozygous single nucleotide variants (SNVs) and heterozygous insertions/deletions variants (indels) while reducing confirmatory testing of true positive SNVs to 1.67% and indels to 20.29%. Employing the algorithm in clinical practice reduced orthogonal testing using dideoxynucleotide (Sanger) sequencing by 78.22%. Conclusion :Our results indicate that a low false positive call rate can be maintained while significantly reducing the need for confirmatory testing. The framework that generated our models and results is publicly available at https://github.com/HudsonAlpha/STEVE .

show abstract

“…As a novel aside, the algorithm was compared to the accuracy of medically naive students trained on the same images who were able to achieve similar results, demonstrating the power of pattern recognition in radiodiagnosis, whether artificial or human. Along with a more extensively trained algorithm for hip fracture detection that was recently published by another Australian radiologist 3 as well as a number of other articles arising from medical imaging departments in Australia and New Zealand, 4,5 we can be confident that there are many technologically adept members of our profession that are capable of embracing this new technology and making it our own.…”

mentioning

confidence: 99%

Machine learning in radiology

Brotchie

2019

J Med Imag Rad Onc

View full text Add to dashboard Cite

Peering Into the Black Box of Artificial Intelligence: Evaluation Metrics of Machine Learning Methods

Cited by 267 publications

References 20 publications

Predicting sporadic Alzheimer's disease progression via inherited Alzheimer's disease‐informed machine‐learning

Predicting sporadic Alzheimer's disease progression via inherited Alzheimer's disease‐informed machine‐learning

Reducing Sanger Confirmation Testing through False Positive Prediction Algorithms

Machine learning in radiology

Contact Info

Product

Resources

About