A quality assessment tool for artificial intelligence-centered diagnostic test accuracy studies: QUADAS-AI

Sounderajah, Viknesh; Ashrafian, Hutan; Rose, Sherri; Shah, Nigam H.; Ghassemi, Marzyeh; Golub, Robert M.; Kahn, Charles Η.; Esteva, Andre; Karthikesalingam, Alan; Mateen, Bilal A.; Webster, Dale R.; Miléa, Dan; Ting, Daniel Shu Wei; Treanor, Darren; Cushnan, Dominic; King, Dominic; McPherson, Duncan; Glocker, Ben; Greaves, Felix; Harling, Leanne; Ordish, Johan; Cohen, Jérémie F.; Deeks, Jon; Leeflang, Mariska; Diamond, Matthew C; McInnes, Matthew D. F.; McCradden, Melissa; Abràmoff, Michael D.; Normahani, Pasha; Markar, Sheraz R.; Chang, Stephanie; Liu, Xiaoxuan; Mallett, Susan; Shetty, Shravya; Denniston, Alastair K.O.; Collins, Gary S.; Moher, David; Whiting, Penny; Bossuyt, Patrick M.; Darzi, Ara

doi:10.1038/s41591-021-01517-0

Cited by 106 publications

(78 citation statements)

References 8 publications


“…This justifies the high frequency of unclear and not applicable answers in our review, to the QUADAS-2 tool questions. For example, the index test section gave 50% of not applicable and 7.14% of unclear answers as the QUADAS-2 tool wasn’t designed to evaluate the risk of bias for AI diagnostic accuracy studies [ 50 ].…”

Section: Discussion (mentioning)

confidence: 99%

The Effectiveness of Semi-Automated and Fully Automatic Segmentation for Inferior Alveolar Canal Localization on CBCT Scans: A Systematic Review

Issa, Olszewski, Dyszkiewicz-Konwińska. 2022. IJERPH


This systematic review aims to identify the available semi-automatic and fully automatic algorithms for inferior alveolar canal localization as well as to present their diagnostic accuracy. Articles related to inferior alveolar nerve/canal localization using methods based on artificial intelligence (semi-automated and fully automated) were collected electronically from five different databases (PubMed, Medline, Web of Science, Cochrane, and Scopus). Two independent reviewers screened the titles and abstracts of the collected data, stored in EndnoteX7, against the inclusion criteria. Afterward, the included articles have been critically appraised to assess the quality of the studies using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. Seven studies were included following the deduplication and screening against exclusion criteria of the 990 initially collected articles. In total, 1288 human cone-beam computed tomography (CBCT) scans were investigated for inferior alveolar canal localization using different algorithms and compared to the results obtained from manual tracing executed by experts in the field. The reported values for diagnostic accuracy of the used algorithms were extracted. A wide range of testing measures was implemented in the analyzed studies, while some of the expected indexes were still missing in the results. Future studies should consider the new artificial intelligence guidelines to ensure proper methodology, reporting, results, and validation.


“…Future studies should validate models and follow reporting guidelines such as TRIPOD 17 or the upcoming QUADAS-AI 51 and TRIPOD-AI 52 to bring about clinically useful and deployable models. Further research could look deeper into the areas of images identified by the algorithm as shown on the saliency maps; this could potentially identify new features of COVID-19 which have gone unnoticed.…”

Section: Discussion (mentioning)

confidence: 99%

Development and External Validation of a Mixed-Effects Deep Learning Model to Diagnose COVID-19 from CT Imaging

Bridge, Meng, Zhu, et al. 2022. Preprint


Objectives: To develop and externally geographically validate a mixed-effects deep learning model to diagnose COVID-19 from computed tomography (CT) imaging following best practice guidelines and assess the strengths and weaknesses of deep learning COVID-19 diagnosis. Design: Model development and external validation with retrospectively collected data from two countries. Setting: Hospitals in Moscow, Russia, collected between March 1, 2020, and April 25, 2020. The China Consortium of Chest CT Image Investigation (CC-CCII) collected between January 25, 2020, and March 27, 2020. Participants: 1,110 and 796 patients with either COVID-19 or healthy CT volumes from Moscow, Russia, and China, respectively. Main outcome measures: We developed a deep learning model with a novel mixed-effects layer to model the relationship between slices in CT imaging. The model was trained on a dataset from hospitals in Moscow, Russia, and externally geographically validated on a dataset from a consortium of Chinese hospitals. Model performance was evaluated in discriminative performance using the area under the receiver operating characteristic (AUROC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). In addition, calibration performance was assessed using calibration curves, and clinical benefit was assessed using decision curve analysis. Finally, the model's decisions were assessed visually using saliency maps. Results: External validation on the large Chinese dataset showed excellent performance with an AUROC of 0.956 (95% CI: 0.943, 0.970), and sensitivity, specificity, PPV, and NPV of 0.879 (0.852, 0.906), 0.942 (0.913, 0.972), 0.988 (0.975, 1.00), and 0.732 (0.650, 0.814), respectively. Conclusions: Deep learning can reduce stress on healthcare systems by automatically screening CT imaging for COVID-19. However, deep learning models must be robustly assessed using various performance measures and externally validated in each setting. In addition, best practice guidelines for developing and reporting predictive models are vital for the safe adoption of such models.
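For readers less familiar with the measures named in this abstract, the sketch below shows how sensitivity, specificity, PPV, and NPV follow from a 2x2 confusion matrix. It is a minimal illustration only: the function name `diagnostic_metrics` and the counts in the example are hypothetical and are not taken from the cited study.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard diagnostic accuracy measures from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Assumes every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of diseased cases flagged positive
        "specificity": tn / (tn + fp),  # share of healthy cases flagged negative
        "ppv": tp / (tp + fp),          # share of positive calls that are correct
        "npv": tn / (tn + fn),          # share of negative calls that are correct
    }


# Hypothetical counts chosen for illustration; not data from the study.
example = diagnostic_metrics(tp=90, fp=20, fn=10, tn=80)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

PPV and NPV depend on disease prevalence in the evaluated sample, which is one reason a model can pair a high PPV with a much lower NPV, as in the results reported above.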


“…Studies were ranked into three AI bias categories (low moderate (ML) and high moderate (MH)) by computing the mean score and cumulative score for each study, taken for the AI attributes. The comparative analysis with various AI algorithms was carried out to determine the bias cutoff and to understand the architecture of these studies [ 59 , 63 , 64 ].…”

Section: Ranking Of Selected Studies (mentioning)

confidence: 99%

Bias Investigation in Artificial Intelligence Systems for Early Detection of Parkinson’s Disease: A Narrative Review

et al. 2022


Background and Motivation: Diagnosis of Parkinson’s disease (PD) is often based on medical attention and clinical signs. It is subjective and does not have a good prognosis. Artificial Intelligence (AI) has played a promising role in the diagnosis of PD. However, it introduces bias due to lack of sample size, poor validation, clinical evaluation, and lack of big data configuration. The purpose of this study is to compute the risk of bias (RoB) automatically. Method: The PRISMA search strategy was adopted to select the best 39 AI studies out of 85 PD studies closely associated with early diagnosis of PD. The studies were used to compute 30 AI attributes (based on 6 AI clusters), using AP(ai)Bias 1.0 (AtheroPoint™, Roseville, CA, USA), and the mean aggregate score was computed. The studies were ranked and two cutoffs (Moderate-Low (ML) and High-Moderate (MH)) were determined to segregate the studies into three bins: low-, moderate-, and high-bias. Result: The ML and MH cutoffs were 3.50 and 2.33, respectively, which constituted 7, 13, and 6 for low-, moderate-, and high-bias studies. The best and worst architectures were “deep learning with sketches as outcomes” and “machine learning with Electroencephalography,” respectively. We recommend (i) the usage of power analysis in big data framework, (ii) that it must undergo scientific validation using unseen AI models, and (iii) that it should be taken towards clinical evaluation for reliability and stability tests. Conclusion: The AI is a vital component for the diagnosis of early PD and the recommendations must be followed to lower the RoB.
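The two-cutoff binning described in this abstract can be sketched as follows. This is a hedged illustration, not the AP(ai)Bias implementation: the function name `bias_bin`, the per-attribute scores, and the handling of scores that fall exactly on a cutoff are assumptions. It also assumes that a higher mean score indicates lower bias, which is inferred only from the Moderate-Low cutoff (3.50) being larger than the High-Moderate cutoff (2.33).

```python
from statistics import mean

# Cutoffs as reported in the abstract.
MODERATE_LOW_CUTOFF = 3.50   # boundary between low and moderate bias
HIGH_MODERATE_CUTOFF = 2.33  # boundary between moderate and high bias


def bias_bin(attribute_scores: list[float]) -> str:
    """Assign a study to a bias bin from its mean attribute score.

    Assumption: higher mean score means lower risk of bias, and a score
    equal to a cutoff falls into the less biased bin.
    """
    score = mean(attribute_scores)
    if score >= MODERATE_LOW_CUTOFF:
        return "low"
    if score >= HIGH_MODERATE_CUTOFF:
        return "moderate"
    return "high"


# Hypothetical attribute scores for three studies (illustration only).
studies = {
    "study_a": [4.0, 4.0, 3.5],
    "study_b": [3.0, 3.0, 2.0],
    "study_c": [2.0, 2.0, 2.0],
}
for name, scores in studies.items():
    print(name, bias_bin(scores))
```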
