Objectives
To develop a mixed-effects deep learning model that diagnoses COVID-19 from computed tomography (CT) imaging, externally validate it on a geographically distinct dataset following best practice guidelines, and assess the strengths and weaknesses of deep learning for COVID-19 diagnosis.
Design
Model development and external validation with retrospectively collected data from two countries.
Setting
Data from hospitals in Moscow, Russia, were collected between March 1, 2020, and April 25, 2020. Data from the China Consortium of Chest CT Image Investigation (CC-CCII) were collected between January 25, 2020, and March 27, 2020.
Participants
1,110 patients from Moscow, Russia, and 796 patients from China, each with a CT volume that was either COVID-19 positive or healthy.
Main outcome measures
We developed a deep learning model with a novel mixed-effects layer to model the relationship between slices in CT imaging. The model was trained on a dataset from hospitals in Moscow, Russia, and externally validated on a geographically distinct dataset from a consortium of Chinese hospitals. Discriminative performance was evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). In addition, calibration performance was assessed using calibration curves, and clinical benefit was assessed using decision curve analysis. Finally, the model's decisions were assessed visually using saliency maps.
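As an illustration only, the discrimination and clinical-benefit measures named above can be sketched in a few lines of Python. This is not the study's evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for the example. AUROC is computed here through its pairwise-ranking interpretation, and net benefit uses the standard decision curve analysis formula.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_score: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def discrimination_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV (assumes no denominator is zero)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a positive case outranks a negative one."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Net benefit at one threshold probability, as used in decision curves."""
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Hypothetical toy data: 1 = COVID-19, 0 = healthy.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.1]

tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold=0.5)
metrics = discrimination_metrics(tp, fp, tn, fn)
auc = auroc(y_true, y_score)
nb = net_benefit(tp, fp, len(y_true), threshold=0.5)
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of COVID-19 in the evaluated cohort, so they should be interpreted with the case mix of each dataset in mind. A decision curve is obtained by evaluating `net_benefit` across a range of threshold probabilities and comparing the result with the treat-all and treat-none strategies.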
Results
External validation on the large Chinese dataset showed excellent performance, with an AUROC of 0.956 (95% CI: 0.943, 0.970). Sensitivity, specificity, PPV, and NPV were 0.879 (0.852, 0.906), 0.942 (0.913, 0.972), 0.988 (0.975, 1.00), and 0.732 (0.650, 0.814), respectively.
Conclusions
Deep learning can reduce stress on healthcare systems by automatically screening CT imaging for COVID-19. However, deep learning models must be robustly assessed using various performance measures and externally validated in each setting. In addition, best practice guidelines for developing and reporting predictive models are vital for the safe adoption of such models.