Artificial intelligence (AI) and machine learning (ML) are becoming critical to the development and deployment of personalized medicine and targeted clinical trials. Recent advances in ML have enabled the integration of a wider range of data, including both medical records and imaging (radiomics). However, developing prognostic models is complex. No modelling strategy is universally superior, and validating a model requires large and diverse datasets to show that a prognostic model developed on one dataset (regardless of method) generalizes to other datasets, both internal and external. Using a retrospective dataset of 2,552 patients from a single institution and a strict evaluation framework that included external validation on three external patient cohorts (873 patients), we crowdsourced the development of ML models to predict overall survival in head and neck cancer (HNC) from electronic medical records (EMR) and pre-treatment radiological images. To assess the relative contribution of radiomics to predicting HNC prognosis, we compared 12 models using imaging and/or EMR data. The most accurate model used multitask learning on clinical data and tumour volume. It achieved high prognostic accuracy for 2-year and lifetime survival prediction and outperformed models relying on clinical data only, engineered radiomics, or complex deep neural network architectures. However, when we applied the best-performing models from this large training dataset to other institutions, we observed significant reductions in performance on those datasets. This highlights the importance of detailed population-based reporting for AI/ML model utility and the need for stronger validation frameworks.
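The abstract does not say how the 2-year endpoint was derived from censored follow-up data. The sketch below shows one common convention, not necessarily the one used in this study. It assumes follow-up time in years and an event indicator (True = death observed). Patients censored before the horizon are left unlabelled because their status at 2 years is unknown. The function names `two_year_label` and `binarize` are hypothetical.

```python
from typing import List, Optional


def two_year_label(time_years: float, event: bool, horizon: float = 2.0) -> Optional[int]:
    """Binary survival label at a fixed horizon (hypothetical helper).

    Returns 1 if death was observed on or before the horizon, 0 if the patient
    is known to be alive at the horizon, and None if censored before it.
    """
    if event and time_years <= horizon:
        return 1  # death observed within the horizon
    if time_years >= horizon:
        return 0  # followed up (alive) at least until the horizon
    return None  # censored before the horizon: status unknown, exclude


def binarize(times: List[float], events: List[bool], horizon: float = 2.0) -> List[Optional[int]]:
    """Apply two_year_label to paired follow-up times and event indicators."""
    return [two_year_label(t, e, horizon) for t, e in zip(times, events)]


if __name__ == "__main__":
    labels = binarize([1.0, 3.0, 1.5, 2.0], [True, True, False, False])
    print(labels)  # [1, 0, None, 0]
```

Dropping the unlabelled patients keeps the binary target unbiased by unknown outcomes. A lifetime-risk model, by contrast, would typically use the full censored time-to-event data directly.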
The field of radiomics is at the forefront of personalized medicine. However, there are concerns about the robustness of its features to variation in medical imaging parameters and about the performance of the predictive models built upon them. Our review therefore aims to identify the image perturbation factors (IPFs) that most influence the robustness of radiomic features in biomedical research. We also provide insights into the validity of, and discrepancies between, the methodologies used to investigate the robustness of radiomic features. We selected 527 papers based on the primary criterion that they examined imaging parameters affecting the reproducibility of radiomic features extracted from computed tomography (CT) images. We compared the reported performance of these parameters, along with the IPFs, across the eligible studies. We then divided the studies into three groups based on the type of IPF: (i) scanner parameters, (ii) acquisition parameters and (iii) reconstruction parameters. Our review found that the reconstruction algorithm was the most reproducible factor, and that shape and intensity histogram (IH) features were the most robust radiomic features against variation in imaging parameters. The review also identified substantial inconsistencies in the methodology and reporting style of the reviewed studies. These included the type of study performed, the metrics used for robustness, the feature extraction techniques, the imaging factors considered, and the inclusion of outcomes. Finally, we hope that the IPFs and methodological inconsistencies identified here will help the scientific community design research that is more reproducible and avoids the pitfalls of previous analyses.
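The review notes that studies disagreed on which robustness metric to use. As one illustration, not necessarily the metric used by any reviewed study, Lin's concordance correlation coefficient (CCC) compares a feature's values extracted from the same patients under two imaging settings, for example two reconstruction kernels. Unlike Pearson correlation, it also penalizes systematic shifts in the values. This is a minimal sketch, and the function name `lins_ccc` is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def lins_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient between paired measurements.

    x[i] and y[i] are the same radiomic feature for patient i under two
    imaging settings. Uses population (1/n) moments, as in Lin's definition.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    denom = sxx + syy + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for identical constant inputs")
    return 2 * sxy / denom


if __name__ == "__main__":
    # Perfectly correlated but shifted by +1: Pearson r would be 1.0.
    print(round(lins_ccc([1, 2, 3, 4], [2, 3, 4, 5]), 3))  # 0.714
```

A robustness study would compute this per feature and flag features below a chosen agreement threshold. The threshold itself is a study-specific choice.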
Background and purpose: Computed tomography (CT) is one of the most common medical imaging modalities in radiation oncology and in radiomics research, the computational voxel-level analysis of medical images. Radiomics is vulnerable to dental artifacts (DAs) caused by metal implants or fillings, which can hamper reproducibility on new datasets. In this study we seek to better understand the robustness of quantitative radiomic features to DAs. Furthermore, we propose a novel method for detecting DAs in order to safeguard radiomic studies and improve reproducibility. Materials and methods: We analyzed the correlations between radiomic features and the location of DAs in a new dataset containing 3D CT scans from 3211 patients. We then combined conventional image processing techniques with a pre-trained convolutional neural network to create a three-class patient-level DA classifier and a slice-level DA locator. Finally, we demonstrated the method's utility in reducing the correlations between DA location and certain radiomic features. Results: When strong DAs were present, the proximity of the tumour to the mouth was highly correlated with 36 radiomic features. Our method predicted DA magnitude with a Matthews correlation coefficient of 0.73 and located DAs with the same level of agreement as human labellers. Conclusions: Removing radiomic features or CT slices containing DAs could reduce the unwanted correlations between DA location and radiomic features. Automated DA detection can be used to improve the reproducibility of radiomic studies, an important step towards creating effective radiomic models for use in clinical radiation oncology.
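The three-class DA classifier is scored with the Matthews correlation coefficient (MCC). For more than two classes, MCC is usually computed with the K-class generalization over the full confusion matrix, shown below as a minimal sketch. The class coding (0 = no DA, 1 = weak DA, 2 = strong DA) and the function name `multiclass_mcc` are assumptions for illustration, not taken from the study.

```python
from math import sqrt
from typing import Sequence


def multiclass_mcc(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int = 3) -> float:
    """K-class Matthews correlation coefficient from integer labels 0..n_classes-1."""
    # Confusion matrix: rows = true class, columns = predicted class.
    conf = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        conf[t][p] += 1

    s = sum(sum(row) for row in conf)                  # total samples
    c = sum(conf[k][k] for k in range(n_classes))      # correctly classified
    t_k = [sum(conf[k]) for k in range(n_classes)]     # true count per class
    p_k = [sum(conf[i][k] for i in range(n_classes)) for k in range(n_classes)]  # predicted count per class

    numerator = c * s - sum(p * t for p, t in zip(p_k, t_k))
    var_pred = s * s - sum(p * p for p in p_k)
    var_true = s * s - sum(t * t for t in t_k)
    if var_pred == 0 or var_true == 0:
        return 0.0  # undefined when all labels (or all predictions) fall in one class
    return numerator / sqrt(var_pred * var_true)


if __name__ == "__main__":
    print(multiclass_mcc([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2]))  # 1.0
    print(round(multiclass_mcc([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]), 3))  # 0.522
```

MCC ranges from -1 to 1 and, unlike plain accuracy, stays informative when the DA classes are imbalanced.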