Background Identifying visually and semantically similar radiological images in a database can facilitate the creation of decision support tools, teaching files, and research cohorts. Existing content-based image retrieval tools are often limited to searching by pixel-wise difference or by vector distance between model predictions. Vision transformers (ViTs) use attention to account for radiological diagnosis and visual appearance simultaneously.

Purpose To develop a ViT-based image retrieval framework and evaluate the algorithm on NIH Chest Radiographs (CXR) and NLST Chest CTs.

Materials and Methods The model was trained on 112,120 CXR and 111,955 CT images. For CXR, binary ViT classifiers were trained on 4 ground truth labels (Cardiomegaly, Opacity, Emphysema, No Finding) and ensembled to produce multilabel classifications for each CXR. For CT, a regression model was trained to minimize L1 loss on the continuous ground truth labels of patient weight. The ViT image embedding layer was treated as a global image descriptor, with the L2 distance between descriptors used as the similarity measure. To evaluate the model qualitatively, five radiologists performed a reader performance study on random query images (25 CT, 25 CXR). For each query, they chose the 5 most similar images from a set of 10 (the 5 closest and the 5 farthest images from the query in model space). Inter-radiologist and radiologist-model agreement statistics were calculated.

Results The CXR model achieved nDCG@5 of 0.73 (p<0.001) and Cardiomegaly mAP@5 of 0.76 (p<0.001), among other results. The CT model achieved nDCG of 16.85 (p<0.001). Model predictions agreed with radiologist consensus on 86% of CXR samples and 79.2% of CT samples. An inter-radiologist Fleiss kappa of 0.51 and a radiologist-consensus-to-model Cohen's kappa of 0.65 were observed. A t-SNE plot of the CT model latent space was generated to confirm that similar images cluster together.
Conclusion Our ViT architecture retrieved visually and semantically similar radiological images.
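The retrieval step described in Materials and Methods (treating an embedding as a global image descriptor and ranking database images by L2 distance) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, toy embeddings, and NumPy usage are assumptions; in the paper the descriptors come from the ViT image embedding layer.

```python
import numpy as np

def retrieve_top_k(query_emb, db_embs, k=5):
    """Rank database descriptors by L2 distance to the query descriptor
    and return the indices of the k nearest images.

    Illustrative only: in the paper, each row of db_embs would be the
    ViT embedding-layer output for one database image.
    """
    # Euclidean (L2) distance from the query to every database embedding
    dists = np.linalg.norm(db_embs - query_emb, axis=1)
    # Indices of the k smallest distances, i.e. the k most similar images
    return np.argsort(dists)[:k]

# Toy 2-D descriptors: image 0 is identical to the query, so it ranks first.
db = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
query = np.array([0.0, 0.0])
print(retrieve_top_k(query, db, k=2))  # → [0 1]
```

In practice the same ranking could be served at scale with an approximate nearest-neighbor index; brute-force L2 search as above is exact and sufficient for moderate database sizes.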