Validation of a Deep Learning Chest X-ray Interpretation Model: Integrating Large-Scale AI and Large Language Models for Comparative Analysis with ChatGPT

Self Cite

Our study aimed to assess the accuracy and limitations of ChatGPT in the domain of magnetic resonance imaging (MRI), focusing on its performance in answering simple knowledge questions and specialized multiple-choice questions related to MRI. A two-step approach was used. In the first step, 50 simple MRI-related questions were asked, and independent researchers categorized ChatGPT’s answers as correct, partially correct, or incorrect. In the second step, 75 multiple-choice questions covering various MRI topics were posed, and the answers were categorized in the same way. Interobserver agreement was assessed with Cohen’s kappa coefficient. ChatGPT demonstrated high accuracy on straightforward MRI questions, with over 85% of answers classified as correct. However, its performance on the multiple-choice questions varied significantly by topic, with accuracy rates ranging from 40% to 66.7%. This indicates a notable gap in its ability to handle more complex, specialized questions that require deeper understanding and context. In conclusion, this study critically evaluates the accuracy of ChatGPT in addressing MRI-related questions, highlighting its potential and limitations in healthcare, particularly in radiology. Our findings demonstrate that ChatGPT, while proficient in responding to straightforward MRI-related questions, is inconsistent in accurately answering complex multiple-choice questions that require deeper, specialized knowledge of MRI. This discrepancy underscores the nuanced role AI can play in medical education and healthcare decision-making and calls for a balanced approach to its application.
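The abstract names Cohen’s kappa as its interobserver agreement statistic but does not show the calculation. As a minimal sketch, the Python function below applies the standard definition κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement between two raters and p_e is the agreement expected by chance from each rater’s label frequencies. The function name and the example gradings are hypothetical illustrations, not the study’s code or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal labels to the same items.

    Hypothetical helper for illustration; not taken from the cited study.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal frequencies, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical label for every item; kappa is 0/0,
        # so treat it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical gradings of five answers by two reviewers (illustrative only).
    reviewer_1 = ["correct", "correct", "partial", "incorrect", "correct"]
    reviewer_2 = ["correct", "partial", "partial", "incorrect", "correct"]
    print(round(cohens_kappa(reviewer_1, reviewer_2), 4))
```

For these hypothetical gradings the reviewers agree on 4 of 5 answers (p_o = 0.8), chance agreement is (3·2 + 1·2 + 1·1) / 25 = 0.36, and κ = 0.44 / 0.64 = 0.6875. This is noticeably lower than the raw 80% agreement because the chance correction discounts agreement on the frequent "correct" label.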

Section: Discussion

ChatGPT’s Accuracy on Magnetic Resonance Imaging Basics: Characteristics and Limitations Depending on the Question Type

Lee, Lee, 2024

Self Cite

“…Increasing research focuses on developing and validating AI models that can support medical diagnostics, offering tools with high accuracy and clinical utility. For example, a study by Lee et al [13] showed that the KARA-CXR model, developed using advanced AI techniques and large language models, achieved significantly higher diagnostic accuracy in interpreting chest X-ray images compared to ChatGPT. Similarly, innovative machine learning schemes, such as those developed by Al-Karawi et al [42], have demonstrated high effectiveness in identifying COVID-19 infections based on texture analysis of chest X-ray images.…”

Section: Lack of Attempt to Assess the Cobb Angle by Microsoft Bing

Artificial Intelligence in Medical Imaging: Analyzing the Performance of ChatGPT and Microsoft Bing in Scoliosis Detection and Cobb Angle Assessment

Fabijan, Zawadzka-Fabijan, Fabijan et al., 2024

Open-source artificial intelligence models (OSAIM) are freely used in various industries, including information technology and medicine. Their clinical potential, especially in supporting diagnosis and therapy, is the subject of increasingly intensive research. Given the growing interest in artificial intelligence (AI) for diagnostic purposes, we conducted a study evaluating the capabilities of AI models, including ChatGPT and Microsoft Bing, in diagnosing single-curve scoliosis from posturographic radiological images. Two independent neurosurgeons assessed the degree of spinal deformation and selected 23 cases of severe single-curve scoliosis. Each posturographic image was submitted separately to each platform together with a set of formulated questions, starting with ‘What do you see in the image?’ and ending with a request to determine the Cobb angle. In analyzing the responses, we focused on how these AI models identify and interpret spinal deformations and how accurately they recognize the direction and type of scoliosis as well as vertebral rotation. The intraclass correlation coefficient (ICC) with a ‘two-way’ model was used to assess the consistency of Cobb angle measurements, and its confidence intervals were determined using the F test. Differences in Cobb angle measurements between human assessments and ChatGPT were analyzed using metrics such as RMSEA, MSE, MPE, MAE, RMSLE, and MAPE, allowing a comprehensive assessment of model performance from several statistical perspectives. ChatGPT achieved 100% effectiveness in detecting scoliosis in the X-ray images, whereas Bing did not detect any scoliosis. However, ChatGPT had limited effectiveness (43.5%) in assessing Cobb angles, showing significant inaccuracy and discrepancy compared with human assessments.
This model also had limited accuracy in determining the direction of spinal curvature, classifying the type of scoliosis, and detecting vertebral rotation. Overall, although ChatGPT demonstrated potential in detecting scoliosis, its ability to assess Cobb angles and other parameters was limited and inconsistent with expert assessments. These results underscore the need for comprehensive improvement of AI algorithms, including broader training on diverse X-ray images and advanced image-processing techniques, before such models can be considered auxiliary tools for specialists diagnosing scoliosis.
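The abstract lists several error metrics for comparing ChatGPT’s Cobb angle estimates with the neurosurgeons’ measurements but gives no formulas. The Python sketch below computes common textbook versions of MSE, RMSE, MAE, MPE, MAPE, and RMSLE for paired angle lists. It is an illustration under stated assumptions, not the authors’ analysis: the function name is hypothetical, errors are taken as model minus human, percentage errors are relative to the human measurement, and it computes the ordinary RMSE because the abstract does not define its “RMSEA” term. The ICC and its F-test confidence intervals are not reproduced here.

```python
import math


def agreement_errors(reference, predicted):
    """Error metrics between reference (human) and predicted (model) Cobb angles in degrees.

    Hypothetical helper using common textbook definitions; the cited study's
    exact formulas are not given. MPE and MAPE divide by the reference angle,
    so reference values must be non-zero; RMSLE requires all angles > -1.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need paired, non-empty measurement lists")
    n = len(reference)
    # Signed error per image, assumed here as model minus human.
    errors = [p - r for r, p in zip(reference, predicted)]
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        # Percentage errors relative to the human (reference) measurement.
        "MPE": 100 * sum(e / r for e, r in zip(errors, reference)) / n,
        "MAPE": 100 * sum(abs(e / r) for e, r in zip(errors, reference)) / n,
        # RMSLE compares log(1 + angle), which down-weights errors on large angles.
        "RMSLE": math.sqrt(
            sum((math.log1p(p) - math.log1p(r)) ** 2 for r, p in zip(reference, predicted)) / n
        ),
    }


if __name__ == "__main__":
    # Hypothetical angles for two images (illustrative only, not study data).
    human = [40.0, 50.0]
    model = [30.0, 60.0]
    for name, value in agreement_errors(human, model).items():
        print(f"{name}: {value:.3f}")
```

With these hypothetical reference angles of 40° and 50° and model estimates of 30° and 60°, the signed errors are −10° and +10°, so MSE = 100, RMSE = MAE = 10°, MPE = −2.5%, and MAPE = 22.5%. The small MPE next to a much larger MAPE shows why signed and absolute metrics are reported together: opposite-signed errors cancel in the former but not the latter.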

“…Lee et al [4] evaluate the diagnostic accuracy of two AI techniques, namely KARA-CXR and ChatGPT, in chest X-ray reading. Using 2000 chest X-ray images, their study assessed accuracy, false findings, location inaccuracies, count inaccuracies, and hallucinations.…”

Section: Overview of the Published Articles

Advancements in Artificial Intelligence for Medical Computer-Aided Diagnosis

Al-antari, 2024

Rapid advancements in artificial intelligence (AI) and machine learning (ML) are transforming the field of diagnostics, enabling unprecedented accuracy and efficiency in disease detection, classification, and treatment planning. This Special Issue, entitled “Artificial Intelligence Advances for Medical Computer-Aided Diagnosis”, presents a curated collection of cutting-edge research that explores the integration of AI and ML technologies into various diagnostic modalities. The contributions presented here highlight innovative algorithms, models, and applications that pave the way for improved diagnostic capabilities across a range of medical fields, including radiology, pathology, genomics, and personalized medicine. By showcasing both theoretical advancements and practical implementations, this Special Issue aims to provide a comprehensive overview of current trends and future directions in AI-driven diagnostics, fostering further research and collaboration in this dynamic and impactful area of healthcare. We published a total of 12 articles in this Special Issue, all collected between March 2023 and December 2023: 1 editorial, 9 regular research articles, 1 review article, and 1 article categorized as “other”.