Art or Artifact: Evaluating the Accuracy, Appeal, and Educational Value of AI-Generated Imagery in DALL·E 3 for Illustrating Congenital Heart Diseases

Temsah, Mohamad-Hani; Alhuzaimi, Abdullah N.; Almansour, Mohammed; Aljamaan, Fadi; Alhasan, Khalid; Batarfi, Munirah A.; Altamimi, Ibraheem; Alharbi, Amani; Alsuhaibani, Adel Abdulaziz; Alwakeel, Leena; Alzahrani, Abdulrahman Abdulkhaliq; Alsulaim, Khaled B.; Jamal, Amr; Khayat, Afnan; Alghamdi, Mohammed Hussien; Halwani, Rabih; Khan, Muhammad Khurram; Al-Eyadhy, Ayman; Nazer, Rakan

doi:10.1007/s10916-024-02072-0

J Med Syst

2024

DOI: 10.1007/s10916-024-02072-0

|View full text |Cite

Art or Artifact: Evaluating the Accuracy, Appeal, and Educational Value of AI-Generated Imagery in DALL·E 3 for Illustrating Congenital Heart Diseases

Mohamad-Hani Temsah,

Abdullah N. Alhuzaimi,

Mohammed Almansour

et al.

Abstract: Arti cial Intelligence (AI), particularly AI-Generated Imagery, holds the capability to transform medical and patient education. This research explores the use of AI-generated imagery, from text-to-images, in medical education, focusing on congenital heart diseases (CHD). Utilizing ChatGPT's DALL•E 3, the research aims to assess the accuracy and educational value of AI-created images for 20 common CHDs. The study involved generating a total of 110 images for normal human heart and 20 common CHDs through DALL•E… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

Supporting

Mentioning

Contrasting

Year Published

2024

Publication Types

Select...

Article5

Relationship

Self Cite1

Independent4

Authors

Journals

Cited by 5 publications

References 52 publications

(37 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

Large Language Model‐Based Chatbots in Higher Education

Yigci,

Eryilmaz,

Yetisen

et al. 2024

Advanced Intelligent Systems

View full text Add to dashboard Cite

Large language models (LLMs) are artificial intelligence (AI) platforms capable of analyzing and mimicking natural language processing. Leveraging deep learning, LLM capabilities have been advanced significantly, giving rise to generative chatbots such as Generative Pre‐trained Transformer (GPT). GPT‐1 was initially released by OpenAI in 2018. ChatGPT's release in 2022 marked a global record of speed in technology uptake, attracting more than 100 million users in two months. Consequently, the utility of LLMs in fields including engineering, healthcare, and education has been explored. The potential of LLM‐based chatbots in higher education has sparked significant interest and ignited debates. LLMs can offer personalized learning experiences and advance asynchronized learning, potentially revolutionizing higher education, but can also undermine academic integrity. Although concerns regarding AI‐generated output accuracy, the spread of misinformation, propagation of biases, and other legal and ethical issues have not been fully addressed yet, several strategies have been implemented to mitigate these limitations. Here, the development of LLMs, properties of LLM‐based chatbots, and potential applications of LLM‐based chatbots in higher education are discussed. Current challenges and concerns associated with AI‐based learning platforms are outlined. The potentials of LLM‐based chatbot use in the context of learning experiences in higher education settings are explored.

show abstract

Large Language Model‐Based Chatbots in Higher Education

Yigci,

Eryilmaz,

Yetisen

et al. 2024

Advanced Intelligent Systems

View full text Add to dashboard Cite

show abstract

Reference Hallucination Score for Medical Artificial Intelligence Chatbots: Development and Usability Study

Aljamaan,

Temsah,

Altamimi

et al. 2024

JMIR Med Inform

View full text Add to dashboard Cite

Background Artificial intelligence (AI) chatbots have recently gained use in medical practice by health care practitioners. Interestingly, the output of these AI chatbots was found to have varying degrees of hallucination in content and references. Such hallucinations generate doubts about their output and their implementation. Objective The aim of our study was to propose a reference hallucination score (RHS) to evaluate the authenticity of AI chatbots’ citations. Methods Six AI chatbots were challenged with the same 10 medical prompts, requesting 10 references per prompt. The RHS is composed of 6 bibliographic items and the reference’s relevance to prompts’ keywords. RHS was calculated for each reference, prompt, and type of prompt (basic vs complex). The average RHS was calculated for each AI chatbot and compared across the different types of prompts and AI chatbots. Results Bard failed to generate any references. ChatGPT 3.5 and Bing generated the highest RHS (score=11), while Elicit and SciSpace generated the lowest RHS (score=1), and Perplexity generated a middle RHS (score=7). The highest degree of hallucination was observed for reference relevancy to the prompt keywords (308/500, 61.6%), while the lowest was for reference titles (169/500, 33.8%). ChatGPT and Bing had comparable RHS (β coefficient=–0.069; P=.32), while Perplexity had significantly lower RHS than ChatGPT (β coefficient=–0.345; P<.001). AI chatbots generally had significantly higher RHS when prompted with scenarios or complex format prompts (β coefficient=0.486; P<.001). Conclusions The variation in RHS underscores the necessity for a robust reference evaluation tool to improve the authenticity of AI chatbots. Further, the variations highlight the importance of verifying their output and citations. Elicit and SciSpace had negligible hallucination, while ChatGPT and Bing had critical hallucination levels. The proposed AI chatbots’ RHS could contribute to ongoing efforts to enhance AI’s general reliability in medical research.

show abstract

Integrating ChatGPT in Orthopedic Education for Medical Undergraduates: Randomized Controlled Trial

Gan,

Ouyang,

et al. 2024

J Med Internet Res

View full text Add to dashboard Cite

Background ChatGPT is a natural language processing model developed by OpenAI, which can be iteratively updated and optimized to accommodate the changing and complex requirements of human verbal communication. Objective The study aimed to evaluate ChatGPT’s accuracy in answering orthopedics-related multiple-choice questions (MCQs) and assess its short-term effects as a learning aid through a randomized controlled trial. In addition, long-term effects on student performance in other subjects were measured using final examination results. Methods We first evaluated ChatGPT’s accuracy in answering MCQs pertaining to orthopedics across various question formats. Then, 129 undergraduate medical students participated in a randomized controlled study in which the ChatGPT group used ChatGPT as a learning tool, while the control group was prohibited from using artificial intelligence software to support learning. Following a 2-week intervention, the 2 groups’ understanding of orthopedics was assessed by an orthopedics test, and variations in the 2 groups’ performance in other disciplines were noted through a follow-up at the end of the semester. Results ChatGPT-4.0 answered 1051 orthopedics-related MCQs with a 70.60% (742/1051) accuracy rate, including 71.8% (237/330) accuracy for A1 MCQs, 73.7% (330/448) accuracy for A2 MCQs, 70.2% (92/131) accuracy for A3/4 MCQs, and 58.5% (83/142) accuracy for case analysis MCQs. As of April 7, 2023, a total of 129 individuals participated in the experiment. However, 19 individuals withdrew from the experiment at various phases; thus, as of July 1, 2023, a total of 110 individuals accomplished the trial and completed all follow-up work. After we intervened in the learning style of the students in the short term, the ChatGPT group answered more questions correctly than the control group (ChatGPT group: mean 141.20, SD 26.68; control group: mean 130.80, SD 25.56; P=.04) in the orthopedics test, particularly on A1 (ChatGPT group: mean 46.57, SD 8.52; control group: mean 42.18, SD 9.43; P=.01), A2 (ChatGPT group: mean 60.59, SD 10.58; control group: mean 56.66, SD 9.91; P=.047), and A3/4 MCQs (ChatGPT group: mean 19.57, SD 5.48; control group: mean 16.46, SD 4.58; P=.002). At the end of the semester, we found that the ChatGPT group performed better on final examinations in surgery (ChatGPT group: mean 76.54, SD 9.79; control group: mean 72.54, SD 8.11; P=.02) and obstetrics and gynecology (ChatGPT group: mean 75.98, SD 8.94; control group: mean 72.54, SD 8.66; P=.04) than the control group. Conclusions ChatGPT answers orthopedics-related MCQs accurately, and students using it excel in both short-term and long-term assessments. Our findings strongly support ChatGPT’s integration into medical education, enhancing contemporary instructional methods. Trial Registration Chinese Clinical Trial Registry Chictr2300071774; https://www.chictr.org.cn/hvshowproject.html ?id=225740&v=1.0

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Art or Artifact: Evaluating the Accuracy, Appeal, and Educational Value of AI-Generated Imagery in DALL·E 3 for Illustrating Congenital Heart Diseases

Cited by 5 publications

References 52 publications

Large Language Model‐Based Chatbots in Higher Education

Large Language Model‐Based Chatbots in Higher Education

Reference Hallucination Score for Medical Artificial Intelligence Chatbots: Development and Usability Study

Integrating ChatGPT in Orthopedic Education for Medical Undergraduates: Randomized Controlled Trial

Contact Info

Product

Resources

About