Background: Ultrasound for developmental dysplasia of the hip (DDH) is challenging for nonexperts to perform and interpret. Recording "sweep" images allows more complete hip assessment and is suitable for automation by artificial intelligence (AI), but its reliability has not been established. We assessed agreement between readers of varying experience and a commercial AI algorithm for DDH detection from infant hip ultrasound sweeps. Methods: We selected 240 hips (120 single 2-dimensional images, 120 sweeps) spanning the full range of image quality, from poor to excellent, and of DDH severity, from normal to severe dysplasia. For 12 readers (radiologists, sonographers, clinicians and researchers; 3 were DDH subspecialists) and an FDA-cleared ultrasound AI software package (Medo Hip), we calculated interobserver reliability for alpha angle measurements by intraclass correlation coefficient, ICC(2,1), and for DDH classification by Randolph's kappa. Results: Alpha angle reliability was high for AI versus subspecialists (ICC = 0.87 for sweeps, 0.90 for single images). For DDH diagnosis from sweeps, agreement was high between subspecialists (kappa = 0.72) and moderate for nonsubspecialists (0.54) and AI (0.47). Agreement was higher for single images (kappa = 0.80, 0.66, and 0.49 for subspecialists, nonsubspecialists, and AI, respectively). AI reliability deteriorated more than that of human readers for the poorest-quality images. The agreement of radiologists and clinicians with the accepted standard, while still high, was significantly poorer for sweeps than for 2D images (P < 0.05). Conclusions: In a challenging exercise representing the wide spectrum of image quality and reader experience seen in real-world hip ultrasound, agreement on DDH diagnosis from easily obtained sweeps was only slightly lower than from single images, likely because of the additional step of selecting the best image. AI performed similarly to a nonsubspecialist human reader but was more affected by low-quality images.
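The two reliability statistics named in the DDH abstract can be sketched in a few lines. This is a minimal illustration, not the study's analysis code. It assumes complete data (every reader rates every hip) and uses the standard Shrout and Fleiss form of ICC(2,1) (two-way random effects, absolute agreement, single rater), ICC(2,1) = (MSR - MSE) / (MSR + (k - 1)MSE + k(MSC - MSE)/n), together with Randolph's free-marginal multirater kappa, kappa = (P_o - 1/q) / (1 - 1/q), where q is the number of categories. The function names `icc_2_1` and `randolph_kappa`, the category labels, and the toy inputs are hypothetical.

```python
from collections import Counter
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an n x k table: one row per hip, one column per reader
    (e.g. alpha angles in degrees). Assumes no missing values.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def randolph_kappa(labels: Sequence[Sequence[str]], categories: Sequence[str]) -> float:
    """Randolph's free-marginal multirater kappa.

    `labels[i]` lists the category each rater assigned to subject i; every
    subject must be rated by the same number of raters (>= 2).
    """
    q = len(categories)
    r = len(labels[0])
    if r < 2 or any(len(row) != r for row in labels):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    # Observed agreement: share of ordered rater pairs that agree, pooled over subjects.
    agreeing_pairs = sum(
        c * (c - 1) for row in labels for c in Counter(row).values()
    )
    p_obs = agreeing_pairs / (len(labels) * r * (r - 1))
    p_chance = 1 / q  # free-marginal chance agreement is fixed at 1/q
    return (p_obs - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Made-up 6 subjects x 4 raters table, for illustration only.
    toy_table = [
        [9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
        [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7],
    ]
    print(round(icc_2_1(toy_table), 2))  # 0.29

    # Hypothetical two-category labels from 3 raters on 2 hips.
    toy_labels = [["normal", "normal", "dysplastic"],
                  ["dysplastic", "dysplastic", "dysplastic"]]
    print(round(randolph_kappa(toy_labels, ["normal", "dysplastic"]), 2))  # 0.33
```

Fixing chance agreement at 1/q, instead of estimating it from the raters' marginal totals as Fleiss' kappa does, is the defining feature of the free-marginal statistic. Pairwise figures such as "AI versus subspecialists" would presumably be computed on the relevant subset of reader columns; the abstract does not state how.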
Wrist trauma is common in children and generally requires radiography to exclude fractures, subjecting children to radiation and long wait times in the emergency department. Ultrasound (US) has the potential to be a safer, faster diagnostic tool. This study aimed to determine how reliably US could detect distal radius fractures in children, to compare the accuracy of two-dimensional US (2DUS) with that of three-dimensional US (3DUS), and to assess the utility of artificial intelligence (AI) for image interpretation. A total of 127 children were scanned with 2DUS and 3DUS of the affected wrist. The US scans were then read by 7 blinded human readers and an AI model. With radiographs as the gold standard, expert human readers obtained mean sensitivities of 0.97 and 0.98 for 2DUS and 3DUS, respectively. The AI model's sensitivity was 0.91 for 2DUS and 1.00 for 3DUS. These data suggest that 2DUS is comparable to 3DUS and that AI diagnosis is comparable to that of human experts.
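Sensitivity in the wrist study is the fraction of radiograph-confirmed fractures that a reader also called positive on ultrasound, TP / (TP + FN). The sketch below assumes one binary call per wrist and takes "mean sensitivity" to be the unweighted average of per-reader sensitivities; the abstract does not define it, so that averaging rule is an assumption. The helper names `sensitivity` and `mean_sensitivity` and all data are hypothetical.

```python
from statistics import mean
from typing import Sequence


def sensitivity(calls: Sequence[bool], reference: Sequence[bool]) -> float:
    """TP / (TP + FN): share of reference-positive wrists (fracture on the
    radiograph) that the reader also called positive on ultrasound."""
    if len(calls) != len(reference):
        raise ValueError("calls and reference must have the same length")
    tp = sum(1 for c, r in zip(calls, reference) if c and r)
    fn = sum(1 for c, r in zip(calls, reference) if r and not c)
    if tp + fn == 0:
        raise ValueError("no reference-positive cases; sensitivity is undefined")
    return tp / (tp + fn)


def mean_sensitivity(readers: Sequence[Sequence[bool]], reference: Sequence[bool]) -> float:
    """Assumed definition: unweighted mean of each reader's sensitivity."""
    return mean(sensitivity(calls, reference) for calls in readers)


if __name__ == "__main__":
    radiograph = [True, True, True, True, False, False]  # made-up reference standard
    reader_a = [True, True, True, False, False, True]    # misses one fracture
    reader_b = [True, True, True, True, False, False]    # finds all four
    print(sensitivity(reader_a, radiograph))  # 0.75
    print(mean_sensitivity([reader_a, reader_b], radiograph))  # 0.875
```

False positives (reader_a's last call) do not affect sensitivity; they would show up in specificity, which the abstract does not report.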