Background Artificial-intelligence algorithms derive rules and patterns from large amounts of data to calculate the probabilities of various outcomes using new sets of similar data. In medicine, artificial intelligence (AI) has been applied primarily to image-recognition diagnostic tasks and to evaluating the probabilities of particular outcomes after treatment. However, the performance and limitations of AI in the automated detection and classification of fractures have not been examined comprehensively. Questions/purposes In this systematic review, we asked: (1) What is the proportion of correctly detected or classified fractures and the area under the receiver operating characteristic curve (AUC) of AI fracture detection and classification models? (2) What is the performance of AI in this setting compared with the performance of human examiners? Methods The PubMed, Embase, and Cochrane databases were systematically searched from the start of each respective database until September 6, 2018, using terms related to “fracture”, “artificial intelligence”, and “detection, prediction, or evaluation.” Of 1221 identified studies, we retained 10: eight involved fracture detection (ankle, hand, hip, spine, wrist, and ulna), one addressed fracture classification (diaphyseal femur), and one addressed both fracture detection and classification (proximal humerus). We registered the review before data collection (PROSPERO: CRD42018110167) and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We reported the range of accuracy and AUC for the performance of the predicted fracture detection and/or classification task. An AUC of 1.0 would indicate perfect prediction, whereas 0.5 would indicate a prediction no better than a flip of a coin. We conducted quality assessment using a seven-item checklist based on a modified methodologic index for nonrandomized studies (MINORS) instrument. Results For fracture detection, the AUC in five studies reflected near-perfect prediction (range, 0.95-1.0), and the accuracy in seven studies ranged from 83% to 98%. For fracture classification, the AUC was 0.94 in one study, and the accuracy in two studies ranged from 77% to 90%. In two studies, AI outperformed human examiners for detecting and classifying hip and proximal humerus fractures, and one study showed equivalent performance for detecting wrist, hand, and ankle fractures. Conclusions Preliminary experience with fracture detection and classification using AI shows promising performance. AI may enhance the processing and communication of probabilistic tasks in medicine, including orthopaedic surgery. At present, inadequate reference standard assignments for training and testing AI are the biggest hurdle to integration into the clinical workflow. The next step will be to apply AI to more challenging diagnostic and therapeutic scenarios in which certainty is lacking. Future studies should also address legal regulation and better determine the feasibility of implementation in clinical practice. Level of Evidence Level II, diagnostic study.
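For readers unfamiliar with these metrics, the following is a minimal, hypothetical sketch (the labels and probabilities are invented, not data from the reviewed studies) of how accuracy and AUC are typically computed for a binary fracture-detection model using scikit-learn:

```python
# Hypothetical example: computing accuracy and AUC for a binary
# fracture-detection model. Labels and scores are illustrative only,
# not data from the reviewed studies.
from sklearn.metrics import roc_auc_score, accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = fracture, 0 = no fracture (reference standard)
y_prob = [0.92, 0.20, 0.75, 0.64, 0.38, 0.05, 0.88, 0.47]  # model probabilities

auc = roc_auc_score(y_true, y_prob)                    # 1.0 = perfect, 0.5 = coin flip
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]        # threshold probabilities at 0.5
acc = accuracy_score(y_true, y_pred)                   # proportion of correct calls

print(f"AUC = {auc:.2f}, accuracy = {acc:.0%}")
```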
Background Preliminary experience suggests that deep learning algorithms are nearly as good as humans at detecting common, displaced, and relatively obvious fractures (such as distal radius or hip fractures). However, it is not known whether this also holds for subtle or relatively nondisplaced fractures that are often difficult to see on radiographs, such as scaphoid fractures. Questions/purposes (1) What is the diagnostic accuracy, sensitivity, and specificity of a deep learning algorithm in detecting radiographically visible and occult scaphoid fractures using four radiographic imaging views? (2) Does adding patient demographic information (age and sex) improve the diagnostic performance of the deep learning algorithm? (3) Do orthopaedic surgeons have better diagnostic accuracy, sensitivity, and specificity than the deep learning algorithm? (4) What is the interobserver reliability among five human observers, and between human consensus and the deep learning algorithm? Methods We retrospectively searched the picture archiving and communication system (PACS) to identify 300 patients with a radiographic scaphoid series, until we had 150 fractures (127 visible on radiographs and 23 visible only on MRI) and 150 nonfractures, with a corresponding CT or MRI as the reference standard for fracture diagnosis. At our institution, MRI is usually ordered for patients with scaphoid tenderness and normal radiographs, and CT for patients with a radiographically visible scaphoid fracture. We used a deep learning algorithm (a convolutional neural network [CNN]) for automated fracture detection on radiographs. Deep learning, an advanced subset of artificial intelligence, combines layers of artificial neurons. CNNs—deep learning algorithms loosely resembling the interconnected neurons of the human brain—are most commonly used for image analysis. The area under the receiver operating characteristic curve (AUC) was used to evaluate the algorithm’s diagnostic performance. An AUC of 1.0 would indicate perfect prediction, whereas 0.5 would indicate that a prediction is no better than a flip of a coin. The probability of a scaphoid fracture generated by the CNN, sex, and age were included in a multivariable logistic regression to determine whether this would improve the algorithm’s diagnostic performance. Diagnostic performance characteristics (accuracy, sensitivity, and specificity) and reliability (kappa statistic) were calculated for the CNN and for the five orthopaedic surgeon observers in our study. Results The algorithm had an AUC of 0.77 (95% CI 0.66 to 0.85), 72% accuracy (95% CI 60% to 84%), 84% sensitivity (95% CI 74% to 94%), and 60% specificity (95% CI 46% to 74%). Adding age and sex did not improve diagnostic performance (AUC 0.81 [95% CI 0.73 to 0.89]). Orthopaedic surgeons had better specificity (93% [95% CI 93% to 99%]; p < 0.01), whereas accuracy (84% [95% CI 81% to 88%]) and sensitivity (76% [95% CI 70% to 82%]; p = 0.29) did not differ between the algorithm and the human observers. Although the CNN was less specific in diagnosing relatively obvious fractures, it detected five of six occult scaphoid fractures that were missed by all human observers. The interobserver reliability among the five surgeons was substantial (Fleiss’ kappa = 0.74 [95% CI 0.66 to 0.83]), but the reliability between the algorithm and the human observers was only fair (Cohen’s kappa = 0.34 [95% CI 0.17 to 0.50]).
Conclusions Initial experience with our deep learning algorithm suggests that it has trouble identifying scaphoid fractures that are obvious to human observers: the CNN made 13 false positive suggestions that all five surgeons correctly identified as not fractured. Research with larger datasets—preferably also including information from the physical examination—or further algorithm refinement is merited. Level of Evidence Level III, diagnostic study.
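As a rough illustration of the multivariable approach described in the Methods, here is a hypothetical sketch (invented data, not the study's code) of combining a CNN-generated fracture probability with age and sex in a logistic regression, and of measuring algorithm-observer agreement with Cohen's kappa:

```python
# Hypothetical sketch: augmenting a CNN fracture probability with age and sex
# in a logistic regression, then measuring agreement with Cohen's kappa.
# The data below are invented for illustration, not from the study.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, cohen_kappa_score

cnn_prob = np.array([0.91, 0.15, 0.70, 0.35, 0.82, 0.10, 0.55, 0.25])
age      = np.array([34, 52, 28, 61, 45, 39, 23, 57])
sex      = np.array([1, 0, 1, 0, 1, 1, 0, 0])   # 1 = male, 0 = female
y_true   = np.array([1, 0, 1, 0, 1, 0, 1, 0])   # reference standard (CT/MRI)

# Multivariable model: CNN probability plus demographics
X = np.column_stack([cnn_prob, age, sex])
model = LogisticRegression(max_iter=1000).fit(X, y_true)
combined_prob = model.predict_proba(X)[:, 1]
print("AUC with demographics:", roc_auc_score(y_true, combined_prob))

# Agreement between the algorithm and one human observer (both as binary calls)
algorithm_call = (cnn_prob >= 0.5).astype(int)
observer_call  = np.array([1, 0, 1, 1, 1, 0, 0, 0])
print("Cohen's kappa:", cohen_kappa_score(algorithm_call, observer_call))
```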
Our findings emphasize the importance of obtaining negative margins in patients with a good life expectancy, as a lower recurrence rate can be attained at no significant additional risk of reoperation, with a potential impact on survival. J. Surg. Oncol. 2016;114:237-245. © 2016 Wiley Periodicals, Inc.
Objectives: To develop an accurate machine learning (ML) predictive model incorporating patient, fracture, and trauma characteristics to identify individual patients at risk of an (occult) PMF. Methods: Databases of 2 studies including patients with TSFs from 2 Level 1 trauma centers were combined for analysis. Using ten-fold cross-validation, 4 supervised ML algorithms were trained to recognize patterns associated with PMFs: (1) Bayes point machine; (2) support vector machine; (3) neural network; and (4) boosted decision tree. The performance of each ML algorithm was evaluated and compared based on (1) the C-statistic; (2) calibration slope and intercept; and (3) the Brier score. The best-performing ML algorithm was incorporated into an open-access online prediction tool. Results: The total dataset included 263 patients, of whom 28% had a PMF. Training of the Bayes point machine resulted in the best-performing prediction model, reflected by a good C-statistic, calibration slope, calibration intercept, and Brier score of 0.89, 1.02, −0.06, and 0.106, respectively. This prediction model was deployed as an open-access online prediction tool. Conclusion: An ML-based prediction model accurately predicted the probability of an (occult) PMF in patients with a TSF based on patient- and fracture-specific characteristics. This prediction model can guide surgeons in their diagnostic workup and preoperative planning. Further research is required to externally validate the model before implementation in clinical practice. Level of Evidence: Prognostic Level III. See Instructions for Authors for a complete description of levels of evidence.
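A hypothetical sketch of this evaluation workflow is shown below, using a scikit-learn gradient-boosting classifier as a stand-in for a boosted decision tree and invented data; the calibration slope and intercept are approximated with a logistic regression of the outcome on the log-odds of the cross-validated predictions, which is one common simplification rather than the study's exact method:

```python
# Hypothetical sketch of the evaluation workflow described above:
# ten-fold cross-validated predictions scored by C-statistic (AUC),
# Brier score, and an approximate calibration slope/intercept.
# Data, features, and the gradient-boosting stand-in are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss
from sklearn.model_selection import cross_val_predict

# Synthetic cohort with roughly the prevalence reported above (~28% positive)
X, y = make_classification(n_samples=263, n_features=8,
                           weights=[0.72, 0.28], random_state=0)

model = GradientBoostingClassifier(random_state=0)  # stand-in for a boosted decision tree
prob = cross_val_predict(model, X, y, cv=10, method="predict_proba")[:, 1]

print("C-statistic:", roc_auc_score(y, prob))
print("Brier score:", brier_score_loss(y, prob))

# Approximate calibration slope/intercept: logistic regression of the outcome
# on the log-odds of the cross-validated predictions (near-unpenalized fit).
eps = 1e-6
logit = np.log(np.clip(prob, eps, 1 - eps) / np.clip(1 - prob, eps, 1 - eps))
cal = LogisticRegression(C=1e6).fit(logit.reshape(-1, 1), y)
print("Calibration slope:", cal.coef_[0][0], "intercept:", cal.intercept_[0])
```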
Background: There is growing interest in measuring and improving patient experience. Machine learning–based natural language processing techniques may help identify instructive themes in online comments written by patients about their healthcare providers. Analyzing reviews of individual surgeons and orthopaedic offices separately, we assessed the themes discussed per rating category, the association between rating and review length, the number of people posting more than one review for a surgeon or office, the mean number of reviews per rating category, and differences in review tone. Methods: On Yelp.com, we collected 11,614 free-text reviews—together with a one- to five-star rating—of orthopaedic surgeons. Using natural language processing, we identified the most frequently occurring word combinations in each rating category. Themes were derived by categorizing word combinations. Dominant tones (emotional and language styles) were assessed with the IBM Watson Tone Analyzer. We calculated chi-square tests for linear trend and Spearman's rank correlation coefficients to assess differences among rating categories. Results: For individual surgeons and orthopaedic offices, themes such as logistics, care and compassion, trust, recommendation, and customer service varied among rating categories. More positive reviews were shorter for both individual surgeons and orthopaedic offices, while the rating category was comparable among people posting more than one review for both groups. Tones of joy and confidence were associated with higher ratings; tones of sadness and tentativeness were associated with lower ratings. Discussion: For individual orthopaedic surgeons and orthopaedic offices, patient experience may be influenced mostly by the patient-clinician relationship. Training in more effective communication strategies may help improve self-reported patient experience.
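To illustrate the text-mining steps described above, here is a minimal, hypothetical sketch (invented reviews, not the study pipeline; the IBM Watson tone analysis is omitted) of extracting frequent word combinations and testing the association between review length and rating:

```python
# Hypothetical sketch of the text-mining steps described above: extract the
# most frequent word combinations (bigrams) from review text and test whether
# review length correlates with star rating. Reviews below are invented.
from scipy.stats import spearmanr
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "great surgeon, very compassionate and clear explanation",
    "long wait time, front desk was rude",
    "trust this office completely, highly recommend",
    "billing problems and poor customer service",
]
ratings = [5, 2, 5, 1]  # one- to five-star ratings

# Most frequent two-word combinations across all reviews
vec = CountVectorizer(ngram_range=(2, 2), stop_words="english")
counts = vec.fit_transform(reviews).sum(axis=0).A1
top = sorted(zip(vec.get_feature_names_out(), counts), key=lambda t: -t[1])[:5]
print("Top bigrams:", top)

# Association between review length (in words) and rating
lengths = [len(r.split()) for r in reviews]
rho, p = spearmanr(lengths, ratings)
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")
```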
After tumor resection, reconstruction with a proximal femoral endoprosthesis and reconstruction with an allograft-prosthesis composite are the two main alternatives for treatment of proximal femoral malignancies. This review describes the revision rate, implant survival, limb salvage rate, and function. Overall revision rates are high, and the reasons for failure differ between treatment modalities. The rate of and reasons for amputation are comparable between both methods. Functional outcome was, on average, reasonable to good for both treatment modalities. Level of evidence: IV, systematic review and meta-analysis. Keywords: allograft, benign, cancer, femoral, femur, malignant, prosthesis, proximal, reconstruction, replacement, tumor. How to cite this article: Janssen SJ, Langerhuizen DWG, Schwab JH, Bramer JAM. Outcome after reconstruction of proximal femoral tumors: A systematic review. J Surg Oncol.
Aims The aim of this study was to investigate whether intraoperative 3D fluoroscopic imaging outperforms dorsal tangential views in the detection of dorsal cortex screw penetration after volar plating of an intra-articular distal radial fracture, as identified on postoperative CT imaging. Methods A total of 165 prospectively enrolled patients who underwent volar plating for an intra-articular distal radial fracture were retrospectively evaluated to study three intraoperative imaging protocols: 1) standard 2D fluoroscopic imaging with anteroposterior (AP) and elevated lateral images (n = 55); 2) 2D fluoroscopic imaging with AP, lateral, and dorsal tangential views (n = 50); and 3) 3D fluoroscopy (n = 60). Multiplanar reconstructions of postoperative CT scans served as the reference standard. Results For the detection of dorsal screw penetration, the sensitivity of dorsal tangential views was 39%, with a negative predictive value (NPV) of 91% and an accuracy of 91%, compared with a sensitivity of 25%, an NPV of 93%, and an accuracy of 93% for 3D fluoroscopy. On the postoperative CT scans, we found penetrating screws in 40% of patients in the 2D fluoroscopy group, 32% of those in the 2D fluoroscopy group with AP, lateral, and dorsal tangential views, and 25% of patients in the 3D fluoroscopy group. In all three groups, the second compartment was prone to penetration, while the postoperative incidence decreased when more advanced imaging was used. There were no penetrating screws in the third compartment (extensor pollicis longus groove) in the 3D fluoroscopy group, and one in the dorsal tangential views group. Conclusion Advanced intraoperative imaging helps to identify screws that have penetrated the dorsal compartments of the wrist. However, based on diagnostic performance characteristics, one cannot conclude that 3D fluoroscopy outperforms dorsal tangential views for this purpose. Dorsal tangential views are sufficiently accurate to detect dorsal screw penetration, and arguably more efficacious than 3D fluoroscopy. Cite this article: Bone Joint J 2020;102-B(7):874–880.
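For reference, the diagnostic performance characteristics reported above follow directly from a 2 x 2 confusion table; the sketch below uses invented counts purely for illustration:

```python
# Hypothetical sketch: diagnostic performance characteristics from a 2 x 2
# confusion table. The example counts are illustrative only, not study data.
def diagnostic_performance(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, NPV, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # penetrating screws correctly flagged
        "specificity": tn / (tn + fp),   # non-penetrating screws correctly cleared
        "npv":         tn / (tn + fn),   # reassurance value of a negative reading
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
    }

# Example: 7 true positives, 2 false positives, 85 true negatives, 11 false negatives
print(diagnostic_performance(tp=7, fp=2, tn=85, fn=11))
```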
Background For fracture care, radiographs and two-dimensional (2-D) and three-dimensional (3-D) CT are primarily used for preoperative planning and postoperative evaluation. Intraarticular distal radius fractures are technically challenging to treat, and meticulous preoperative planning is paramount to improving the patient’s outcome. Three-dimensionally printed handheld models might improve the surgeon’s preoperative interpretation of specific fracture characteristics and patterns and could therefore be clinically valuable; however, the added value of 3-D printed handheld models for fractures of the distal radius, a high-volume and commonly complex fracture owing to its intraarticular configuration, has yet to be determined. Questions/purposes (1) Does the reliability of assessing specific fracture characteristics that guide surgical decision-making for distal radius fractures improve with 3-D printed handheld models? (2) Does surgeon agreement on the overall fracture classification improve with 3-D printed handheld models? (3) Does the surgeon’s confidence improve when assessing the overall fracture configuration with an additional 3-D model? Methods We consecutively included 20 intraarticular distal radius fractures treated at a Level 1 trauma center between May 2018 and November 2018. Ten surgeons evaluated the presence or absence of specific fracture characteristics (volar rim fracture, die punch, volar lunate facet, dorsal comminution, step-off > 2 mm, and gap > 2 mm), the fracture classification according to the AO/Orthopaedic Trauma Association (OTA) classification scheme, and their confidence in assessing the overall fracture according to the classification scheme, rated on a scale from 0 to 10 (0 = not at all confident to 10 = very confident). Of the 10 participants, all of whom regularly treat distal radius fractures, seven were orthopaedic trauma surgeons and three were upper limb surgeons, with experience ranging from 1 to 25 years after completion of residency training. Fractures were assessed twice, with 1 month between assessments. Initially, fractures were assessed using radiographs and 2-D and 3-D CT images (conventional assessment); the second time, the evaluation was based on radiographs and 2-D and 3-D CT images with an additional 3-D printed handheld model (3-D printed handheld model assessment). On both occasions, fracture characteristics were evaluated according to each surgeon’s own interpretation, without specific instruction before assessment. We provided a sheet demonstrating the AO/OTA classification scheme before evaluation at each session. The multi-rater Fleiss’ kappa was used to determine intersurgeon reliability for assessing fracture characteristics and classification. Confidence regarding assessment of the overall fracture classification was compared using a paired t-test.
Results We found that 3-D printed models of intraarticular distal radius fractures led to no change in kappa values for the reliability of all characteristics: volar rim (conventional kappa 0.19 [95% CI 0.06 to 0.32], kappa for 3-D handheld model 0.23 [95% CI 0.11 to 0.36], difference of kappas 0.04 [95% CI -0.14 to 0.22]; p = 0.66), die punch (conventional kappa 0.38 [95% CI 0.15 to 0.61], kappa for 3-D handheld model 0.50 [95% CI 0.23 to 0.78], difference of kappas 0.12 [95% CI -0.23 to 0.47]; p = 0.52), volar lunate facet (conventional kappa 0.31 [95% CI 0.14 to 0.49], kappa for 3-D handheld model 0.48 [95% CI 0.23 to 0.72], difference of kappas 0.17 [95% CI -0.12 to 0.46]; p = 0.26), dorsal comminution (conventional kappa 0.36 [95% CI 0.13 to 0.58], kappa for 3-D handheld model 0.31 [95% CI 0.11 to 0.51], difference of kappas -0.05 [95% CI -0.34 to 0.24]; p = 0.74), step-off > 2 mm (conventional kappa 0.55 [95% CI 0.29 to 0.82], kappa for 3-D handheld model 0.58 [95% CI 0.31 to 0.85], difference of kappas 0.03 [95% CI -0.34 to 0.40]; p = 0.87), gap > 2 mm (conventional kappa 0.59 [95% CI 0.39 to 0.79], kappa for 3-D handheld model 0.69 [95% CI 0.50 to 0.89], difference of kappas 0.10 [95% CI -0.17 to 0.37]; p = 0.48). Although there appeared to be categorical improvement in kappa values for some fracture characteristics, overlapping CIs indicated no change. Fracture classification did not improve (conventional diagnostics: kappa 0.27 [95% CI 0.14 to 0.39], conventional diagnostics with an additional 3-D handheld model: kappa 0.25 [95% CI 0.15 to 0.35], difference of kappas: -0.02 [95% CI -0.18 to 0.14]; p = 0.81). There was no improvement in self-assessed confidence in terms of assessment of overall fracture configuration when a 3-D model was added to the evaluation process (conventional diagnostics 7.8 [SD 0.79 {95% CI 7.2 to 8.3}], 3-D handheld model 8.5 [SD 0.71 {95% CI 8.0 to 9.0}], difference of score: 0.7 [95% CI -1.69 to 0.16], p = 0.09). Conclusions Intersurgeon reliability for evaluating the characteristics of and classifying intraarticular distal radius fractures did not improve with an additional 3-D model. Further studies should evaluate the added value of 3-D printed handheld models for teaching surgical residents and medical trainees to define the future role of 3-D printing in caring for fractures of the distal radius. Level of Evidence Level II, diagnostic study.
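As an illustration of the reliability analysis used in this study, here is a minimal, hypothetical sketch (invented ratings, not study data) of computing Fleiss' kappa across several raters with statsmodels:

```python
# Hypothetical sketch of the reliability analysis described above: Fleiss'
# kappa across multiple surgeons rating the same fractures. Ratings are invented.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = fractures, columns = raters; 1 = characteristic present, 0 = absent
ratings = np.array([
    [1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
])

table, _ = aggregate_raters(ratings)   # counts of each category per fracture
print("Fleiss' kappa:", fleiss_kappa(table, method="fleiss"))
```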