IMPORTANCE When evaluating surgeons in the operating room, experienced physicians must rely on live or recorded video to assess the surgeon's technical performance, an approach prone to subjectivity and error. Owing to the large number of surgical procedures performed daily, it is infeasible to review every procedure; therefore, there is a tremendous loss of invaluable performance data that would otherwise be useful for improving surgical safety.

OBJECTIVE To evaluate a framework for assessing surgical video clips by categorizing them based on the surgical step being performed and the level of the surgeon's competence.

DESIGN, SETTING, AND PARTICIPANTS This quality improvement study assessed 103 video clips of 8 surgeons of various levels performing knot tying, suturing, and needle passing from the Johns Hopkins University-Intuitive Surgical Gesture and Skill Assessment Working Set. Data were collected before 2015, and data analysis took place from March to July 2019.

MAIN OUTCOMES AND MEASURES Deep learning models were trained to estimate categorical outputs such as performance level (ie, novice, intermediate, and expert) and surgical actions (ie, knot tying, suturing, and needle passing). The efficacy of these models was measured using precision, recall, and model accuracy.

RESULTS The provided architectures achieved accuracy in surgical action and performance calculation tasks using only video input. The embedding representation had a mean (root mean square error [RMSE]) precision of 1.00 (0) for suturing, 0.99 (0.01) for knot tying, and 0.91 (0.11) for needle passing, resulting in a mean (RMSE) precision of 0.97 (0.01). Its mean (RMSE) recall was 0.94 (0.08) for suturing, 1.00 (0) for knot tying, and 0.99 (0.01) for needle passing, resulting in a mean (RMSE) recall of 0.98 (0.01). It also estimated scores on the Objective Structured Assessment of Technical Skill Global Rating Scale categories, with a mean (RMSE) precision of 0.85 (0.09) for novice level, 0.67 (0.07) for intermediate level, and 0.79 (0.12) for expert level, resulting in a mean (RMSE) precision of 0.77 (0.04). Its mean (RMSE) recall was 0.85 (0.05) for novice level, 0.69 (0.14) for intermediate level, and 0.80 (0.13) for expert level, resulting in a mean (RMSE) recall of 0.78 (0.03).

CONCLUSIONS AND RELEVANCE The proposed models and the accompanying results illustrate that deep machine learning can identify associations in surgical video clips. These are the first steps to creating a feedback mechanism for surgeons that would allow them to learn from their experiences and refine their skills.
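The per-class precision and recall figures reported above follow the standard definitions: precision is the share of clips predicted as a class that truly belong to it, and recall is the share of clips of a class that the model recovers. The sketch below only illustrates those definitions on made-up labels; it is not the authors' evaluation code, and the function names, the label lists, and the macro-averaging helper are all assumptions introduced for illustration.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred, classes):
    """Return {class: (precision, recall)} from paired true/predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed
    scores = {}
    for c in classes:
        predicted = tp[c] + fp[c]   # clips the model assigned to class c
        actual = tp[c] + fn[c]      # clips that really are class c
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        scores[c] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of precision and of recall across classes."""
    n = len(scores)
    mean_precision = sum(p for p, _ in scores.values()) / n
    mean_recall = sum(r for _, r in scores.values()) / n
    return mean_precision, mean_recall


# Hypothetical labels for six clips (not data from the study).
CLASSES = ["novice", "intermediate", "expert"]
y_true = ["novice", "novice", "intermediate", "intermediate", "expert", "expert"]
y_pred = ["novice", "intermediate", "intermediate", "intermediate", "expert", "novice"]

scores = per_class_precision_recall(y_true, y_pred, CLASSES)
macro_precision, macro_recall = macro_average(scores)
for name in CLASSES:
    print(name, scores[name])
print("macro", macro_precision, macro_recall)
```

The same function applies unchanged to the surgical action labels (knot tying, suturing, needle passing) by passing a different class list.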
Background Currently, evaluating surgical technical performance is inefficient and subjective [1,2,3,4], and the established rubrics for assessing surgical ability are open to interpretation. To power programs for surgical training and Maintenance of Certification (MOC), a reliable and validated solution is required. To this end, we draw upon recent advances in machine learning and propose a framework for objective and scalable assessment of technical proficiency.

Methods Different machine learning models were trained to predict surgical performance on the public EndoVis19 and JIGSAWS datasets. The most important features were extracted by probing each machine learning model, and these features form the basis of the proposed algorithm. We internally tested the performance of this model on proprietary datasets from Surgical Safety Technologies (SST) and the University of Texas Southwestern (UTSW). The performance of these models was assessed with metrics such as precision, recall, F1 score, and the area under the receiver operating characteristic curve (AUC).

Results OR Vision is a statistically driven, multi-stage machine learning tool that quantifies surgical skill objectively and explainably. Instrument motion, control, and coordination are quantified in terms of 150 objective metrics, extracted from tool motion tracked by the deep learning model. The N most highly correlated of these metrics (p<0.05) model surgical performance with quantifiable objective metrics (fine-motor precision, fluidity, tremor, disorder, etc.). These metrics are combined into clinically weighted composite scores that represent the category-wise technical performance of surgeons. The OR Vision score discriminates between expert and novice surgeons with high precision (0.82-0.84) and provides constructive feedback in the form of a concise report for every participating member of the cohort. Each report provides a breakdown of user performance on statistically relevant categories.

Conclusion A machine learning-based approach for identifying surgical skill is effective and meaningful and provides the groundwork for objective, precise, repeatable, cost-effective, clinically meaningful assessments.
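The abstract does not say how the selected metrics are combined into clinically weighted composite scores. One simple reading is a weighted average of normalized metrics within each category, sketched below. Everything in it is an assumption for illustration: the category names, the weights, the metric values, and the premise that each metric has already been scaled to the range 0 to 1 with higher meaning better. It should not be read as the OR Vision scoring method.

```python
def composite_scores(metrics, category_weights):
    """Weighted average of normalized metrics for each category.

    metrics: {metric_name: value in [0, 1]} for one surgeon or clip.
    category_weights: {category: {metric_name: weight}}.
    """
    result = {}
    for category, weights in category_weights.items():
        total_weight = sum(weights.values())
        weighted_sum = sum(metrics[name] * w for name, w in weights.items())
        result[category] = weighted_sum / total_weight
    return result


# Hypothetical categories and weights; the metric names echo the examples
# given in the abstract, but the grouping and numbers are invented.
CATEGORY_WEIGHTS = {
    "motion_control": {"fine_motor_precision": 2.0, "tremor": 1.0},
    "efficiency": {"fluidity": 3.0, "disorder": 1.0},
}

# Hypothetical normalized metric values for one clip.
clip_metrics = {
    "fine_motor_precision": 0.9,
    "tremor": 0.6,
    "fluidity": 0.8,
    "disorder": 0.4,
}

category_scores = composite_scores(clip_metrics, CATEGORY_WEIGHTS)
for category, value in category_scores.items():
    print(f"{category}: {value:.2f}")
```

Dividing by the total weight keeps each category score on the same 0 to 1 scale as its inputs, so categories with different numbers of metrics remain comparable in a per-surgeon report.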