Background:
Colonoscopy is the gold standard for polyp detection, but polyps may be missed. Artificial intelligence (AI) technologies may assist in polyp detection. To date, most studies of AI polyp detection have validated algorithms under ideal endoscopic conditions.
Aim:
To evaluate the performance of a deep-learning algorithm for polyp detection in a real-world setting of routine colonoscopy with variable bowel preparation quality.
Methods:
We performed a prospective, single-center study of 50 consecutive patients referred for colonoscopy. Procedural videos were analyzed by validated deep-learning AI polyp detection software that labeled suspected polyps. Videos were then re-read by 5 experienced endoscopists to categorize all possible polyps identified by the endoscopist and/or the AI, and to assign Boston Bowel Preparation Scale scores.
Results:
In total, 55 polyps were detected and removed by the endoscopist. The AI system identified 401 possible polyps. A total of 100 (24.9%) were categorized as “definite polyps”; 53/100 were identified and removed by the endoscopist. A total of 63 (15.6%) were categorized as “possible polyps” and were not removed by the endoscopist. In total, 238/401 were categorized as false positives. Two polyps identified by the endoscopist were missed by the AI (false negatives). The sensitivity of the AI for polyp detection was 98.8%, and the positive predictive value was 40.6%. The polyp detection rate was 62% for the endoscopist versus 82% for the AI system. Mean segmental Boston Bowel Preparation Scale scores were similar for true and false positives (2.64 vs 2.59, respectively; P=0.47).
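The reported sensitivity and positive predictive value can be reproduced from the counts above. The short sketch below assumes that both "definite" and "possible" polyps were counted as true positives. The abstract does not state this explicitly, but the assumption is consistent with the reported figures.

```python
# Counts taken from the Results paragraph above.
definite = 100          # AI detections categorized as "definite polyps"
possible = 63           # AI detections categorized as "possible polyps"
false_positives = 238   # AI detections categorized as false positives
false_negatives = 2     # endoscopist-identified polyps missed by the AI

# Assumption (not stated in the abstract): definite and possible polyps
# are both treated as true positives.
true_positives = definite + possible

sensitivity = true_positives / (true_positives + false_negatives)
ppv = true_positives / (true_positives + false_positives)

print(f"sensitivity = {sensitivity:.1%}")
print(f"PPV         = {ppv:.1%}")
```

Under this assumption the three categories sum to the 401 AI detections, and the two ratios round to the 98.8% and 40.6% given in the text.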
Conclusions:
A deep-learning algorithm can function effectively to detect polyps in a prospectively collected series of colonoscopies, even in the setting of variable preparation quality.
Introduction
The occurrence of false positive (FP) alarms is an important outcome measure in computer-aided colon polyp detection (CADe) studies. However, there is no consensus definition for FPs in clinical trials evaluating CADe in colonoscopy. We aimed to study the diagnostic performance of CADe based on different threshold definitions for FP alarms.
Methods
A previously validated CADe system was applied to screening/surveillance colonoscopy videos. Different thresholds for FP alerts were defined based on the time an alarm box was continuously traced by the system. Primary outcomes were FP results and specificity using differing FP thresholds.
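As an illustration of a duration-based threshold, the sketch below counts false positive alarms under several cutoffs. The `Alarm` record, its field names, and the sample values are hypothetical and are not taken from the study. The abstract does not describe how alarms were stored, or how specificity and accuracy were computed from them.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    duration_s: float  # time the alarm box was continuously traced, in seconds
    is_polyp: bool     # reviewer label for the alarm (hypothetical field)


def count_false_positives(alarms, threshold_s):
    """Count non-polyp alarms traced for longer than threshold_s seconds."""
    return sum(
        1 for a in alarms
        if not a.is_polyp and a.duration_s > threshold_s
    )


# Made-up example data, for illustration only.
alarms = [
    Alarm(0.3, False),
    Alarm(0.7, False),
    Alarm(1.4, False),
    Alarm(2.5, False),
    Alarm(3.0, True),
    Alarm(0.6, True),
]

for threshold in (0.5, 1.0, 2.0):
    n_fp = count_false_positives(alarms, threshold)
    print(f"threshold > {threshold} s: {n_fp} false positives")
```

With the same set of alarms, a longer threshold leaves fewer alarms that qualify as false positives, which is the effect the study set out to quantify.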
Results
A total of 62 colonoscopies were analyzed. CADe specificity and accuracy were 93.2% and 97.8%, respectively, for an FP threshold of > 0.5 seconds; 98.6% and 99.5% for an FP threshold of > 1 second; and 99.8% and 99.9% for an FP threshold of > 2 seconds.
Conclusion
Our analysis demonstrates how different threshold definitions for false positives can impact the reported diagnostic performance of CADe for colon polyp detection.