Background and study aims: Artificial intelligence-based computer-aided polyp detection (CADe) systems receive regular updates and occasionally offer customizable detection thresholds, both of which affect their performance, but little is known about these effects. This study aimed to compare the performance of different CADe systems on the same benchmark dataset.
Methods: A total of 101 colonoscopy videos were used as a benchmark. Each video frame with a visible polyp was manually annotated with bounding boxes, resulting in 129,705 polyp images. The videos were then analyzed by three different CADe systems: two versions of GI-Genius, two detection types of EndoAID, and the freely available system EndoMind. The evaluation included an extensive analysis of sensitivity and false-positive rate, among other metrics.
Results: EndoAID (Type A), the earlier version of GI-Genius, and EndoMind detected all 93 polyps. Both the later version of GI-Genius and EndoAID (Type B) missed one polyp. The mean per-frame sensitivity was 50.63% and 67.85% for the earlier and later versions of GI-Genius, 65.60% and 52.95% for EndoAID (Type A and Type B), and 60.22% for EndoMind.
Conclusions: This study compares the performance of different CADe systems, different updates, and different configuration modes. The comparison might help clinicians select the most appropriate system for their specific needs.
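To make the reported metrics concrete, the following is a minimal sketch of how per-polyp detection and mean per-frame sensitivity could be computed from bounding-box annotations. It is not the study's evaluation code. The overlap criterion (any intersection between a CADe box and the annotated box, i.e. IoU above a threshold of 0), the averaging over polyps, and all names (`iou`, `per_frame_sensitivity`, `summarize`) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Frame = Tuple[Box, List[Box]]            # (annotated polyp box, CADe boxes)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def per_frame_sensitivity(frames: List[Frame], iou_threshold: float = 0.0) -> float:
    """Share of annotated polyp frames with at least one overlapping CADe box.

    Assumption: a frame counts as detected when any CADe box has
    IoU > iou_threshold with the annotation; the abstract does not
    state the criterion actually used.
    """
    if not frames:
        raise ValueError("at least one annotated frame is required")
    hits = sum(
        1 for gt, dets in frames
        if any(iou(gt, d) > iou_threshold for d in dets)
    )
    return hits / len(frames)


def summarize(polyps: Dict[str, List[Frame]]) -> Tuple[int, float]:
    """Return (number of polyps detected in at least one frame,
    mean per-frame sensitivity averaged over polyps)."""
    per_polyp = {pid: per_frame_sensitivity(fr) for pid, fr in polyps.items()}
    detected = sum(1 for s in per_polyp.values() if s > 0)
    mean_sensitivity = sum(per_polyp.values()) / len(per_polyp)
    return detected, mean_sensitivity


# Toy data (hypothetical, not from the study): two polyps, three frames.
toy = {
    "polyp_a": [
        ((0, 0, 10, 10), [(5, 5, 15, 15)]),  # overlapping box: hit
        ((0, 0, 10, 10), []),                # no CADe box: miss
    ],
    "polyp_b": [
        ((0, 0, 10, 10), [(20, 20, 30, 30)]),  # disjoint box: miss
    ],
}
detected, mean_sensitivity = summarize(toy)
```

Under these assumptions, a system can detect every polyp (each is found in at least one frame) while still having a per-frame sensitivity well below 100%, which is the pattern the Results describe.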