PEg TRAnsfer Workflow Recognition Challenge Report: Do Multi-Modal Data Improve Recognition?

Huaulmé, Arnaud; Harada, Kensuke; Nguyen, Quang‐Minh; Bogyu, Park; Hong, Seungbum; Choi, Min-Kook; Peven, Michael; Li, Yunshuang; Long, Yonghao; Dou, Qi; Kumar, Satyadwyoom; Lalithkumar, Seenivasan; Ren, Hongliang; Matsuzaki, Hiroyuki; Ishikawa, Yoshimoto; Harai, Yuriko; Kondo, Satoshi; Mitsuishi, Mamoru; Jannin, Pierre

doi:10.2139/ssrn.4088403

Cited by 6 publications

(1 citation statement)

References 13 publications

“…In this study, we defined a vocabulary of nine elements to describe the surgical gestures involved in the performance of the two tasks: five gestures for PT and another four gestures for KT (Table 1). This gesture decomposition was proposed in [18] and a similar definition of activities was utilized in [19]. The videos were annotated using Anvil 6.0 [20].…”

Section: Methods (mentioning)

confidence: 99%

Surgical Gesture Recognition in Laparoscopic Tasks Based on the Transformer Network and Self-Supervised Learning

2022

In this study, we propose a deep learning framework and a self-supervision scheme for video-based surgical gesture recognition. The proposed framework is modular. First, a 3D convolutional network extracts feature vectors from video clips for encoding spatial and short-term temporal features. Second, the feature vectors are fed into a transformer network for capturing long-term temporal dependencies. Two main models are proposed, based on the backbone framework: C3DTrans (supervised) and SSC3DTrans (self-supervised). The dataset consisted of 80 videos from two basic laparoscopic tasks: peg transfer (PT) and knot tying (KT). To examine the potential of self-supervision, the models were trained on 60% and 100% of the annotated dataset. In addition, the best-performing model was evaluated on the JIGSAWS robotic surgery dataset. The best model (C3DTrans) achieves accuracies of 88.0% and 95.2% (clip level) and 97.5% and 97.9% (gesture level) for PT and KT, respectively. The SSC3DTrans performed similarly to C3DTrans when training on 60% of the annotated dataset (about 84% and 93% clip-level accuracies for PT and KT, respectively). The performance of C3DTrans on JIGSAWS was close to 76% accuracy, which was similar to or higher than prior techniques based on a single video stream, no additional video training, and online processing.

AutoLaparo: A New Dataset of Integrated Multi-tasks for Image-guided Surgical Automation in Laparoscopic Hysterectomy

Wang, Long, et al. 2022

Lecture Notes in Computer Science

Self Cite

Computer-assisted minimally invasive surgery has great potential to benefit modern operating theatres. The video data streamed from the endoscope provides rich information to support context-awareness for next-generation intelligent surgical systems. To achieve accurate perception and automatic manipulation during the procedure, learning-based techniques are a promising approach and have enabled advanced image analysis and scene understanding in recent years. However, learning such models relies heavily on large-scale, high-quality, multi-task labelled data. This is currently a bottleneck, as available public datasets are still extremely limited in the field of CAI. In this paper, we present and release the first integrated dataset (named AutoLaparo) with multiple image-based perception tasks to facilitate learning-based automation in hysterectomy surgery. Our AutoLaparo dataset is developed based on full-length videos of entire hysterectomy procedures. Specifically, three different yet highly correlated tasks are formulated in the dataset: surgical workflow recognition, laparoscope motion prediction, and instrument and key anatomy segmentation. In addition, we provide experimental results with state-of-the-art models as reference benchmarks for further model development and evaluation on this dataset. The dataset is available at https://autolaparo.github.io.

Visual Modalities Based Multimodal Fusion for Surgical Phase Recognition

Bogyu, Chi, Park, et al. 2022

Multiscale Multimodal Medical Imaging
