2022
DOI: 10.48550/arxiv.2202.05821
Preprint

PEg TRAnsfer Workflow recognition challenge report: Does multi-modal data improve recognition?

Abstract: Background and Objective: Context-aware computer-assisted surgical systems require real-time, automatic, and accurate surgical workflow recognition. For several years, video has been the most common modality used to develop such methods. With the democratization of robotic-assisted surgery and segmentation methods, new modalities are now accessible, such as kinematics. Some previous works used these new modalities as input to their models, but the added value of these modalities is rarely studied. This paper pr…

Cited by 4 publications (5 citation statements)
References: 30 publications
“…Note that there are inconsistencies in label and granularity definitions across datasets. For example, the tasks of Suturing, Knot Tying, and Peg Transfer in JIGSAWS and DESK are considered phases in MISAW [31] and PETRAW [36]. [13] trained a GRU for gesture and maneuver recognition on the JIGSAWS and MISTIC-SL datasets, respectively.…”
Section: Related Work
“…Interestingly, [31] found that multi-granularity recognition models performed better because such models may be learning that certain activities only occur during specific phases and steps. Also, recent works on action triplet recognition in laparoscopic procedures focus on concurrent phase, step, and action recognition [36]. The poor performance of activity recognition models is a barrier to clinical applications, but understanding the relationship between granularity levels can address this challenge and guide model development.…”
Section: Related Work
“…It has been suggested that surgical gesture recognition can be learned from optical flow data alone, highlighting the importance of motion cues for action recognition [27]. Combining modalities from different sensors continues to be an active topic in the medical domain [13], as data stemming from various sources become ubiquitous in the OR. Fusion strategies for RGB-D data are being explored at length for many applications such as depth estimation [16], 6DoF pose estimation [26,3], and object classification [29].…”
Section: Tools
“…The analysis of surgical videos is no longer limited to medical devices such as endoscopic cameras -in the past years, several works have explored the use of ceiling-mounted cameras in an effort to understand OR workflows from an outside perspective. As the amount of data stemming from OR sensors increases, new questions arise, such as how to best integrate various modalities into automated surgical systems [13] or where to optimally place cameras for specific tasks [11,18]. This study seeks to understand which camera modalities are best suited for surgical action recognition, exploring their relative performance in a unique set of multi-view surgical recordings.…”
Section: Introduction
“…Recently, several research teams have worked on developing datasets at large scale [1,3,31], but most are designed and annotated for only one specific task. In terms of clinical applicability, data from different modalities are needed to better understand the whole scenario, make proper decisions, and enrich perception with a multi-task learning strategy [9,16]. Besides, there are few datasets designed for automation tasks in surgical applications, among which automatic laparoscopic field-of-view (FoV) control is a popular topic, as it can liberate the assistant from such tedious manipulations with help from surgical robots [5].…”
Section: Introduction