2021
DOI: 10.1007/978-3-030-87202-1_58

OperA: Attention-Regularized Transformers for Surgical Phase Recognition

Cited by 60 publications (37 citation statements)
References 16 publications
“…Convolutional architectures such as ResNet-50 have been extensively used for phase segmentation of endoscopic videos. They serve as feature extraction backbones for many state-of-the-art recognition architectures [15,7,33]. Our baseline consists of a ResNet-50 pre-trained on ImageNet without temporal modeling.…”
Section: Methods (mentioning)
confidence: 99%
“…Current state-of-the-art methods typically combine convolutional backbones with LSTM or attention-based temporal accumulators. These methods are particularly well suited for longer videos, as frequently seen in the surgical domain, where acquisitions span many hours [9,6,7].…”
Section: Related Work (mentioning)
confidence: 99%
“…Czempiel et al. [14] proposed to replace the frequently used LSTMs with a multi-stage temporal convolution network (TCN) [15] analyzing the long temporal relationships more efficiently. Additionally, attention-based transformer architectures [16] have been proposed [17] [18] to refine the temporal context even further and increase model interpretability. Fair Evaluation: One of the biggest challenges in this domain is the limited benchmarking between existing methods.…”
Section: Related Work (mentioning)
confidence: 99%
“…Here, a CNN is first trained on randomly sampled image batches, followed by a temporal model trained on the extracted visual features. Methods in this style have been proposed for phase recognition [9,10,16,59,62,63], duration prediction [2], tracking [39] or anticipation [61]. Most notably, TeCNO [9], a MS-TCN [13] trained on ResNet features, is the popular approach for 2-stage learning and Trans-SVNet [16], a 3-stage method which trains a Transformer model on TeCNO features, is the current state of the art in surgical phase recognition.…”
Section: Surgical Workflow Analysis (mentioning)
confidence: 99%
“…BN-related issues can be avoided by using multi-stage training procedures where backbones are trained on randomly sampled image batches. While the majority of research in surgical workflow analysis [2,10,9,16,39,59,61,62,63] has opted for this strategy, it has several disadvantages. Firstly, it increases the number of hyperparameters since learning rate, number of epochs etc.…”
Section: Disadvantages Of Multi-stage Learning (mentioning)
confidence: 99%