Background and Objective: Context-aware computer-assisted surgical systems require real time automatic and accurate surgical workflow recognition. Since several years, video has been the most common modality used to develop such methods. With the democratization of robotic-assisted surgery and segmentation method, new modalities are now accessible, such as kinematics. Some previous works used these new modalities as input of their models, but the added value of these modalities is rarely studied. This paper presents the design and results of the "PEg TRAnsfert Workflow recognition" (PETRAW) challenge whose objective was to develop surgical workflow recognition methods based on one or several modalities in order to study their added value. Methods: The PETRAW challenge provided a data set of 150 peg transfer sequences performed on a virtual simulator. This data set was composed of videos, kinematics, semantic segmentation, and workflow annotations which described the sequences at three different granularity levels: phase, step, and activity. Five tasks were proposed to the participants: three of them were related to the recognition of all granularities with one of the available modalities, while the others addressed the recognition with a combination of modalities. Average application-dependent balanced accuracy (AD-Accuracy) was used as evaluation metric to take unbalanced classes into account and because it is more clinically relevant than a frame-by-frame score. Results: Seven teams participated in at least one task and four of them in all tasks. Best results are obtained with the use of the video and the kinematics data with an AD-Accuracy between 93% and 90% for the four teams who participated in all tasks. Conclusion:The improvement between video/kinematic-based methods and the uni-modality ones was significant for all of the teams. However, the difference in testing execution time between the video/kinematic-based and the kinematic-based methods has to be taken into consideration. Is it relevant to spend 20 to 200 times more computing time for less than 3% of improvement? The PETRAW data set is publicly available at www.synapse.org/PETRAW to encourage further research in surgical workflow recognition.
Robotic surgery has been proven to offer clear advantages during surgical procedures, however, one of the major limitations is obtaining haptic feedback. Since it is often challenging to devise a hardware solution with accurate force feedback, we propose the use of "visual cues" to infer forces from tissue deformation. Endoscopic video is a passive sensor that is freely available, in the sense that any minimally-invasive procedure already utilizes it. To this end, we employ deep learning to infer forces from video as an attractive low-cost and accurate alternative to typically complex and expensive hardware solutions. First, we demonstrate our approach in a phantom setting using the da Vinci Surgical System affixed with an OptoForce sensor. Second, we then validate our method on an ex vivo liver organ. Our method results in a mean absolute error of 0.814 N in the ex vivo study, suggesting that it may be a promising alternative to hardware based surgical force feedback in endoscopic procedures.
There are many realistic applications of activity recognition where the set of potential activity descriptions is combinatorially large. This makes end-to-end supervised training of a recognition system impractical as no training set is practically able to encompass the entire label set. In this paper, we present an approach to fine-grained recognition that models activities as compositions of dynamic action signatures. This compositional approach allows us to reframe fine-grained recognition as zero-shot activity recognition, where a detector is composed "on the fly" from simple first-principles state machines supported by deep-learned components. We evaluate our method on the Olympic Sports and UCF101 datasets, where our model establishes a new state of the art under multiple experimental paradigms. We also extend this method to form a unique framework for zero-shot joint segmentation and classification of activities in video and demonstrate the first results in zero- shot decoding of complex action sequences on a widely-used surgical dataset. Lastly, we show that we can use off-the-shelf object detectors to recognize activities in completely de-novo settings with no additional training.
Objectives: To compare noninvasive mobility sensor patient motion signature to direct observations by physicians and nurses. Design: Prospective, observational study. Setting: Academic hospital surgical ICU. Patients and Measurements: A total of 2,426 1-minute clips from six ICU patients (development dataset) and 4,824 1-minute clips from five patients (test dataset). Interventions: None. Main Results: Noninvasive mobility sensor achieved a minute-level accuracy of 94.2% (2,138/2,272) and an hour-level accuracy of 81.4% (70/86). Conclusions: The automated noninvasive mobility sensor system represents a significant departure from current manual measurement and reporting used in clinical care, lowering the burden of measurement and documentation on caregivers.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.