Instrument detection, pose estimation, and tracking in surgical videos are important vision components for computer-assisted interventions. While significant advances have been made in recent years, articulation detection remains a major challenge. In this paper, we propose a deep neural network for articulated multi-instrument 2-D pose estimation, trained on detailed annotations of endoscopic and microscopic data sets. Our model is a fully convolutional detection-regression network: joints and associations between joint pairs in our instrument model are located by the detection subnetwork and subsequently refined by a regression subnetwork. From the model's output, the poses of the instruments are inferred using maximum bipartite graph matching. Our estimation framework is driven entirely by deep learning, without any direct kinematic information from a robot. We test the framework on single-instrument RMIT data as well as on multi-instrument EndoVis and in vivo data, with promising results. In addition, we publicly release the data set annotations along with our code and model.
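As an illustration of the matching step described above, the sketch below recovers a maximum-weight bipartite matching between two sets of candidate joints using the Hungarian algorithm (scipy.optimize.linear_sum_assignment). The association scores and the min_score threshold are illustrative placeholders, not the paper's actual network outputs.

```python
# Minimal sketch of the pose-inference step: given association scores between
# candidate joints of two types (e.g., shaft end vs. clasper center), recover
# the maximum-weight bipartite matching. Scores here are toy values; the
# paper's joint model and scoring come from its network outputs.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_joint_pairs(scores: np.ndarray, min_score: float = 0.1):
    """scores[i, j]: association confidence between joint candidate i of one
    type and candidate j of the other. Returns matched index pairs."""
    # linear_sum_assignment minimizes cost, so negate to maximize total score.
    rows, cols = linear_sum_assignment(-scores)
    # Discard weak associations so spurious detections are not forced to pair.
    return [(i, j) for i, j in zip(rows, cols) if scores[i, j] >= min_score]

# Toy example: three candidates of each joint type.
scores = np.array([[0.9, 0.1, 0.0],
                   [0.2, 0.8, 0.05],
                   [0.0, 0.3, 0.7]])
print(match_joint_pairs(scores))  # [(0, 0), (1, 1), (2, 2)]
```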
Methods for detecting and localizing surgical instruments in laparoscopic images are an important element of advanced robotic and computer-assisted interventions. Robotic joint encoders and sensors integrated into or mounted on the instrument can provide information about the tool's position, but this information is often inaccurate when transferred to the surgeon's point of view. Vision sensors are currently a promising approach for determining the position of instruments in the coordinate frame of the surgical camera. In this study, we propose a vision algorithm for localizing the instrument's 3-D pose, leaving only the rotation about the tool's shaft axis as an ambiguity. We propose a probabilistic supervised classification method to detect pixels in laparoscopic images that belong to surgical tools. We then use the classifier output to initialize an energy minimization algorithm that estimates the pose of a prior 3-D model of the instrument within a level set framework. We show that the proposed method is robust against noise using simulated data, and we quantitatively validate the algorithm against ground truth obtained using an optical tracker. Finally, we demonstrate the practical application of the technique on in vivo data from minimally invasive surgery with traditional laparoscopic and robotic instruments.
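A minimal sketch of the per-pixel classification stage follows, assuming raw color features and a random forest; the paper's actual features and probabilistic classifier may differ. The resulting probability map is the kind of input that could seed the level-set energy minimization.

```python
# Sketch of per-pixel probabilistic tool detection, assuming simple color
# features and a random forest (illustrative choices, not the paper's exact
# method). The probability map can initialize a level-set pose refinement.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pixel_features(bgr: np.ndarray) -> np.ndarray:
    """Flatten an HxWx3 image into per-pixel color features (raw BGR here)."""
    return bgr.reshape(-1, 3).astype(np.float32) / 255.0

def train_tool_classifier(images, masks):
    """images: list of HxWx3 arrays; masks: matching HxW binary tool masks."""
    X = np.vstack([pixel_features(im) for im in images])
    y = np.concatenate([m.reshape(-1) for m in masks])
    clf = RandomForestClassifier(n_estimators=50, max_depth=10, n_jobs=-1)
    clf.fit(X, y)
    return clf

def tool_probability_map(clf, bgr: np.ndarray) -> np.ndarray:
    """Per-pixel P(tool) map used to initialize the energy minimization."""
    p = clf.predict_proba(pixel_features(bgr))[:, 1]
    return p.reshape(bgr.shape[:2])
```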
Estimating the 3-D pose of instruments is an important part of robotic minimally invasive surgery, enabling automation of basic procedures as well as safety features such as virtual fixtures. Image-based methods of 3-D pose estimation provide a non-invasive, low-cost solution compared with methods that incorporate external tracking systems. In this paper, we extend our recent work on estimating rigid 3-D pose with silhouette and optical flow-based features to incorporate the articulated degrees of freedom (DOFs) of robotic instruments within a gradient-based optimization framework. We validate the technique with a calibrated ex vivo study on the da Vinci Research Kit (DVRK) robotic system, performing quantitative analysis of the errors in each DOF of our tracker. Additionally, we perform several detailed comparisons with recently published techniques that combine visual methods with kinematic data acquired from the joint encoders. Our experiments demonstrate that our method is competitively accurate while relying solely on image data.
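The sketch below illustrates gradient-based pose refinement over an articulated DOF vector. Here `render_silhouette` is a hypothetical projection function and the finite-difference optimizer is a stand-in; the paper instead derives its cost from silhouette and optical-flow features with analytic gradients.

```python
# Illustrative gradient-based refinement over an articulated DOF vector theta.
# `render_silhouette` is a hypothetical function projecting the instrument
# model at pose theta to an HxW binary mask; the paper's full cost also uses
# optical-flow terms and analytic gradients rather than finite differences.
import numpy as np
from scipy.optimize import minimize

def silhouette_cost(theta, observed_mask, render_silhouette):
    predicted = render_silhouette(theta).astype(np.float64)
    return np.mean((predicted - observed_mask) ** 2)

def refine_pose(theta0, observed_mask, render_silhouette):
    # L-BFGS-B with numerical gradients (scipy's default when jac is omitted)
    # stands in for the paper's analytic gradient-based optimization.
    result = minimize(silhouette_cost, theta0,
                      args=(observed_mask, render_silhouette),
                      method="L-BFGS-B")
    return result.x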
Purpose: Computer-assisted interventions for enhanced minimally invasive surgery (MIS) require tracking of the surgical instruments. Instrument tracking is a challenging problem in both conventional and robotic-assisted MIS, but vision-based approaches are a promising solution with minimal hardware integration requirements. However, vision-based methods suffer from drift, and in the case of occlusions, shadows, and fast motion, they can be subject to complete tracking failure.
Methods: In this paper, we develop a 2D tracker based on a Generalized Hough Transform using SIFT features, which can both handle complex environmental changes and recover from tracking failure. We use this to initialize a 3D tracker at each frame, which enables us to recover 3D instrument pose over long sequences and even during occlusions.
Results: We quantitatively validate our method in 2D and 3D with ex vivo data collected from a DVRK controller, as well as providing qualitative validation on robotic-assisted in vivo data.
Conclusions: We demonstrate from our extended sequences that our method provides drift-free, robust, and accurate tracking. Our occlusion-based sequences additionally demonstrate that our method can recover from occlusion-based failure. In both cases, we show an improvement over using 3D tracking alone, suggesting that combining 2D and 3D tracking is a promising solution to challenges in surgical instrument tracking.
Electronic supplementary material: The online version of this article (doi:10.1007/s11548-016-1393-4) contains supplementary material, which is available to authorized users.
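A simplified sketch of Generalized-Hough-style 2D localization with SIFT features, in the spirit of the 2D tracker described above: each ratio-test-filtered match votes for the instrument center using the relative scale and orientation of the matched keypoints. The single-template setup and median vote aggregation are simplifications; a full GHT bins votes in an accumulator, and the paper's tracker adds failure recovery and per-frame 3D re-initialization.

```python
# GHT-style voting from SIFT matches: each match casts a vote for the
# instrument center, adjusted by the keypoints' relative scale and rotation.
# Inputs are 8-bit grayscale images; a single template is an assumption here.
import numpy as np
import cv2

def locate_instrument(template, frame, template_center):
    sift = cv2.SIFT_create()
    kt, dt = sift.detectAndCompute(template, None)
    kf, df = sift.detectAndCompute(frame, None)
    matches = cv2.BFMatcher().knnMatch(dt, df, k=2)
    votes = []
    for m, n in (p for p in matches if len(p) == 2):
        if m.distance < 0.75 * n.distance:        # Lowe's ratio test
            t_kp, f_kp = kt[m.queryIdx], kf[m.trainIdx]
            # Offset from template keypoint to center, adjusted for the
            # relative scale and rotation between the matched keypoints.
            scale = f_kp.size / t_kp.size
            d_angle = np.deg2rad(f_kp.angle - t_kp.angle)
            off = (np.array(template_center) - np.array(t_kp.pt)) * scale
            rot = np.array([[np.cos(d_angle), -np.sin(d_angle)],
                            [np.sin(d_angle),  np.cos(d_angle)]])
            votes.append(np.array(f_kp.pt) + rot @ off)
    # Robust estimate of the voted center; a real GHT bins votes instead.
    return np.median(np.array(votes), axis=0) if votes else None
```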