“…In some cases, artificial or anatomical cues can be used as fiducial markers to perform a point‐based match between the endoscopic organ and the virtual model, since they are visible both preoperatively and intraoperatively [8,33]. By contrast, surface‐based methods focus on the intraoperative perspective rather than on preoperative data, because the surface is reconstructed intraoperatively directly from laparoscopic images and registered only at a later stage [37,38]. Finally, volume‐based methodologies are the most complex, as they require an intraoperative imaging system in addition to the endoscope to better locate hidden structures [39].…”
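As an illustration of the point‐based idea, the rigid transform between matched fiducials can be recovered with the classical Kabsch/Procrustes solution; the sketch below is generic and not taken from any of the cited implementations.

```python
# Minimal sketch of point-based rigid registration (Kabsch/Procrustes),
# the kind of fiducial matching described above. Variable names are
# illustrative, not drawn from any cited implementation.
import numpy as np

def rigid_register(model_pts, endo_pts):
    """Find rotation R and translation t mapping model_pts onto endo_pts.

    Both inputs are (N, 3) arrays of corresponding fiducial points,
    e.g. anatomical landmarks visible pre- and intraoperatively.
    """
    mu_m, mu_e = model_pts.mean(axis=0), endo_pts.mean(axis=0)
    H = (model_pts - mu_m).T @ (endo_pts - mu_e)      # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_e - R @ mu_m
    return R, t
```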
Introduction
The current study presents a deep learning framework to determine, in real time, the position and rotation of a target organ from an endoscopic video. These inferred data are used to overlay the 3D model of the patient's organ onto its real counterpart. The resulting augmented video stream is fed back to the surgeon as support during laparoscopic robot‐assisted procedures.
Methods
The framework first applies semantic segmentation; thereafter, two techniques, one based on Convolutional Neural Networks and one on motion analysis, are used to infer the rotation.
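A minimal sketch of how such a two‐stage pipeline might look, assuming the organ's image‐plane position comes from the segmentation mask centroid and a coarse in‐plane rotation cue from frame‐to‐frame optical flow; all function names and parameters here are illustrative and not the authors' code.

```python
# Hypothetical two-stage sketch: the segmentation mask gives the organ's
# image-plane position, and dense optical flow between consecutive frames
# gives a coarse in-plane rotation cue. Thresholds and names are illustrative.
import cv2
import numpy as np

def organ_position(mask):
    """Centroid (x, y) of a non-empty binary segmentation mask."""
    ys, xs = np.nonzero(mask)
    return float(xs.mean()), float(ys.mean())

def inplane_rotation_rate(prev_gray, curr_gray, mask):
    """Mean angular motion (radians/frame) of masked pixels about the centroid."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()
    r = np.stack([xs - cx, ys - cy], axis=1)         # radial vectors from centroid
    v = flow[ys, xs]                                  # per-pixel motion vectors
    cross_z = r[:, 0] * v[:, 1] - r[:, 1] * v[:, 0]  # z-component of r x v
    return float(np.mean(cross_z / (np.linalg.norm(r, axis=1) ** 2 + 1e-6)))
```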
Results
The segmentation achieves high accuracy, with a mean IoU score greater than 80% in all tests. Rotation estimation performance varies with the surgical procedure.
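The reported metric can be computed as below; this is simply the standard mean Intersection‐over‐Union definition for binary masks, not the authors' evaluation script.

```python
# Mean IoU over a set of binary organ masks (values in {0, 1}).
import numpy as np

def mean_iou(preds, targets, eps=1e-7):
    """preds, targets: iterables of binary masks of shape (H, W)."""
    scores = []
    for p, t in zip(preds, targets):
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        scores.append((inter + eps) / (union + eps))
    return float(np.mean(scores))
```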
Discussion
Although the precision of the presented methodology varies with the testing scenario, this work is a first step towards the adoption of deep learning and augmented reality to generalise the automatic registration process.
“…Unfortunately, their research was limited to static image recognition and could not adapt to endoscopic video captured under poor lighting or in scenes of unknown depth. Ozyoruk et al. proposed an unsupervised monocular visual odometry and depth estimation method to address frequently changing lighting conditions and scale inconsistency between consecutive frames [17]. The algorithm was optimized with mixed loss functions and used spatial attention modules to direct the network's focus to tissue areas.…”
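For context, a generic CBAM‐style spatial attention block of the kind alluded to is sketched below in PyTorch; it illustrates the idea of re‐weighting spatial locations toward tissue regions and is not the exact module used by Ozyoruk et al.

```python
# Generic CBAM-style spatial attention: pool across channels, predict a
# per-pixel attention map, and re-weight the feature map with it.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                        # x: (B, C, H, W)
        avg_map = x.mean(dim=1, keepdim=True)    # channel-wise average
        max_map = x.max(dim=1, keepdim=True).values
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                          # emphasise attended locations
```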
Section: Application Of DL In Gastrointestinal Endoscopy
“…Specular highlights in digital images commonly occur with discrete light sources. They present a serious problem in applications that rely on image processing and analysis, such as depth perception, localization, and 3D reconstruction (Tao et al., 2015; Ozyoruk et al., 2021). These highlights not only occlude important colors, textures, and features, but also act as additional features that may be falsely interpreted as characteristic of the scene.…”
Section: Introduction
“…These highlights also negatively affect the success of numerous MISD computer vision tasks, including depth perception, object recognition, motion tracking, 3D reconstruction, and localisation (Ozyoruk et al., 2021; Kaçmaz et al., 2020).…”
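A common heuristic for flagging such highlights, sketched below, thresholds pixels that are very bright but weakly saturated; the thresholds are illustrative and would need tuning per endoscope and scene.

```python
# Simple specular-highlight mask: high brightness plus low saturation in HSV.
import cv2
import numpy as np

def specular_mask(bgr, sat_max=40, val_min=230):
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    sat, val = hsv[..., 1], hsv[..., 2]
    mask = ((sat < sat_max) & (val > val_min)).astype(np.uint8) * 255
    # Dilate slightly so the mask also covers the highlight's soft edges.
    return cv2.dilate(mask, np.ones((3, 3), np.uint8), iterations=1)
```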
Section: Introduction
“…This makes such artefacts hard to learn and model, which leads to a lack of ground truth data. Since data is a key component of any learning‐based approach, synthetic data has been developed, but it is still not very realistic (Ozyoruk et al., 2021). Combined with the general scarcity of real data in the medical field, this results in a clear data shortage.…”
Video streams are used to guide a wide range of minimally-invasive surgical and diagnostic procedures, and many computer-assisted techniques have been developed to analyse them automatically. These approaches can provide additional information to the surgeon, such as lesion detection, instrument navigation, or 3D modelling of anatomical shape. However, the image features needed to recognise these patterns are not always reliably detected, owing to irregular light patterns such as specular highlight reflections. In this paper, we aim to remove specular highlights from endoscopic videos using machine learning. We propose a temporal generative adversarial network (GAN) to inpaint the hidden anatomy under specularities, inferring its appearance both spatially and from neighbouring frames where the highlights do not occur at the same location. This is achieved using in-vivo gastric endoscopy data (Hyper-Kvasir) in a fully unsupervised manner that relies on automatic detection of specular highlights. System evaluations show significant improvements over traditional methods through direct comparison, as well as over other machine learning techniques through an ablation study that demonstrates the importance of the network's temporal and transfer learning components. The generalisability of our system to different surgical setups and procedures was also evaluated qualitatively on in-vivo gastric endoscopy data and ex-vivo porcine data (SERV-CT, SCARED). We also assess the effect of our method on computer vision tasks that underpin 3D reconstruction and camera motion estimation, namely stereo disparity, optical flow, and sparse point feature matching. These are evaluated quantitatively and qualitatively, and the results of this comprehensive analysis show a positive effect of specular highlight inpainting on these tasks.
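To make the problem setup concrete, a classical single‐frame baseline is sketched below: detected highlight pixels are filled with OpenCV's Telea inpainting. This only illustrates the task interface; it does not reproduce the paper's temporal GAN, which additionally borrows appearance from neighbouring frames.

```python
# Classical single-frame baseline for specular highlight removal:
# fill masked pixels with OpenCV's Telea inpainting. The mask is a uint8
# image with 255 where highlights were detected (e.g. by specular_mask above).
import cv2

def remove_highlights_baseline(bgr_frame, mask):
    return cv2.inpaint(bgr_frame, mask, 3, cv2.INPAINT_TELEA)
```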