VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization

Clark, Ronald; Wang, Sen; Markham, Andrew; Trigoni, Niki; Wen, Hongkai

doi:10.1109/cvpr.2017.284

Cited by 224 publications

(219 citation statements)

References 22 publications

(29 reference statements)

Supporting

Mentioning

219

Contrasting

“…Meanwhile, convolutional neural network (CNN) is best‐suited for extracting both global and fine features of an object. Frameworks that combined CNN (encoding spatial information) and RNN (encoding temporal information) have achieved significant success in video prediction. Inspired by these studies, we developed a customized deep learning algorithm that integrated both CNN and RNN units to predict the spatial tumor distribution in a longitudinal imaging study, and evaluated the impact of the structural design on the predictive accuracy.…”

Section: Introduction (mentioning)

confidence: 99%

“…Frameworks that combined CNN (encoding spatial information) and RNN (encoding temporal information) have achieved significant success in video prediction. [21][22][23] Inspired by these studies, we developed a customized deep learning algorithm that integrated both CNN and RNN units to predict the spatial tumor distribution in a longitudinal imaging study, and evaluated the impact of the structural design on the predictive accuracy. Furthermore, we assessed the characteristics of the prediction including its timing, frequency, and spatial accuracy to prepare for its integration into the clinical workflow of ART.…”

Section: Introduction (mentioning)

confidence: 99%

Toward predicting the evolution of lung tumors during radiotherapy observed on a longitudinal MR imaging study via a deep learning algorithm

Wang

Rimner

et al. 2019

Medical Physics

Purpose: To predict the spatial and temporal trajectories of lung tumor during radiotherapy monitored under a longitudinal magnetic resonance imaging (MRI) study via a deep learning algorithm for facilitating adaptive radiotherapy (ART). Methods: We monitored 10 lung cancer patients by acquiring weekly MRI‐T2w scans over a course of radiotherapy. Under an ART workflow, we developed a predictive neural network (P‐net) to predict the spatial distributions of tumors in the coming weeks utilizing images acquired earlier in the course. The three‐step P‐net consisted of a convolutional neural network to extract relevant features of the tumor and its environment, followed by a recurrence neural network constructed with gated recurrent units to analyze trajectories of tumor evolution in response to radiotherapy, and finally an attention model to weight the importance of weekly observations and produce the predictions. The performance of P‐net was measured with Dice and root mean square surface distance (RMSSD) between the algorithm‐predicted and experts‐contoured tumors under a leave‐one‐out scheme. Results: Tumor shrinkage was 60% ± 27% (mean ± standard deviation) by the end of radiotherapy across nine patients. Using images from the first three weeks, P‐net predicted tumors on future weeks (4, 5, 6) with a Dice and RMSSD of (0.78 ± 0.22, 0.69 ± 0.24, 0.69 ± 0.26), and (2.1 ± 1.1 mm, 2.3 ± 0.8 mm, 2.6 ± 1.4 mm), respectively. Conclusion: The proposed deep learning algorithm can capture and predict spatial and temporal patterns of tumor regression in a longitudinal imaging study. It closely follows the clinical workflow, and could facilitate the decision‐making of ART. A prospective study including more patients is warranted.
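
The Dice score named in the abstract above is the standard overlap measure 2|A∩B| / (|A| + |B|) between a predicted mask A and a reference mask B. The snippet below is a minimal sketch of that formula only, assuming both masks have been flattened to equal-length lists of 0/1 values. The function name `dice_coefficient`, the toy masks, and the empty-mask convention are illustrative assumptions and do not come from the cited paper.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as equal-length 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as tumor in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Sum of the two mask sizes, |A| + |B|.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical toy masks: the two agree on 2 voxels, and each marks 3 voxels.
predicted = [0, 1, 1, 1, 0, 0]
contoured = [0, 0, 1, 1, 1, 0]
score = dice_coefficient(predicted, contoured)
```

With these toy masks the score is 2·2 / (3 + 3) = 2/3; identical masks give 1 and disjoint masks give 0.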

“…All abovementioned methods estimate camera localization from single images. VidLoc [7] and MapNet [4] are closely related to our work. VidLoc [7] accepts video clips as input and adopts regular bidirectional LSTMs to model the sequence.…”

Section: Related Work (mentioning)

confidence: 88%

“…VidLoc [7] and MapNet [4] are closely related to our work. VidLoc [7] accepts video clips as input and adopts regular bidirectional LSTMs to model the sequence. Although LSTMs can partially enhance observations, it cannot remember historical knowledge for a long time [29], resulting in poor performance in processing long sequences.…”

Section: Related Work (mentioning)

confidence: 88%

Local Supports Global: Deep Camera Relocalization With Sequence Enhancement

Xue

Wang

Zhang

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

We propose to leverage the local information in image sequences to support global camera relocalization. In contrast to previous methods that regress global poses from single images, we exploit the spatial-temporal consistency in sequential images to alleviate uncertainty due to visual ambiguities by incorporating a visual odometry (VO) component. Specifically, we introduce two effective steps called content-augmented pose estimation and motion-based refinement. The content-augmentation step focuses on alleviating the uncertainty of pose estimation by augmenting the observation based on the co-visibility in local maps built by the VO stream. Besides, the motion-based refinement is formulated as a pose graph, where the camera poses are further optimized by adopting relative poses provided by the VO component as additional motion constraints. Thus, the global consistency can be guaranteed. Experiments on the public indoor 7-Scenes and outdoor Oxford RobotCar benchmark datasets demonstrate that benefited from local information inherent in the sequence, our approach outperforms state-of-the-art methods, especially in some challenging cases, e.g., insufficient texture, highly repetitive textures, similar appearances, and over-exposure.

Learning to Solve Nonlinear Least Squares for Monocular Stereo

Clark

Bloesch

Czarnowski

et al. 2018

Computer Vision – ECCV 2018

Self Cite

Sum-of-squares objective functions are very popular in computer vision algorithms. However, these objective functions are not always easy to optimize. The underlying assumptions made by solvers are often not satisfied and many problems are inherently ill-posed. In this paper, we propose LS-Net, a neural nonlinear least squares optimization algorithm which learns to effectively optimize these cost functions even in the presence of adversities. Unlike traditional approaches, the proposed solver requires no hand-crafted regularizers or priors as these are implicitly learned from the data. We apply our method to the problem of motion stereo, i.e., jointly estimating the motion and scene geometry from pairs of images of a monocular sequence. We show that our learned optimizer is able to efficiently and effectively solve this challenging optimization problem.
