2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
DOI: 10.1109/cvpr.2017.284

VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization

Abstract: Machine learning techniques, namely convolutional neural networks (CNN) and regression forests, have recently shown great promise in performing 6-DoF localization of monocular images. However, in most cases image sequences, rather than only single images, are readily available. To this extent, none of the proposed learning-based approaches exploit the valuable constraint of temporal smoothness, often leading to situations where the per-frame error is larger than the camera motion. In this paper we propose a recurren…

Citations: cited by 224 publications (219 citation statements)
References: 22 publications (29 reference statements)
“…Meanwhile, convolutional neural network (CNN) is best‐suited for extracting both global and fine features of an object. Frameworks that combined CNN (encoding spatial information) and RNN (encoding temporal information) have achieved significant success in video prediction. Inspired by these studies, we developed a customized deep learning algorithm that integrated both CNN and RNN units to predict the spatial tumor distribution in a longitudinal imaging study, and evaluated the impact of the structural design on the predictive accuracy.…”
Section: Introduction (mentioning)
confidence: 99%
“…Frameworks that combined CNN (encoding spatial information) and RNN (encoding temporal information) have achieved significant success in video prediction. [21][22][23] Inspired by these studies, we developed a customized deep learning algorithm that integrated both CNN and RNN units to predict the spatial tumor distribution in a longitudinal imaging study, and evaluated the impact of the structural design on the predictive accuracy. Furthermore, we assessed the characteristics of the prediction including its timing, frequency, and spatial accuracy to prepare for its integration into the clinical workflow of ART.…”
Section: Introduction (mentioning)
confidence: 99%
“…All abovementioned methods estimate camera localization from single images. VidLoc [7] and MapNet [4] are closely related to our work. VidLoc [7] accepts video clips as input and adopts regular bidirectional LSTMs to model the sequence.…”
Section: Related Work (mentioning)
confidence: 88%
“…VidLoc [7] and MapNet [4] are closely related to our work. VidLoc [7] accepts video clips as input and adopts regular bidirectional LSTMs to model the sequence. Although LSTMs can partially enhance observations, it cannot remember historical knowledge for a long time [29], resulting in poor performance in processing long sequences.…”
Section: Related Work (mentioning)
confidence: 88%