2019
DOI: 10.1007/978-3-030-11015-4_27
Learning Structure-from-Motion from Motion

Abstract: This work begins by questioning the quality metrics used by deep neural networks that predict depth from a single image, and hence the usability of recently published works on unsupervised learning of depth from videos. These works all predict depth from a single image, so depth is known only up to an undetermined scale factor; this is not sufficient for practical use cases, which need an absolute depth map, i.e. the determination of the scaling factor. To overcome these limitations, we …
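The scale ambiguity described in the abstract can be resolved once the magnitude of the camera's motion is known, for instance from a speed sensor: the ratio between the measured displacement and the network's up-to-scale translation gives the factor that converts relative depth to metres. A minimal sketch of this idea (the function names and the unit-norm example are assumptions for illustration, not the paper's actual code):

```python
import math

def recover_scale(est_translation, measured_speed, dt):
    """Factor aligning an up-to-scale translation with a metric displacement.

    est_translation -- (x, y, z) translation predicted by the pose network,
                       known only up to scale
    measured_speed  -- camera speed in m/s (e.g. from odometry)
    dt              -- time between the two frames in seconds
    """
    est_norm = math.sqrt(sum(c * c for c in est_translation))
    return (measured_speed * dt) / est_norm

def to_absolute_depth(rel_depth, scale):
    # In structure-from-motion, depth scales by the same factor as translation.
    return [[scale * d for d in row] for row in rel_depth]

# Unit-norm predicted translation; the camera moved 0.5 m/s for 0.1 s (5 cm).
scale = recover_scale((0.6, 0.0, 0.8), measured_speed=0.5, dt=0.1)
relative = [[2.0, 4.0], [1.0, 3.0]]
absolute = to_absolute_depth(relative, scale)  # now in metres
```

The same factor rescales every depth value, since the reprojection geometry is invariant to a joint scaling of depth and translation.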

Cited by 11 publications (28 citation statements). References 18 publications.
“…However, learning to estimate depth purely from video breaks the static scene assumption and necessitates the use of an attention mechanism for foreground motion between consecutive frames. More recent iterations of this direction added scale normalization and removal of the separate pose estimation branch [40], 3D geometric constraints between the predicted depths [23], epipolar constraints [29], additional feature reconstruction supervision [44], stereo matching constraints [43] or explicitly used two consecutive frames as input [28].…”
Section: Related Work
confidence: 99%
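The photometric supervision these methods share can be stated compactly: a source frame is warped into the target view using the predicted depth and pose, and the loss penalises the per-pixel difference, restricted by a mask that excludes pixels the warp cannot explain (moving objects, occlusions). A minimal sketch of that masked loss on flattened intensities (the names and the binary-mask simplification are assumptions for illustration):

```python
def masked_photometric_loss(target, warped, mask):
    """Mean absolute intensity error over pixels the warp can explain.

    target -- pixel intensities of the target frame
    warped -- source frame warped into the target view via predicted depth/pose
    mask   -- 1 where the static-scene assumption holds, 0 for moving objects
              or out-of-view pixels
    """
    errs = [abs(t - w) for t, w, m in zip(target, warped, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0

# A moving object at index 2 violates the static-scene assumption
# and is masked out of the loss.
loss = masked_photometric_loss(
    target=[0.5, 0.6, 0.9, 0.4],
    warped=[0.5, 0.5, 0.1, 0.4],
    mask=[1, 1, 0, 1],
)
```

Without the mask, the large residual at the moving pixel would dominate the gradient even though no depth value could explain it.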
“…The 6 DoF of the camera are randomly changed within the limits of this sphere, in such a way that the end effector is moved within these limits while keeping the orientation of the camera unchanged, so that only the translational component will need to be input into the neural networks, reducing their complexity. This point differs significantly from the case proposed in [ 31 ] in which the camera is mounted on the drone and its speed is used for distance estimation. In addition, the total displacement values there are up to 30 cm.…”
Section: Methods
confidence: 77%
“…This is not practical for most cases in robotics since an absolute depth map of the surrounding environment is needed. Pinard et al recently pointed out this issue [ 31 ], solving the problem by adding the velocity of the camera as an additional input. Although the concept is similar to the one proposed in this paper, it should be noted that both the dataset and the objective in [ 31 ] are different, since the objective is the estimation of the depth image from the point of view of a drone moving at a constant speed.…”
Section: Introduction
confidence: 99%
“…In this paper, we utilize the inferred depth information from the 2D image and relative position to achieve the robust BSD performance augmentation. The depth information is the key to recognizing the near and far areas [4,5]. In the field of computer vision, constructing a depth mapping from a single 2D image is a challenging task [6].…”
Section: Introduction
confidence: 99%