2020
DOI: 10.1007/978-3-030-58558-7_14
Deep Feedback Inverse Problem Solver

Cited by 12 publications (9 citation statements)
References 54 publications
“…We note that differences in the joint labeling schemes used by these monocular 3D methods and our evaluation set do not affect the quality of camera initialization we acquire via rigid alignment, as long as monocular 3D estimates for all views follow the same labeling scheme. Similar to prior work [26], each "neural optimizer step" is trained separately, and stop gradient is applied to all inputs. We used the same architecture across all experiments: L fully-connected 512-dimensional layers followed by a fully-connected 128-dimensional layer, all with SELU nonlinearities [21], followed by a dense output of the size corresponding to the optimization space (flattened 3D pose and weak camera model parameters).…”
Section: Methods
confidence: 99%
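The architecture in the quote above can be sketched as a plain MLP. This is a minimal NumPy sketch under stated assumptions, not the authors' code: the input size, L = 3, and a 55-dimensional output (a hypothetical 17-joint flattened 3D pose plus 4 weak-camera parameters) are illustrative choices.

```python
import numpy as np

def selu(x):
    # SELU constants from Klambauer et al. (self-normalizing networks)
    alpha, scale = 1.6732632423543772, 1.0507009873554805
    return scale * np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0)) - 1))

def make_mlp(in_dim, out_dim, L=3, hidden=512, penultimate=128, seed=0):
    # L hidden layers of width 512, one 128-wide layer, then a dense output
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden] * L + [penultimate, out_dim]
    return [(rng.standard_normal((a, b)) / np.sqrt(a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:   # SELU on every layer except the dense output
            x = selu(x)
    return x

# illustrative sizes: 64-dim input, 17*3 pose values + 4 camera parameters out
params = make_mlp(in_dim=64, out_dim=17 * 3 + 4, L=3)
y = forward(params, np.zeros((1, 64)))
```

The output dimension is whatever the optimization space flattens to, so it changes per experiment while the hidden stack stays fixed.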
“…In this final stage, we train an optimizer as a neural network f_θ to predict the optimal update dy_{i+1} to the current guess y_i for the pose and cameras, similar to Ma et al. [26] for solving inverse problems. Specifically, the update dy_{i+1} is computed from the heatmap mixture parameters g, the current guess y_i, the projections of the current guess onto each camera {π^(c)(y_i)} for c = 0, …, C, and the current value of the refinement loss (we omit the dependency of y_i on θ in the first line for readability):…”
Section: Neural Optimizer Stage
confidence: 99%
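One such learned refinement step could be sketched as follows. Everything here is a hypothetical illustration of the quoted idea, not the paper's exact formulation: the weak-perspective camera model, the feature layout, and the `f_theta` interface are assumptions.

```python
import numpy as np

def weak_project(y3d, cam):
    # hypothetical weak-perspective camera: scale s and 2D translation (tx, ty)
    s, tx, ty = cam
    return s * y3d[:, :2] + np.array([tx, ty])

def refine_step(f_theta, y, cams, g, loss_fn):
    # one "neural optimizer step": the network predicts an update dy from the
    # heatmap mixture parameters g, the current guess y, its per-camera
    # projections, and the current value of the refinement loss
    projections = [weak_project(y, c) for c in cams]
    loss = loss_fn(projections, g)
    features = np.concatenate([y.ravel(), g.ravel(), [loss]]
                              + [p.ravel() for p in projections])
    dy = f_theta(features)
    return y + dy.reshape(y.shape)
```

Per the quote above, each step's network would be trained separately, with a stop-gradient applied to all of its inputs.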
“…Learning camera pose optimization can be tackled by unrolling the optimizer for a fixed number of steps [21,52,54,83,91,92], computing implicit derivatives [13,15,18,34,68], or crafting losses to mimic optimization steps [88,89]. Multiple works have proposed to learn components of these optimizers [21,52,83], with added complexity and unclear generalization.…”
Section: Related Work
confidence: 99%
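To make the "unrolling the optimizer for a fixed number of steps" idea concrete, here is a minimal sketch under toy assumptions (the quadratic residual and the hand-picked step sizes are illustrative): the update loop runs for a fixed step count, and in a learned variant the per-step step sizes, or the update function itself, would be trained end-to-end through this unrolled loop.

```python
import numpy as np

def unrolled_refine(y0, grad_fn, step_sizes):
    # unroll the optimizer for a fixed number of steps; each step's
    # (potentially learnable) step size is part of the computation graph
    y = y0
    for lr in step_sizes:
        y = y - lr * grad_fn(y)
    return y

# toy residual: pull y toward a target pose; gradient of ||y - target||^2
target = np.array([1.0, 2.0, 3.0])
grad = lambda y: 2.0 * (y - target)
y = unrolled_refine(np.zeros(3), grad, step_sizes=[0.25] * 5)
```

With step size 0.25, each unrolled step halves the distance to the target, so five steps leave 1/32 of the initial error.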
“…Fitting the optimizer to the data: Levenberg-Marquardt is a generic optimization algorithm that involves several heuristics, such as the choice of the robust cost function ρ or the damping factor λ. Past works on learned optimization employ deep networks to predict ρ [52], λ [52,83], or even the pose update δ [21,54], from the residuals and visual features. We argue that this can greatly impair the ability to generalize to new data distributions, as it ties the optimizer to the visual-semantic content of the training data.…”
Section: Direct Alignment
confidence: 99%
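For reference, the classic non-learned Levenberg-Marquardt loop that the quote contrasts against looks roughly like this. A minimal sketch under stated assumptions: ρ is taken as the identity (plain squared cost), and λ follows the usual halve-on-success / double-on-failure heuristic.

```python
import numpy as np

def lm_step(r, J, lam):
    # damped Gauss-Newton normal equations: (J^T J + lam * I) dx = -J^T r
    JTJ = J.T @ J
    dx = np.linalg.solve(JTJ + lam * np.eye(JTJ.shape[0]), -(J.T @ r))
    return dx

def lm(residual_fn, jac_fn, x0, lam=1e-3, iters=20):
    # heuristic damping schedule: shrink lam on success, grow it on failure
    x = x0
    cost = np.sum(residual_fn(x) ** 2)
    for _ in range(iters):
        dx = lm_step(residual_fn(x), jac_fn(x), lam)
        new_cost = np.sum(residual_fn(x + dx) ** 2)
        if new_cost < cost:
            x, cost, lam = x + dx, new_cost, lam * 0.5
        else:
            lam *= 2.0
    return x

# toy usage: fit x so that A x = b (solution x = [2, 3])
A = np.array([[2.0, 0.0], [0.0, 3.0]])
b = np.array([4.0, 9.0])
x = lm(lambda x: A @ x - b, lambda x: A, np.zeros(2))
```

Learned variants replace the fixed ρ, the λ schedule, or the whole dx step with network predictions, which is exactly the design choice the quoted passage questions.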
“…6-DoF pose estimation has a wide range of applications, including augmented reality and robot manipulation [21,22]. Recent progress in differentiable rendering has sparked interest in solving pose estimation via analysis-by-synthesis [4,19,31,48]. However, techniques built around differentiable rendering engines typically require a high-quality, watertight 3D model of the object for use in rendering.…”
Section: Introduction
confidence: 99%