Zhaoyang Lv scite author profile

Estimation of 3D motion in a dynamic scene from a temporal pair of images is a core task in many scene understanding problems. In real-world applications, a dynamic scene is commonly captured by a moving camera (i.e., panning, tilting or hand-held), increasing the task complexity because the scene is observed from different viewpoints. The primary challenge is the disambiguation of the camera motion from scene motion, which becomes more difficult as the amount of rigidity observed decreases, even with successful estimation of 2D image correspondences. Compared to other state-of-the-art 3D scene flow estimation methods, in this paper, we propose to learn the rigidity of a scene in a supervised manner from an extensive collection of dynamic scene data, and directly infer a rigidity mask from two sequential images with depths. With the learned network, we show how we can effectively estimate camera motion and projected scene flow using computed 2D optical flow and the inferred rigidity mask. For training and testing the rigidity network, we also provide a new semi-synthetic dynamic scene dataset (synthetic foreground objects with a real background) and an evaluation split that accounts for the percentage of observed non-rigid pixels. Through our evaluation, we show the proposed framework outperforms current state-of-the-art scene flow estimation methods in challenging dynamic scenes.
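To make the relationship between optical flow, camera motion, and projected scene flow concrete, here is a minimal sketch. It assumes a pinhole camera and treats projected scene flow as the optical flow minus the flow induced by camera motion alone; this is an illustrative reading of the abstract, not the authors' implementation. The function names, the intrinsics matrix `K`, and the pose `(R, t)` are hypothetical.

```python
import numpy as np

def camera_induced_flow(depth, K, R, t):
    """2D flow that camera motion alone would produce on a static scene.

    depth: (h, w) depth map of the first frame
    K:     (3, 3) pinhole intrinsics (assumed known)
    R, t:  rotation (3, 3) and translation (3,) mapping frame-1 points
           into frame-2 camera coordinates (assumed already estimated)
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                     # back-projected rays
    pts = rays * depth[..., None]                       # 3D points, frame 1
    pts2 = pts @ R.T + t                                # 3D points, frame 2
    proj = pts2 @ K.T                                   # project to image
    uv2 = proj[..., :2] / proj[..., 2:3]
    return uv2 - pix[..., :2]

def projected_scene_flow(optical_flow, depth, K, R, t):
    """Residual 2D motion after removing the camera-induced component."""
    return optical_flow - camera_induced_flow(depth, K, R, t)
```

Under these assumptions the residual should be close to zero on pixels that belong to the rigid background, which is why a rigidity mask is useful for deciding which pixels to trust when estimating the camera pose.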

Neural 3D Video Synthesis from Multi-view Video

Slavcheva²,

Zollhoefer³

et al. 2022

132

Taking a Deeper Look at the Inverse Compositional Algorithm

Dellaert

Rehg

et al. 2019

In this paper, we provide a modern synthesis of the classic inverse compositional algorithm for dense image alignment. We first discuss the assumptions made by this well-established technique, and subsequently propose to relax these assumptions by incorporating data-driven priors into this model. More specifically, we unroll a robust version of the inverse compositional algorithm and replace multiple components of this algorithm using more expressive models whose parameters we train in an end-to-end fashion from data. Our experiments on several challenging 3D rigid motion estimation tasks demonstrate the advantages of combining optimization with learning-based techniques, outperforming the classic inverse compositional algorithm as well as data-driven image-to-pose regression approaches.

Footnote 1: The warping function $W_\xi : \mathbb{R}^2 \to \mathbb{R}^2$ might represent translation, affine 2D motion or (if depth is available) rigid or non-rigid 3D motion. To avoid clutter in the notation, we do not make $W_\xi$ explicit in our equations.

A Continuous Optimization Approach for Efficient and Accurate Scene Flow

Beall

Alcantarilla

et al. 2016

We propose a continuous optimization method for solving dense 3D scene flow problems from stereo imagery. As in recent work, we represent the dynamic 3D scene as a collection of rigidly moving planar segments. The scene flow problem then becomes the joint estimation of pixel-to-segment assignment, 3D position, normal vector and rigid motion parameters for each segment, leading to a complex and expensive discrete-continuous optimization problem. In contrast, we propose a purely continuous formulation which can be solved more efficiently. Using a fine superpixel segmentation that is fixed a priori, we propose a factor graph formulation that decomposes the problem into photometric, geometric, and smoothing constraints. We initialize the solution with a novel, high-quality initialization method, then independently refine the geometry and motion of the scene, and finally perform a global nonlinear refinement using Levenberg-Marquardt. We evaluate our method in the challenging KITTI Scene Flow benchmark, ranking in third position, while being 3 to 30 times faster than the top competitors (x37 [1] and x3.75 [2]).

SENSE: A Shared Encoder Network for Scene-Flow Estimation

Jiang

Sun²,

Jampani³

et al. 2019

We introduce a compact network for holistic scene flow estimation, called SENSE, which shares common encoder features among four closely-related tasks: optical flow estimation, disparity estimation from stereo, occlusion estimation, and semantic segmentation. Our key insight is that sharing features makes the network more compact, induces better feature representations, and can better exploit interactions among these tasks to handle partially labeled data. With a shared encoder, we can flexibly add decoders for different tasks during training. This modular design leads to a compact and efficient model at inference time. Exploiting the interactions among these tasks allows us to introduce distillation and self-supervised losses in addition to supervised losses, which can better handle partially labeled real-world data. SENSE achieves state-of-the-art results on several optical flow benchmarks and runs as fast as networks specifically designed for optical flow. It also compares favorably against the state of the art on stereo and scene flow, while consuming much less memory.

Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation
