Hongdong Li scite author profile

Human action-anticipation methods predict the future action by observing only a small portion of an action in progress. This is critical for applications where computers have to react to human actions as early as possible, such as autonomous driving, human-robot interaction, and assistive robotics. In this paper, we present a method for human action anticipation by predicting the most plausible future human motion. We represent human motion using Dynamic Images [1] and make use of tailored loss functions to encourage a generative model to produce accurate future motion predictions. Our method outperforms the best-performing existing action-anticipation methods by 4% on JHMDB-21, 5.2% on UT-Interaction, and 5.1% on UCF 101-24 benchmarks.

Globally-Optimal Inlier Set Maximisation for Camera Pose and Correspondence Estimation

Campbell, Petersson, Kneip, et al., 2020

IEEE Trans. Pattern Anal. Mach. Intell.

Estimating the 6-DoF pose of a camera from a single image relative to a 3D point-set is an important task for many computer vision applications. Perspective-n-point solvers are routinely used for camera pose estimation, but are contingent on the provision of good quality 2D-3D correspondences. However, finding cross-modality correspondences between 2D image points and a 3D point-set is non-trivial, particularly when only geometric information is known. Existing approaches to the simultaneous pose and correspondence problem use local optimisation, and are therefore unlikely to find the optimal solution without a good pose initialisation, or introduce restrictive assumptions. Since a large proportion of outliers and many local optima are common for this problem, we instead propose a robust and globally-optimal inlier set maximisation approach that jointly estimates the optimal camera pose and correspondences. Our approach employs branch-and-bound to search the 6D space of camera poses, guaranteeing global optimality without requiring a pose prior. The geometry of SE(3) is used to find novel upper and lower bounds on the number of inliers and local optimisation is integrated to accelerate convergence. The algorithm outperforms existing approaches on challenging synthetic and real datasets, reliably finding the global optimum, with a GPU implementation greatly reducing runtime.

Superpixel Soup: Monocular Dense 3D Reconstruction of a Complex Dynamic Scene

Kumar, Dai, 2021

IEEE Trans. Pattern Anal. Mach. Intell.

This work addresses the task of dense 3D reconstruction of a complex dynamic scene from images. The prevailing approach to this task is composed of a sequence of steps and depends on the success of several pipelines in its execution [1]. To overcome these limitations of existing algorithms, we propose a unified approach to solve this problem. We assume that a dynamic scene can be approximated by numerous piecewise planar surfaces, where each planar surface enjoys its own rigid motion, and the global change in the scene between two frames is as-rigid-as-possible (ARAP). Consequently, our model of a dynamic scene reduces to a soup of planar structures and the rigid motion of these local planar structures. Using planar over-segmentation of the scene, we reduce this task to solving a "3D jigsaw puzzle" problem. Hence, the task boils down to correctly assembling each rigid piece to construct a 3D shape that complies with the geometry of the scene under the ARAP assumption. Further, we show that our approach provides an effective solution to the inherent scale-ambiguity in structure-from-motion under perspective projection. We provide extensive experimental results and evaluation on several benchmark datasets. Quantitative comparison with competing approaches shows state-of-the-art performance.

Index Terms: Dense 3D reconstruction, perspective camera, as-rigid-as-possible, relative scale ambiguity, structure from motion.

Ground-Plane-Based Absolute Scale Estimation for Monocular Visual Odometry

Zhou, Dai, 2020

IEEE Trans. Intell. Transport. Syst.

Recovering absolute metric scale from a monocular camera is a challenging but highly desirable capability for monocular camera-based systems. Various approaches have been proposed that estimate scale from different kinds of cues, such as camera height and object size. In this paper, we first summarize the different kinds of scale estimation approaches. Then, by analyzing the advantages and disadvantages of these approaches, we propose a robust divide-and-conquer absolute scale estimation method based on the ground plane and camera height. Using the estimated scale, we propose an effective scale correction strategy to reduce scale drift during the monocular visual odometry (VO) estimation process. Finally, the effectiveness and robustness of the proposed method are verified on both public and self-collected image sequences.

TUSR-Net: Triple Unfolding Single Image Dehazing With Self-Regularization and Dual Feature to Pixel Attention

Song, Zhou, et al., 2023

IEEE Trans. on Image Process.

Revisiting Spatio-Angular Trade-off in Light Field Cameras and Extended Applications in Super-Resolution

Zhu, Guo, et al., 2021

IEEE Trans. Visual. Comput. Graphics

Full View Optical Flow Estimation Leveraged From Light Field Superpixel

Zhu, Sun, Zhang, et al., 2020

IEEE Trans. Comput. Imaging

Angular-Driven Feedback Restoration Networks for Imperfect Sketch Recognition

Wan, Chan, 2021

IEEE Trans. on Image Process.

Hongdong Li

Action Anticipation by Predicting Future Dynamic Images
