“…Volumetric-fusion-based methods [32,56,59,61,66] enable free-form dynamic reconstruction in a template-free, single-view, real-time manner by fusing depth into a canonical model and applying non-rigid deformation. A series of works has been proposed to make volumetric fusion more robust using SIFT features [17], an articulated human skeleton prior [59,61], extra IMU sensors [66], data-driven priors [43], learned correspondences [3], or a neural deformation graph [2]. Since these single-view setups suffer from tracking errors in occluded regions, multi-view setups with improved fusion methods have been introduced to mitigate this problem.…”
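To make the fusion step in this excerpt concrete: the cited methods integrate each depth frame into a canonical truncated signed distance (TSDF) volume through a non-rigid warp. Below is a minimal, hypothetical sketch of that update; `warp` (canonical-to-live deformation) and `project` (camera model returning pixel coordinates and camera-space depth) are assumed placeholders, not any cited system's API.

```python
def fuse_depth(tsdf, weights, voxel_centers, warp, project, depth_map, trunc=0.02):
    """Fuse one depth frame into canonical TSDF/weight volumes (flat arrays)."""
    for i, x_canonical in enumerate(voxel_centers):
        x_live = warp(x_canonical)            # deform the voxel into the live frame
        u, v, z_voxel = project(x_live)       # pixel coordinates + camera-space depth
        u, v = int(round(u)), int(round(v))
        if not (0 <= u < depth_map.shape[1] and 0 <= v < depth_map.shape[0]):
            continue
        z_observed = depth_map[v, u]
        if z_observed <= 0:                   # invalid / missing depth
            continue
        sdf = z_observed - z_voxel            # signed distance along the camera ray
        if sdf < -trunc:                      # voxel far behind the observed surface
            continue
        tsdf_obs = min(1.0, sdf / trunc)      # truncate to [-1, 1]
        w = weights[i]
        # Running weighted average, as in classic volumetric fusion.
        tsdf[i] = (tsdf[i] * w + tsdf_obs) / (w + 1.0)
        weights[i] = min(w + 1.0, 255.0)
    return tsdf, weights
```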
4D modeling of human-object interactions is critical for numerous applications. However, efficient volumetric capture and rendering of complex interaction scenarios, especially from sparse inputs, remain challenging. In this paper, we propose NeuralFusion, a neural approach for volumetric human-object capture and rendering using sparse consumer RGBD sensors. It marries traditional non-rigid fusion with recent advances in neural implicit modeling and blending, where the captured humans and objects are disentangled layer-wise. For geometry modeling, we propose a neural implicit inference scheme with non-rigid key-volume fusion, as well as a template-aided robust object tracking pipeline. Our scheme enables detailed and complete geometry generation under complex interactions and occlusions. Moreover, we introduce a layer-wise human-object texture rendering scheme that combines volumetric and image-based rendering in both the spatial and temporal domains to obtain photo-realistic results. Extensive experiments demonstrate the effectiveness and efficiency of our approach in synthesizing photo-realistic free-view results under complex human-object interactions.
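The combination of volumetric and image-based rendering mentioned in this abstract can be illustrated with a toy per-pixel blend. This is a generic sketch of the idea only, not NeuralFusion's actual formulation; the view-alignment confidence weighting is an assumption chosen for illustration.

```python
import numpy as np

def blend_colors(c_volumetric, c_image_based, view_cos, sharpness=4.0):
    """Blend (H, W, 3) colors from volumetric and image-based rendering.
    view_cos: (H, W) cosine between target and source viewing directions."""
    # Confidence in the image-based color grows as the source view aligns
    # with the target view (a common IBR heuristic, assumed here).
    w = 1.0 / (1.0 + np.exp(-sharpness * view_cos))
    w = w[..., None]                          # broadcast over RGB channels
    return w * c_image_based + (1.0 - w) * c_volumetric
```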
“…These works reason about part-level geometry on point clouds, which lack the mesh information required for physical simulation. A series of methods for reconstructing deformable objects [5,57,58] uses articulated bones to represent articulation. These representations only loosely constrain the motions of object parts.…”
Digitizing physical objects into the virtual world has the potential to unlock new research and applications in embodied AI and mixed reality. This work focuses on recreating interactive digital twins of real-world articulated objects that can be directly imported into virtual environments. We introduce Ditto, which learns to estimate the articulation model and reconstruct the 3D geometry of an articulated object through interactive perception. Given a pair of visual observations of an articulated object before and after an interaction, Ditto reconstructs part-level geometry and estimates the object's articulation model. We employ implicit neural representations for joint geometry and articulation modeling. Our experiments show that Ditto effectively builds digital twins of articulated objects in a category-agnostic way. We also apply Ditto to real-world objects and deploy the recreated digital twins in physical simulation. Code and additional results are available at https://ut-austin-rpl.github.io/Ditto/
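As a rough illustration of joint geometry and articulation modeling with implicit representations (in the spirit of the abstract above, not Ditto's actual architecture), the sketch below composes two hypothetical occupancy networks, `occ_static` and `occ_mobile`, with a revolute-joint parameterization (axis, pivot, angle). Querying the mobile part under a joint angle amounts to inverse-warping the query points into its rest frame.

```python
import numpy as np

def rodrigues(axis, theta):
    """Rotation matrix for angle theta about a unit axis (Rodrigues' formula)."""
    a = axis / np.linalg.norm(axis)
    K = np.array([[0, -a[2], a[1]],
                  [a[2], 0, -a[0]],
                  [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def occupancy_at_state(points, occ_static, occ_mobile, axis, pivot, theta):
    """Occupancy of the articulated object with the mobile part rotated
    by theta about the revolute joint (axis, pivot)."""
    # Inverse-warp query points into the mobile part's rest frame.
    R_inv = rodrigues(axis, -theta)
    points_rest = (points - pivot) @ R_inv.T + pivot
    # Union of the static part (queried in place) and the mobile part.
    return np.maximum(occ_static(points), occ_mobile(points_rest))
```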
“…A limitation of all the above methods is that the estimated 3D humans cannot be reposed, because implicit shapes (unlike statistical models) lack a consistent mesh topology, a skeleton, and skinning weights. To address this, Bozic et al. [13] infer an embedded deformation graph to manipulate implicit functions, while Yang et al. [50] additionally infer a skeleton and skinning fields.…”
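For context on why a skeleton and skinning weights enable reposing: linear blend skinning (LBS) maps canonical points to a posed space by blending per-bone transforms. The sketch below shows the standard LBS equation; the weight and transform inputs stand in for quantities that methods like [50] infer, and are assumptions here.

```python
import numpy as np

def lbs(points, weights, bone_transforms):
    """Linear blend skinning.
    points: (N, 3) canonical points; weights: (N, B) per-point bone weights
    summing to 1; bone_transforms: (B, 4, 4) homogeneous bone transforms."""
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)  # (N, 4)
    blended = np.einsum('nb,bij->nij', weights, bone_transforms)        # (N, 4, 4)
    return np.einsum('nij,nj->ni', blended, homo)[:, :3]
```

Reposing an implicit shape additionally requires warping query points from the posed space back to the canonical space, which is roughly the role the inferred skinning fields play.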
Figure 1. Images to avatars. ICON robustly reconstructs 3D clothed humans in unconstrained poses from individual video frames (Left). These are used to learn a fully textured and animatable clothed avatar with realistic clothing deformations (Right).