We introduce a learning-based method to reconstruct objects acquired in a casual handheld scanning setting with a depth camera. Our method is based on two core components: first, a deep network that provides a semantic segmentation and labeling of the frames of an input RGBD sequence; second, an alignment and reconstruction method that employs the semantic labeling to reconstruct the acquired object from the frames. We demonstrate that the semantic labeling improves the reconstructions of the objects compared to methods that use only the depth information of the frames. Moreover, since training a deep network requires a large amount of labeled data, a key contribution of our work is an active self-learning framework that simplifies the creation of the training data. Specifically, we iteratively predict the labeling of frames with the neural network, reconstruct the object from the labeled frames, and evaluate the confidence of the labeling, incrementally training the neural network while requiring only a small amount of user-provided annotations. We show that this method enables the creation of training data for a neural network with high accuracy, while requiring little manual effort.
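The active self-learning loop described above can be sketched as follows. This is a minimal illustration of the idea, not the authors' implementation: the `predict`, `annotate`, and `train` callables, the confidence threshold, and the per-round annotation budget are all hypothetical stand-ins.

```python
def active_self_learning(frames, predict, annotate, train,
                         threshold=0.8, budget=2, rounds=3):
    """Iteratively label frames: trust high-confidence predictions,
    and ask the user to annotate only a small budget of uncertain
    frames per round before retraining."""
    labeled = []          # accumulated training pairs (frame, label)
    state = None          # opaque model state returned by `train`
    pool = list(frames)   # frames still awaiting a trusted label
    for _ in range(rounds):
        confident, uncertain = [], []
        for f in pool:
            label, conf = predict(state, f)
            (confident if conf >= threshold else uncertain).append((f, label))
        labeled.extend(confident)                 # self-labeled, no user effort
        # Only a few low-confidence frames go to the user each round.
        labeled.extend((f, annotate(f)) for f, _ in uncertain[:budget])
        state = train(state, labeled)             # incremental retraining
        pool = [f for f, _ in uncertain[budget:]] # revisit the rest later
    return state, labeled
```

The loop concentrates manual effort where the network is least certain, which is the mechanism the abstract credits for keeping annotation cost low.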
In this paper, we study the problem of 3D shape upright orientation estimation from the perspective of reinforcement learning, i.e., we teach a machine (agent) to orient 3D shapes to upright step by step given its current observation. Unlike previous methods, we treat this problem as a sequential decision-making process instead of a strongly supervised learning problem. To achieve this, we propose UprightRL, a deep network architecture designed for upright orientation estimation. UprightRL consists of two submodules, an Actor module and a Critic module, which are learned in a reinforcement learning manner. Specifically, the Actor module selects an action from the action space to transform the point cloud and obtain the new point cloud for the next environment state, while the Critic module evaluates the strategy and guides the Actor in choosing the next action. Moreover, we design a reward function that assigns a positive reward to actions that move the model toward the upright orientation and a negative reward otherwise. We conducted extensive experiments to demonstrate the effectiveness of the proposed model, and the results show that our network outperforms the state of the art. We also apply our method to a robot grasping-and-placing experiment to demonstrate the practicality of our method.
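The reward idea described above can be sketched as follows: an action is rewarded when it rotates the shape's up axis closer to the world up direction, and penalized otherwise. This is a simplified illustration of the stated reward design, not the paper's exact formulation; the binary reward magnitude and the use of world +Z as "up" are assumptions.

```python
import math

def angle_to_upright(up_vector):
    """Angle (radians) between the shape's current up axis and world +Z."""
    x, y, z = up_vector
    norm = math.sqrt(x * x + y * y + z * z)
    return math.acos(max(-1.0, min(1.0, z / norm)))

def reward(up_before, up_after, magnitude=1.0):
    """Positive reward if the action reduced the angle to upright,
    negative otherwise."""
    improvement = angle_to_upright(up_before) - angle_to_upright(up_after)
    return magnitude if improvement > 0 else -magnitude
```

In an actor-critic setup, the Actor's chosen rotation is applied to the point cloud, this reward scores the transition, and the Critic learns to estimate the expected return that guides the next action choice.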
We introduce RPM-Net, a deep learning-based approach which simultaneously infers movable parts and hallucinates their motions from a single, un-segmented, and possibly partial, 3D point cloud shape. RPM-Net is a novel Recurrent Neural Network (RNN), composed of an encoder-decoder pair with interleaved Long Short-Term Memory (LSTM) components, which together predict a temporal sequence of pointwise displacements for the input point cloud. At the same time, the displacements allow the network to learn movable parts, resulting in a motion-based shape segmentation. Recursive applications of RPM-Net on the obtained parts can predict finer-level part motions, resulting in a hierarchical object segmentation. Furthermore, we develop a separate network to estimate part mobilities, i.e., per-part motion parameters, from the segmented motion sequence. Both networks learn deep predictive models from a training set that exemplifies a variety of mobilities for diverse objects. We show results of simultaneous motion and part predictions from synthetic and real scans of 3D objects exhibiting a variety of part mobilities, possibly involving multiple movable parts.

[1] The subtle difference between part mobility and part motion is that the term mobility refers to the extent that a part can move; it emphasizes the geometric or transformation characteristics of part motions, e.g., the types of the motions and the reference point or axis of a rotation. On the other hand, motion is a more general term which also encompasses measures reflecting physical properties, such as speed and acceleration. In our work, RPM-Net predicts movable parts and their motions simultaneously, while the second network, Mobility-Net, further predicts part mobilities.
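The motion-based segmentation idea in the abstract, grouping points by their predicted displacements, can be illustrated with a minimal sketch. This is a simplified stand-in for RPM-Net's learned segmentation, not the actual network: it merely thresholds total displacement magnitude, and the function name and `eps` threshold are hypothetical.

```python
def segment_by_motion(displacements, eps=1e-3):
    """Split point indices into 'movable' vs 'static' given a predicted
    temporal sequence of per-point displacement vectors.

    `displacements[i]` is a list of (dx, dy, dz) steps for point i."""
    movable, static = [], []
    for idx, steps in enumerate(displacements):
        # total displacement magnitude accumulated over the sequence
        total = sum(abs(dx) + abs(dy) + abs(dz) for dx, dy, dz in steps)
        (movable if total > eps else static).append(idx)
    return movable, static
```

In the full method this grouping emerges from the network jointly with the displacement prediction, and recursing on each movable group yields the hierarchical segmentation mentioned above.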