Following the success of deep convolutional networks, state-of-the-art methods for 3d human pose estimation have focused on deep end-to-end systems that predict 3d joint locations given raw image pixels. Despite their excellent performance, it is often not easy to understand whether their remaining error stems from a limited 2d pose (visual) understanding, or from a failure to map 2d poses into 3dimensional positions.With the goal of understanding these sources of error, we set out to build a system that given 2d joint locations predicts 3d positions. Much to our surprise, we have found that, with current technology, "lifting" ground truth 2d joint locations to 3d space is a task that can be solved with a remarkably low error rate: a relatively simple deep feedforward network outperforms the best reported result by about 30% on Human3.6M, the largest publicly available 3d pose estimation benchmark. Furthermore, training our system on the output of an off-the-shelf state-of-the-art 2d detector (i.e., using images as input) yields state of the art results -this includes an array of systems that have been trained end-to-end specifically for this task. Our results indicate that a large portion of the error of modern deep 3d pose estimation systems stems from their visual analysis, and suggests directions to further advance the state of the art in 3d human pose estimation.
Abstract-We have previously developed a mobile robot system which uses scale-invariant visual landmarks to localize and simultaneously build three-dimensional (3-D) maps of unmodified environments. In this paper, we examine global localization, where the robot localizes itself globally, without any prior location estimate. This is achieved by matching distinctive visual landmarks in the current frame to a database map. A Hough transform approach and a RANSAC approach for global localization are compared, showing that RANSAC is much more efficient for matching specific features, but much worse for matching nonspecific features. Moreover, robust global localization can be achieved by matching a small submap of the local region built from multiple frames. This submap alignment algorithm for global localization can be applied to map building, which can be regarded as alignment of multiple 3-D submaps. A global minimization procedure is carried out using the loop closure constraint to avoid the effects of slippage and drift accumulation. Landmark uncertainty is taken into account in the submap alignment and the global minimization process. Experiments show that global localization can be achieved accurately using the scale-invariant landmarks. Our approach of pairwise submap alignment with backward correction in a consistent manner produces a better global 3-D map.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.