We propose a new method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body-part representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally, we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes. The system runs at 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state-of-the-art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest-neighbor matching.
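The abstract does not say how the classification result is reprojected or how the local modes are found. A common choice for this kind of step is weighted mean shift over 3D points, so the sketch below illustrates the idea under stated assumptions: an ideal pinhole camera with hypothetical intrinsics `fx, fy, cx, cy`, a Gaussian kernel with a hypothetical `bandwidth`, each pixel weighted by the classifier's probability for one body part, and a mode's confidence defined as the total weight of the pixels that converge to it. The names `backproject` and `mean_shift_modes` are illustrative, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Map pixel (u, v) with metric depth to a 3D camera-space point.

    Assumption: an ideal pinhole camera; fx, fy, cx, cy are hypothetical intrinsics.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def _shift(x: Point3, points: Sequence[Point3], weights: Sequence[float],
           bandwidth: float) -> Point3:
    """One mean-shift step: Gaussian-kernel weighted mean of all points around x."""
    num = [0.0, 0.0, 0.0]
    den = 0.0
    for p, w in zip(points, weights):
        d2 = sum((a - b) ** 2 for a, b in zip(x, p))
        k = w * math.exp(-d2 / (2.0 * bandwidth ** 2))
        den += k
        for i in range(3):
            num[i] += k * p[i]
    if den == 0.0:
        return x  # no weighted support nearby; stay put
    return (num[0] / den, num[1] / den, num[2] / den)


def mean_shift_modes(points: Sequence[Point3], weights: Sequence[float],
                     bandwidth: float, max_iter: int = 100,
                     tol: float = 1e-5) -> List[Tuple[Point3, float]]:
    """Return (mode, confidence) pairs sorted by decreasing confidence.

    Assumption: a mode's confidence is the total weight of the points whose
    mean-shift trajectories end within bandwidth / 2 of it.
    """
    modes: List[List] = []  # each entry: [mode position, accumulated weight]
    for start, w in zip(points, weights):
        if w <= 0.0:
            continue  # zero-weight points neither seed a search nor vote
        x = start
        for _ in range(max_iter):
            nx = _shift(x, points, weights, bandwidth)
            converged = math.dist(nx, x) < tol
            x = nx
            if converged:
                break
        for m in modes:
            if math.dist(m[0], x) < bandwidth / 2.0:
                m[1] += w  # same mode as an earlier seed: accumulate confidence
                break
        else:
            modes.append([x, w])
    modes.sort(key=lambda m: m[1], reverse=True)
    return [(m[0], m[1]) for m in modes]


# Example: two tight clusters of points labelled as one part, about 1 m apart.
pts = [(0.0, 0.0, 1.0), (0.01, 0.0, 1.0), (-0.01, 0.0, 1.0),
       (1.0, 0.0, 1.0), (1.01, 0.0, 1.0), (0.99, 0.0, 1.0), (1.0, 0.01, 1.0)]
for mode, conf in mean_shift_modes(pts, [1.0] * len(pts), bandwidth=0.1):
    print(mode, conf)  # the four-point cluster is reported first
```

For a single body part, `points` would hold the back-projected foreground pixels and `weights` the per-pixel probability for that part; the highest-confidence mode then serves as that joint's proposal. Each step here costs O(N) per seed and O(N²) overall, which is acceptable for a sketch; a real-time system would likely subsample pixels or use a spatial index.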
This is a book about the problem of vision. How is it that a torrent of data from a television camera, or from biological visual receptors, can be reduced to perceptions: the recognition of familiar objects and the concise description of unfamiliar ones? There is of course an immense literature in psychophysics¹, neurophysiology and neuroanatomy that provides some answers in the case of biological systems (see Uttal (1981) for a taxonomy). For instance, the functioning of light-sensitive cells in mammalian vision is understood in some detail (Marks et al. 1964), and the elegant, orderly spatial correspondence of feature detectors in the brain with the array of cells in the retina is well known (Hubel and Wiesel 1968). There has also been much dialogue between psychophysics and neurophysiology/neuroanatomy. Examples are the discovery of spatial bandpass channels (Campbell and Robson 1968, Braddick et al. 1978) and the understanding of the perception of coloured light (Livingstone and Hubel 1984, Jameson and Hurvich 1961) and of surface colour (Land 1983, Zeki 1983). These instances are but parts of a very large body of knowledge of biological vision.

Over the last two decades, computers have introduced a new strand into the study of vision. The earliest work (Roberts 1965) produced systems able to recognise simple objects and manipulate them in a controlled way (Ambler et al. 1975). These systems were, of course, vastly inferior to the biological systems studied by the psychophysicists, neuroanatomists

¹ Psychophysics is the application of physical methods to the study of psychological properties. Visual psychophysics typically probes the mechanisms of human vision by noting a subject's perception of specially designed patterns, under controlled experimental conditions.