The estimation of the movements and posture of human beings is one of the key applications of consumer depth cameras. It motivated the development of the Kinect™ v1 and v2, and favored the diffusion of ToF and structured light technologies from the industrial and research fields to the mass market. The appeal of human pose estimation and tracking lies in its vast range of applications to everyday tasks. Console games using the body or the hands as a controller were the first commercial application of consumer depth cameras, and the skeletal tracking approach introduced with Kinect™ v1 represents the first reliable and efficient solution to the pose estimation problem in a home environment. Human-computer interaction is another intriguing field, as hand configurations and body movements are often exploited to convey non-verbal information, either explicitly, by associating gestures with specific meanings, or more implicitly, by augmenting speech information. Beyond interaction between people, human pose can also play a fundamental role in many situations requiring the manipulation of an object or the control of a machine through intuitive (natural) movements, e.g., in the robotics field. Historically, computer animation has been one of the earliest and most active areas successfully exploiting human pose data derived from motion capture. Complex movements performed by a human actor can be tracked and recorded in order to be used, either in real time or at a later time, to drive the movements of a computer-generated character or avatar (motion retargeting). Finally, many other applications exploit information from human pose estimation and tracking, e.g., video surveillance and control, posture and movement analysis in medical applications, and data compression through the use of representations more compact than full 3D point clouds.
Various solutions have been proposed for the human pose estimation and tracking task (Fig. 8.1).
Marker-based systems are able to acquire reliable information about body or hand posture, but they are expensive and invasive; their use is therefore confined to highly controlled industrial or medical environments. Colored gloves and special suits equipped with reflective or LED markers require delicate ... Even though the geometric 3D information embedded in depth data can solve some of the problems of systems based on a single color camera, such an approach must itself address a number of issues. For example, the depth data provided by many depth cameras are affected by a considerable amount of noise and artifacts. Moreover, single-view approaches based on 3D geometry often suffer from a large number of self-occlusions, resulting in missing data. Deploying multiple depth cameras (whenever the mutual interference is negligible) reduces missing data at the expense of more complex calibration procedures. Self-occlusions probably represent one of the most difficult problems within single-view pose recovery, and, as is well known, even multi-view acquisition setups...