Joint estimation of the human body is applicable to many fields, such as human–computer interaction, autonomous driving, video analysis, and virtual reality. Although many depth-based studies have been classified and summarized in previous review and survey papers, point cloud-based human pose estimation remains difficult due to the unordered nature of point clouds and the need for rotation invariance. In this review, we summarize recent developments in point cloud-based human pose estimation. Existing works are divided into three categories according to their working principles: template-based, feature-based, and machine learning-based methods. In particular, significant works are highlighted with detailed introductions that analyze their characteristics and limitations. The datasets widely used in the field are summarized, and quantitative comparisons of representative methods are provided. Moreover, this review aids understanding of relevant applications in several frontier research directions. Finally, we discuss the remaining challenges and the open problems for future research.
Joint estimation of the human body from point clouds is a key step in tracking human movements. In this work, we present a geometric method for detecting joints in a single-frame point cloud captured with a Time-of-Flight (ToF) camera. A three-dimensional (3D) human silhouette, serving as a global feature of the single-frame point cloud, is extracted from the pre-processed data; the angle and aspect ratio of the silhouette are then used for pose recognition, and 14 joints of the human body are derived from geometric features of the 3D silhouette. To verify this method, we test on an in-house 3D dataset of 1200 depth frames covering four poses (upright, raising hands, parallel arms, and akimbo), as well as on a subset of the G3D dataset. Using hand-labelled joints of each human body as the ground truth for validation and benchmarking, the average normalized error of our geometric method is below 5.8 cm. With a distance threshold of 10 cm from the ground truth, the results demonstrate that the proposed method delivers improved performance, with an average accuracy of around 90%. INDEX TERMS Depth camera, human pose detection, joint detection, sensor systems and applications.
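To illustrate the idea of using a silhouette aspect ratio as a global pose cue, the following is a minimal sketch, not the paper's actual algorithm: it computes the height/width ratio of a point cloud's axis-aligned bounding box, one of the kinds of geometric features the abstract describes. The function name, the axis convention (x = width, y = height), and the toy point clouds are all illustrative assumptions.

```python
import numpy as np

def silhouette_aspect_ratio(points):
    """Height/width aspect ratio of an N x 3 point cloud's bounding box.

    This is a simplified stand-in for the silhouette aspect-ratio
    feature used for coarse pose recognition (axis convention assumed:
    x = width, y = height).
    """
    mins = points.min(axis=0)
    maxs = points.max(axis=0)
    width = maxs[0] - mins[0]
    height = maxs[1] - mins[1]
    return height / width

# Toy example: an "upright" figure is tall and narrow, while a
# "parallel arms" figure is much wider, so its ratio is smaller.
upright = np.array([[0.0, 0.0, 1.0], [0.1, 1.7, 1.0], [0.4, 0.9, 1.0]])
arms_out = np.array([[-0.8, 0.0, 1.0], [0.8, 1.6, 1.0], [0.0, 1.0, 1.0]])

print(silhouette_aspect_ratio(upright) > silhouette_aspect_ratio(arms_out))  # → True
```

In a full pipeline such a ratio would be thresholded (or combined with the silhouette angle) to distinguish the four pose classes before joint localization.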
Speech endpoint detection is one of the key problems in the practical application of speech recognition systems. In this paper, a speech signal containing chirp is decomposed into several intrinsic mode functions (IMFs) using ensemble empirical mode decomposition (EEMD), which also eliminates the mode-mixing phenomenon that commonly arises when speech signals are processed with standard empirical mode decomposition (EMD). An adaptive algorithm then selects the IMFs that carry most of the noise. Finally, the selected IMFs and the chirp-containing speech signal are fed into independent component analysis (ICA), which separates out the clean speech signal. The accuracy of speech endpoint detection can be improved in this way. The results show that the proposed method is effective and has strong noise robustness, making it especially suitable for speech endpoint detection at low SNR.
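The noise-IMF selection step can be sketched as follows. This is an illustrative simplification, not the paper's adaptive algorithm: it assumes the IMFs are already available (e.g. from EEMD) and flags as noise-dominated those IMFs whose correlation with the observed signal falls below a threshold. The threshold value and the toy signals are assumptions for demonstration.

```python
import numpy as np

def select_noise_imfs(signal, imfs, threshold=0.2):
    """Pick IMFs that are weakly correlated with the observed signal.

    imfs: array of shape (num_imfs, num_samples), e.g. EEMD output.
    IMFs whose |correlation coefficient| with the signal is below
    `threshold` are treated as noise-dominated and returned; these
    would feed the ICA stage. The threshold here is illustrative.
    """
    selected = [imf for imf in imfs
                if abs(np.corrcoef(signal, imf)[0, 1]) < threshold]
    return np.array(selected)

# Toy example: a dominant 5 Hz tone plus a weak 200 Hz component
# standing in for noise, sampled at 1 kHz over one second.
t = np.linspace(0, 1, 1000, endpoint=False)
tone = np.sin(2 * np.pi * 5 * t)
noise = 0.1 * np.sin(2 * np.pi * 200 * t)
observed = tone + noise
imfs = np.vstack([noise, tone])  # stand-ins for EEMD output

noisy = select_noise_imfs(observed, imfs)
print(noisy.shape[0])  # → 1 (only the weak high-frequency component is flagged)
```

In the described pipeline, the flagged IMFs together with the original mixture would then be passed to ICA to recover the clean speech component.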