Recent approaches in depth-based human activity analysis have achieved outstanding performance and proved the effectiveness of 3D representations for the classification of action classes. However, currently available depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including a lack of training samples, distinct class labels, camera views, and variety of subjects. In this paper, we introduce a large-scale dataset for RGB+D human action recognition with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects. Our dataset contains 60 different action classes including daily, mutual, and health-related actions. In addition, we propose a new recurrent neural network structure to model the long-term temporal correlation of the features of each body part, and utilize them for better action classification. Experimental results show the advantages of applying deep learning methods over state-of-the-art handcrafted features on the suggested cross-subject and cross-view evaluation criteria for our dataset. The introduction of this large-scale dataset will enable the community to apply, develop and adapt various data-hungry learning techniques for the task of depth-based and RGB+D-based human activity analysis.
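To make the part-based recurrent modeling described above concrete, the following is a minimal sketch, not the authors' exact architecture: one recurrent branch per body part, with the branches' final states concatenated for classification. PyTorch is assumed, and the part groupings, dimensions, and class count are illustrative.

import torch
import torch.nn as nn

class PartAwareRNN(nn.Module):
    def __init__(self, part_dims, hidden_size=128, num_classes=60):
        super().__init__()
        # One LSTM branch per body part (e.g. torso, arms, legs).
        self.branches = nn.ModuleList(
            nn.LSTM(dim, hidden_size, batch_first=True) for dim in part_dims
        )
        self.classifier = nn.Linear(hidden_size * len(part_dims), num_classes)

    def forward(self, parts):
        # parts: list of tensors, each of shape (batch, time, part_dim)
        feats = []
        for x, lstm in zip(parts, self.branches):
            _, (h, _) = lstm(x)      # final hidden state summarizes the part's motion
            feats.append(h[-1])      # (batch, hidden_size)
        return self.classifier(torch.cat(feats, dim=1))

# Example: five hypothetical parts, each a few 3D joints per frame, 30 frames.
part_dims = [12, 12, 12, 9, 9]
model = PartAwareRNN(part_dims)
parts = [torch.randn(8, 30, d) for d in part_dims]
logits = model(parts)                # shape (8, 60)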
The increasing photorealism of computer graphics has made computer-generated imagery a convincing form of image forgery. Therefore, distinguishing photographic images from photorealistic computer graphics has become an important problem for image forgery detection. In this paper, we propose a new geometry-based image model, motivated by the physical image generation process, to tackle this problem. The proposed model reveals certain physical differences between the two image categories, such as the gamma correction in photographic images and the sharp structures in computer graphics. For the problem of image forgery detection, we propose two levels of image authenticity definition, i.e., imaging-process authenticity and scene authenticity, and analyze our technique against these definitions. Such definitions are important for making the concept of image authenticity computable. Apart from offering physical insights, our technique achieves a classification accuracy of 83.5%, outperforming prior work, i.e., wavelet features at 80.3% and cartoon features at 71.0%. We also consider a recapturing attack scenario and propose a counter-attack measure. In addition, we constructed a publicly available benchmark dataset with images of diverse content and computer graphics of high photorealism.
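As a small illustration of the gamma-correction cue mentioned above, the sketch below (NumPy assumed, with an illustrative gamma of 2.2) contrasts a linear intensity ramp, as a raw rendering might produce, with its camera-style gamma-corrected counterpart; this nonlinearity is one of the physical differences the abstract refers to.

import numpy as np

def apply_gamma(linear_image, gamma=2.2):
    # Map linear intensities in [0, 1] to gamma-corrected pixel values,
    # roughly what a camera pipeline does before storing an image.
    return np.clip(linear_image, 0.0, 1.0) ** (1.0 / gamma)

ramp = np.linspace(0.0, 1.0, 5)      # a linear ramp, as a raw rendering might be
print(ramp)                          # [0.   0.25 0.5  0.75 1.  ]
print(apply_gamma(ramp))             # mid-tones are lifted by the nonlinearity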
Single-modality action recognition on RGB or depth sequences has been extensively explored in recent years. It is generally accepted that each of these two modalities has different strengths and limitations for the task of action recognition. Therefore, analysis of RGB+D videos can help us better study the complementary properties of these two modalities and achieve higher levels of performance. In this paper, we propose a new deep autoencoder-based shared-specific feature factorization network to separate the input multimodal signals into a hierarchy of components. Further, based on the structure of the features, a structured sparsity learning machine is proposed which utilizes mixed norms to apply regularization within components and group selection between them for better classification performance. Our experimental results show the effectiveness of our cross-modality feature analysis framework by achieving state-of-the-art accuracy in action classification on five challenging benchmark datasets.
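A minimal sketch of the mixed-norm idea referred to above, assuming PyTorch: an l2 norm is taken within each feature group and the group norms are summed (an l1 across groups), which encourages selecting or discarding whole components. The grouping and weighting here are illustrative, not the authors' implementation.

import torch

def mixed_norm_l21(weight, groups):
    # l2 norm within each group of columns, summed (l1) across groups;
    # the penalty drives entire groups of weights toward zero together.
    return sum(weight[:, idx].norm(p=2) for idx in groups)

# Example: a linear classifier over three feature components of 10 dims each.
W = torch.randn(60, 30, requires_grad=True)              # classes x features
groups = [list(range(0, 10)), list(range(10, 20)), list(range(20, 30))]
penalty = 0.01 * mixed_norm_l21(W, groups)               # added to the task loss
penalty.backward()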