To achieve higher accuracy in machine learning tasks, very deep convolutional neural networks (CNNs) have recently been designed. However, the heavy memory traffic of deep CNNs leads to high power consumption. A variety of hardware-friendly compression methods have been proposed to reduce the data transfer bandwidth by exploiting the sparsity of feature maps. Most of them focus on designing a specialized encoding format to increase the compression ratio. In contrast, we observe and exploit the difference in sparsity between activations in earlier and later layers to improve the compression ratio. We propose a novel hardware-friendly transform-based method named 1D-Discrete Cosine Transform on Channel dimension with Masks (DCT-CM), which combines DCT, masks, and a coding format to compress activations. The proposed algorithm achieves an average compression ratio of 2.9× (53% higher than state-of-the-art transform-based feature map compression works) during inference on ResNet-50 with an 8-bit quantization scheme.
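The general idea of transform-plus-mask compression along the channel dimension can be illustrated with a minimal sketch. It assumes an activation vector taken across the C channels at one spatial position, applies an orthonormal 1D DCT-II along that vector, quantizes the coefficients with a uniform step to signed 8-bit integers, and stores a 1-bit-per-coefficient mask of nonzero coefficients plus 8-bit values for the nonzeros. The quantization step, the mask layout, and all function names (`dct_ii`, `compress_channel_vector`, and so on) are hypothetical; this is a toy illustration, not the DCT-CM coding format.

```python
import math


def dct_ii(x):
    """Orthonormal 1D DCT-II of a sequence (O(N^2), for illustration only)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct_ii(c):
    """Inverse of the orthonormal DCT-II (a DCT-III with matching scaling)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def compress_channel_vector(acts, step=1.0):
    """Hypothetical encoder: DCT across channels, uniform quantization to
    signed 8-bit integers, then a bitmask marking nonzero coefficients."""
    q = [int(round(c / step)) for c in dct_ii(acts)]
    q = [max(-128, min(127, v)) for v in q]  # clamp to the int8 range
    mask = [1 if v != 0 else 0 for v in q]
    values = [v for v in q if v != 0]        # only nonzero coefficients are stored
    return mask, values


def compressed_bits(mask, values, value_bits=8):
    """Size of the toy encoding: 1 bit per mask entry + value_bits per nonzero."""
    return len(mask) + value_bits * len(values)


def decompress_channel_vector(mask, values, step=1.0):
    """Rebuild the quantized coefficients from the mask, then invert the DCT."""
    it = iter(values)
    q = [next(it) if m else 0 for m in mask]
    return idct_ii([v * step for v in q])
```

For a constant 8-channel vector `[3] * 8` (64 bits at 8 bits per activation), only the DC coefficient survives quantization, so the toy encoding takes 8 mask bits plus 8 value bits, a 4× ratio for that vector; decoding returns values of about 2.83 rather than 3, showing that the scheme is lossy.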
Human activities consist of multiple simple actions, and temporal information benefits action recognition at all time scales. Using the energy information of human actions as the criterion for action similarity, we present a temporal segmentation method in which action videos are first segmented into atomic actions based on the kinematic information of the human skeleton; the atomic action units are then iteratively merged into meaningful groups according to the similarity of their energy information. Key frames are located at the points of maximum energy information. We tested our method on two challenging datasets, and it outperforms other state-of-the-art methods.
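A per-frame energy signal of the kind this segmentation relies on can be sketched as follows. The sketch assumes that each frame is a list of 3D joint positions, that all joints have unit mass, and that velocities come from backward finite differences. The boundary rule (cutting at strict local minima of kinetic energy) and the helper names are hypothetical stand-ins for the kinematics-based segmentation described above, and the iterative merging of atomic units by energy similarity is not shown. Key frames are taken as the energy maximum within each segment.

```python
def kinetic_energy(frames, dt=1.0, masses=None):
    """frames: list of frames, each a list of (x, y, z) joint positions.
    Returns per-frame kinetic energy 0.5 * sum_j m_j * |v_j|^2, with
    velocities from backward differences (frame 0 gets energy 0).
    Unit joint masses are an assumption when masses is not given."""
    n_joints = len(frames[0])
    m = masses or [1.0] * n_joints
    energy = [0.0]
    for t in range(1, len(frames)):
        e = 0.0
        for j in range(n_joints):
            v2 = sum(((a - b) / dt) ** 2 for a, b in zip(frames[t][j], frames[t - 1][j]))
            e += 0.5 * m[j] * v2
        energy.append(e)
    return energy


def segment_at_minima(energy):
    """Hypothetical rule: split the sequence at strict local minima of energy.
    Returns half-open (start, end) frame ranges."""
    cuts = [t for t in range(1, len(energy) - 1)
            if energy[t] < energy[t - 1] and energy[t] < energy[t + 1]]
    bounds = [0] + cuts + [len(energy)]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def key_frames(energy, segments):
    """One key frame per segment: the frame with maximum kinetic energy."""
    return [max(range(s, e), key=lambda t: energy[t]) for s, e in segments]
```

For a single joint moving along x through positions 0, 1, 3, 4, 4.5, 6.5, 7, the energy sequence is [0, 0.5, 2, 0.5, 0.125, 2, 0.125], the rule cuts the sequence at frame 4, and the key frames are frames 2 and 5.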
A local feature descriptor based on energy information is presented, which combines the kinetic energy, potential energy, and position information of 3D skeleton joints, among other cues. These features conform not only to the kinematics and biology of human action but also to natural visual saliency for action recognition. Semantic features are obtained with a bag-of-words (BOW) model based on k-means clustering. Finally, a kernel-based support vector machine (SVM) is used to perform human activity recognition. The experimental results show that recognition accuracy with these low-dimensional features is higher than that of several state-of-the-art algorithms.
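The bag-of-words step can be sketched in plain Python, assuming each frame yields a fixed-length descriptor vector (for example, kinetic energy, potential energy, and joint coordinates concatenated). A codebook is learned with Lloyd's k-means, and each sequence is represented by the normalized histogram of its descriptors' nearest codewords; that histogram would then be the input to a kernel SVM (for instance scikit-learn's `SVC`), which is not shown. The initialization (first k points rather than k-means++), the toy 2-D data, and the function names are assumptions for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. Centroids are initialised from the first k
    points, a simplifying assumption for determinism (not k-means++)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[idx].append(p)
        for c in range(k):
            if clusters[c]:  # keep the old centroid if a cluster becomes empty
                dim = len(clusters[c][0])
                centroids[c] = [sum(p[d] for p in clusters[c]) / len(clusters[c])
                                for d in range(dim)]
    return centroids


def bow_histogram(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments for one sequence."""
    counts = [0] * len(codebook)
    for d in descriptors:
        idx = min(range(len(codebook)), key=lambda c: sq_dist(d, codebook[c]))
        counts[idx] += 1
    total = len(descriptors)
    return [c / total for c in counts]
```

On the toy points (0,0), (10,10), (0,1), (1,0), (10,11), (11,10) with k = 2, the centroids converge to (1/3, 1/3) and (31/3, 31/3); a sequence with descriptors (0,0), (0.5,0.5), (10,10), (9,9) then maps to the histogram [0.5, 0.5].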