Human activity recognition (HAR) is an important research area in human–computer interaction and pervasive computing. In recent years, deep learning (DL) methods have been widely applied to HAR; thanks to their powerful automatic feature extraction, they achieve better recognition performance than traditional methods and generalize to a wider range of scenarios. However, DL methods increase the computational cost of the system and consume more system resources while achieving higher recognition accuracy, which makes them difficult to deploy on memory-constrained terminal devices such as smartphones. The model size therefore needs to be reduced as much as possible without sacrificing recognition accuracy. To address this problem, we propose a multi-scale feature extraction and fusion model that combines a convolutional neural network (CNN) with a gated recurrent unit (GRU). The model uses convolutional kernels of different sizes, combined with a GRU, to automatically extract different local features and long-term dependencies from the raw data and thereby obtain a richer feature representation. In addition, the proposed model replaces classical convolution with separable convolution, reducing the number of model parameters while improving recognition accuracy. The proposed model achieves accuracies of 97.18%, 96.71%, and 96.28% on the WISDM, UCI-HAR, and PAMAP2 datasets, respectively. The experimental results show that, compared with other methods, the proposed model not only achieves higher recognition accuracy but also requires fewer computational resources.
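The architecture described above — parallel branches with different kernel sizes, separable convolutions for parameter efficiency, and a GRU over the fused features — can be sketched as follows. This is a minimal illustration under assumed hyperparameters (branch width, kernel sizes, hidden size, and input shape are placeholders, not values from the paper):

```python
import torch
import torch.nn as nn

class SeparableConv1d(nn.Module):
    """Depthwise separable 1-D convolution: a per-channel (depthwise) conv
    followed by a 1x1 (pointwise) conv. Parameter count is roughly
    k*C_in + C_in*C_out versus k*C_in*C_out for a standard convolution."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv1d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class MultiScaleCNNGRU(nn.Module):
    """Hypothetical sketch of the multi-scale CNN-GRU idea: parallel
    separable-conv branches at different kernel sizes, concatenated,
    then a GRU to capture long-term dependencies."""
    def __init__(self, n_channels=3, n_classes=6, kernels=(3, 5, 7), width=32):
        super().__init__()
        self.branches = nn.ModuleList(
            [SeparableConv1d(n_channels, width, k) for k in kernels])
        self.gru = nn.GRU(width * len(kernels), 64, batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                         # x: (batch, channels, time)
        feats = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        out, _ = self.gru(feats.transpose(1, 2))  # (batch, time, features)
        return self.fc(out[:, -1])                # last step -> class logits

model = MultiScaleCNNGRU()
# e.g. a batch of 8 windows of 128 samples from a 3-axis accelerometer
logits = model(torch.randn(8, 3, 128))
```

The separable convolution is where the parameter savings come from: each branch pays the full `k * C_in * C_out` cost only once, in the cheap pointwise step, while the kernel-size-dependent cost scales with `C_in` alone.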
Human activity recognition (HAR) is one of the important research areas in pervasive computing. Within HAR, sensor-based activity recognition refers to acquiring high-level knowledge about human activities from many low-level sensor readings. In recent years, although traditional deep learning (DL) methods have been widely used for sensor-based HAR with good performance, they still face challenges such as feature extraction and characterization, and continuous action segmentation when dealing with time-series problems. In this study, a multichannel fusion model is proposed based on the idea of divide and conquer. In the proposed architecture, a multichannel convolutional neural network (CNN) enhances the ability to extract features at different scales, and the fused features are then fed into a gated recurrent unit (GRU), which learns temporal relationships to label features and enhance the feature representation. Finally, the multichannel CNN-GRU model uses global average pooling (GAP) to connect the feature maps to the final classification. Model performance was evaluated on three benchmark datasets, WISDM, UCI-HAR, and PAMAP2, with accuracies of 96.41%, 96.67%, and 96.25%, respectively. The results show that the proposed model demonstrates better activity detection capability than some of the reported results.

INDEX TERMS Human activity recognition, feature extraction, multichannel CNN, GRU
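The pipeline this abstract describes — multichannel CNN for multi-scale features, fusion, a GRU for temporal modeling, then GAP in place of a flatten-and-dense head — can be sketched as below. All hyperparameters (channel count, kernel sizes, widths) are illustrative assumptions, not values from the paper:

```python
import torch
import torch.nn as nn

class MultiChannelCNNGRU(nn.Module):
    """Hypothetical sketch of a multichannel CNN-GRU with a GAP head:
    parallel conv channels at different kernel sizes are fused, a GRU
    models temporal structure, and global average pooling over time
    connects the feature maps to the classifier."""
    def __init__(self, in_ch=9, n_classes=6, kernels=(3, 7, 11), width=32):
        super().__init__()
        self.channels = nn.ModuleList(
            nn.Sequential(nn.Conv1d(in_ch, width, k, padding=k // 2),
                          nn.ReLU())
            for k in kernels)
        self.gru = nn.GRU(width * len(kernels), 64, batch_first=True)
        self.gap = nn.AdaptiveAvgPool1d(1)   # global average pooling over time
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                    # x: (batch, sensor_channels, time)
        fused = torch.cat([c(x) for c in self.channels], dim=1)
        seq, _ = self.gru(fused.transpose(1, 2))          # (batch, time, 64)
        pooled = self.gap(seq.transpose(1, 2)).squeeze(-1)  # (batch, 64)
        return self.fc(pooled)

model = MultiChannelCNNGRU()
# e.g. a batch of 4 windows of 128 samples over 9 inertial channels
logits = model(torch.randn(4, 9, 128))
```

Compared with flattening, the GAP head averages each feature map over time, so the classifier's parameter count is independent of the window length — one reason GAP is a common choice for compact HAR models.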