Recently, gait-based age and gender recognition has attracted considerable attention in advertisement marketing and surveillance retrieval, owing to the unique advantage that gait can be perceived at a long distance. Intuitively, age and gender can be recognised from people's static shape (e.g. different hairstyles between males and females) and dynamic motion (e.g. different walking velocities between the elderly and the young). However, most existing gait-based age and gender recognition methods rely on the Gait Energy Image (GEI), which cannot explicitly model temporal dynamics and is not robust to the multi-view conditions that inevitably arise in real applications. Therefore, in this study, an Attention-aware Spatio-Temporal Learning (ASTL) framework is proposed, which takes a silhouette sequence as input to learn essential and invariant spatio-temporal gait representations. More specifically, a Multi-Scale Temporal Aggregation (MSTA) module provides an effective scheme for dynamic gait description by exploring and aggregating temporal information over multiple intervals, serving as a core supplement to the spatial representation. Then, a Multiple Attention Aggregation (MAA) module is designed to help the network focus on the most discriminative information along the temporal, spatial and channel dimensions. Finally, a Multimodal Collaborative Learning (MCL) block exploits the complementary advantages of different modal features through a multimodal cooperative learning strategy. The mean absolute error (MAE) of age estimation and the correct classification rate (CCR) of gender classification on OU-MVLP reach 6.68 years and 97%, respectively, demonstrating the superiority of the method. Ablation experiments and visualisation results also confirm the effectiveness of the three individual modules within the proposed framework.
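The multi-scale temporal aggregation idea described above can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the choice of strides, and the use of simple frame differencing and averaging are all illustrative assumptions, standing in for the paper's learned aggregation over multiple temporal intervals.

```python
# Hypothetical sketch of multi-scale temporal aggregation (MSTA-style):
# per-frame feature vectors are differenced at several temporal strides
# to capture motion at different time scales, then averaged into a
# single dynamic gait descriptor.

def temporal_differences(frames, stride):
    """Frame-to-frame feature differences at a given temporal interval."""
    return [
        [a - b for a, b in zip(frames[t + stride], frames[t])]
        for t in range(len(frames) - stride)
    ]

def multi_scale_aggregate(frames, strides=(1, 2, 4)):
    """Aggregate motion descriptors gathered over multiple temporal scales."""
    descriptors = []
    for s in strides:
        diffs = temporal_differences(frames, s)
        # mean over time for this scale -> one descriptor vector per scale
        mean = [sum(col) / len(diffs) for col in zip(*diffs)]
        descriptors.append(mean)
    # fuse the scales by element-wise averaging
    return [sum(col) / len(descriptors) for col in zip(*descriptors)]

# toy example: 8 frames of 3-dim features moving at constant velocity 1
frames = [[float(t)] * 3 for t in range(8)]
agg = multi_scale_aggregate(frames)  # strides 1/2/4 see velocities 1/2/4
```

In a real system the per-frame vectors would come from a silhouette encoder and the fusion would be learned rather than a plain average; the sketch only shows why differencing at several intervals exposes dynamics that a single energy image collapses away.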
Thanks to the development of depth sensors and pose-estimation algorithms, skeleton-based action recognition has become prevalent in the computer vision community. Most existing works build on spatio-temporal graph convolutional network frameworks that learn and treat all spatial or temporal features equally; by ignoring the interaction with the channel dimension, they fail to explore the different contributions of different spatio-temporal patterns along the channel direction and thus lose the ability to distinguish confusing actions with subtle differences. In this paper, an interactional channel excitation (ICE) module is proposed to explore discriminative spatio-temporal features of actions by adaptively recalibrating channel-wise pattern maps. More specifically, a channel-wise spatial excitation (CSE) is incorporated to capture crucial global body-structure patterns and excite the spatially sensitive channels. A channel-wise temporal excitation (CTE) is designed to learn inter-frame dynamics and excite the temporally sensitive channels. As a plug-and-play module, ICE enhances different backbones. Furthermore, we systematically investigate strategies of graph topology and argue that complementary information is necessary for sophisticated action description. Finally, equipped with ICE, an interactional channel-excited graph convolutional network with complementary topology (ICE-GCN) is proposed and evaluated on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics-Skeleton. Extensive experimental results and ablation studies demonstrate that our method outperforms other state-of-the-art methods and confirm the effectiveness of the individual sub-modules. The code will be published at https://github.com/shuxiwang/ICE-GCN.
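The channel-recalibration idea underlying ICE can be sketched in pure Python using the generic squeeze-and-excitation pattern: squeeze each channel's spatio-temporal map to a scalar, gate it through a sigmoid, and rescale the channel. The function names, shapes, and the parameter-free average-pool gate are illustrative assumptions, not the paper's CSE/CTE design, which learns the excitation from spatial structure and inter-frame dynamics.

```python
# Pure-Python sketch of channel-wise excitation (squeeze-and-excitation
# style, as a simplified stand-in for the paper's ICE module).

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_excitation(features):
    """features: list of channels, each a [T][V] map (frames x joints).

    Returns the same maps rescaled by a per-channel attention gate.
    """
    excited = []
    for channel in features:
        # squeeze: global average over the channel's spatio-temporal map
        flat = [v for row in channel for v in row]
        score = sum(flat) / len(flat)
        gate = sigmoid(score)  # channel attention weight in (0, 1)
        # excite: rescale every entry of this channel by its gate
        excited.append([[v * gate for v in row] for row in channel])
    return excited

# toy example: 2 channels, 2 frames, 3 joints
feats = [
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],   # strongly responding channel
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],   # silent channel
]
out = channel_excitation(feats)
```

The point of the recalibration is that channels whose spatio-temporal patterns matter for the current action are amplified relative to uninformative ones, which is what lets the network separate actions with subtle differences.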