Two-stream networks have achieved great success in video action recognition. However, most existing methods employ the same structure for both the spatial and temporal networks, leading to unsatisfactory performance. In this paper, we propose a spatiotemporal heterogeneous two-stream network, which employs two different network structures for spatial and temporal information, respectively. Specifically, the Residual Network (ResNet) and BN-Inception are used as the base networks to represent the spatiotemporal characteristics of different human actions. In addition, a segmental architecture is employed to model long-range temporal structure over video sequences, so as to better distinguish similar actions that share sub-actions. Moreover, combined with a data augmentation strategy, a modified cross-modal pre-training strategy is proposed and applied to the spatiotemporal heterogeneous network to improve the final performance of human action recognition. Experiments on the UCF101 and HMDB51 datasets demonstrate that the proposed spatiotemporal heterogeneous two-stream network outperforms spatiotemporal isomorphic networks and other related methods.

INDEX TERMS Action recognition, spatiotemporal heterogeneous, two-stream networks, ResNet, long-range temporal structure, training strategies.
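The segmental architecture and two-stream fusion described above can be sketched in a few lines. This is a minimal, framework-free illustration of the general TSN-style pipeline (segment-level score averaging followed by weighted late fusion of the spatial and temporal streams), not the authors' implementation; the class scores and fusion weights are illustrative assumptions.

```python
# Hedged sketch of segmental consensus and two-stream late fusion.
# Scores are plain lists of floats here; real systems use network logits.

def segmental_consensus(segment_scores):
    """Average per-segment class scores over a video's segments."""
    n_segments = len(segment_scores)
    n_classes = len(segment_scores[0])
    return [sum(seg[c] for seg in segment_scores) / n_segments
            for c in range(n_classes)]

def two_stream_fusion(spatial_scores, temporal_scores,
                      w_spatial=1.0, w_temporal=1.5):
    """Weighted late fusion of the spatial (RGB) and temporal (optical-flow)
    streams. The weights are illustrative; they are tuned in practice."""
    return [w_spatial * s + w_temporal * t
            for s, t in zip(spatial_scores, temporal_scores)]

# Toy example: 3 segments, 2 action classes.
spatial = segmental_consensus([[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]])
temporal = segmental_consensus([[0.6, 0.4], [0.5, 0.5], [0.7, 0.3]])
fused = two_stream_fusion(spatial, temporal)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Averaging over segments sampled across the whole video is what lets the model see long-range temporal structure rather than a single short snippet.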
The successful application of deep learning approaches to remote sensing image classification requires large hyperspectral image (HSI) datasets to learn discriminative spectral–spatial features. To date, the HSI datasets available for image classification are too small to train deep learning methods effectively. This study proposes a deep 3D/2D genome graph-based network (abbreviated as HybridGBN-SR) that is computationally efficient and not prone to overfitting even with extremely few training samples. At the feature-extraction level, HybridGBN-SR utilizes three-dimensional (3D) and two-dimensional (2D) Genoblocks trained with very few samples while improving HSI classification accuracy. The design of a Genoblock is based on a biological genome graph. The experimental results show that our model achieves better classification accuracy than the compared state-of-the-art methods over three publicly available HSI benchmark datasets: Indian Pines (IP), University of Pavia (UP), and Salinas Scene (SA). For instance, using only 5% labeled data for training on IP, and 1% on UP and SA, the overall classification accuracy of the proposed HybridGBN-SR is 97.42%, 97.85%, and 99.34%, respectively, which is better than the compared state-of-the-art methods.
Recently developed hybrid models that stack 3D and 2D CNNs in their structure have enjoyed high popularity due to their appealing performance on hyperspectral image classification tasks. On the other hand, biological genome graphs have demonstrated their effectiveness in enhancing the scalability and accuracy of genomic analysis. We propose an innovative deep genome graph-based network (GGBN) for hyperspectral image classification to tap the potential of hybrid models and genome graphs. The GGBN model utilizes 3D-CNNs at the bottom layers and 2D-CNNs at the top layers to process the spectral–spatial features vital to enhancing the scalability and accuracy of hyperspectral image classification. To verify the effectiveness of the GGBN model, we conducted classification experiments on the Indian Pines (IP), University of Pavia (UP), and Salinas Scene (SA) datasets. Using only 5% of the labeled data for training on the SA, IP, and UP datasets, the classification accuracy of GGBN is 99.97%, 96.85%, and 99.74%, respectively, which is better than the compared state-of-the-art methods.
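A common way such hybrid 3D/2D models hand off features is to fold the spectral axis of the 3D-conv output into the channel dimension before the 2D convolutions. The shape arithmetic below is a hedged sketch of that bridge; the patch size, kernel size, and channel counts are illustrative assumptions, not the papers' exact configurations.

```python
# Hedged sketch: output shape of a cubic-kernel 3D convolution, and the
# 3D-to-2D "bridge" used by hybrid spectral-spatial CNNs. Shapes only;
# no deep-learning framework required. All sizes are assumptions.

def conv3d_out_shape(depth, height, width, kernel, stride=1, padding=0):
    """Output (spectral, spatial, spatial) size of a 3D convolution
    with a cubic kernel, using the standard conv output formula."""
    def out(n):
        return (n + 2 * padding - kernel) // stride + 1
    return out(depth), out(height), out(width)

# Illustrative input HSI patch: 30 spectral bands, 25x25 spatial window.
d, h, w = conv3d_out_shape(30, 25, 25, kernel=3)

# Bridge to 2D: fold the remaining spectral depth into channels, so a
# (channels=8, depth=d, h, w) 3D feature map becomes (8*d, h, w) for 2D convs.
channels_3d = 8
channels_2d = channels_3d * d
```

The 3D layers thus do the spectral–spatial mixing on raw bands, while the cheaper 2D layers refine spatial structure once the spectral axis has been absorbed into channels.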
Developing complex hyperspectral image (HSI) sensors that capture high-resolution spatial information and voluminous (hundreds of) spectral bands of the earth’s surface has made HSI pixel-wise classification a reality. The 3D-CNN has become the preferred HSI pixel-wise classification approach because of its ability to extract discriminative spectral and spatial information while maintaining data integrity. However, HSI datasets are characterized by high nonlinearity, voluminous spectral features, and limited training sample data. Therefore, deep HSI classification methods that rely purely on 3D-CNNs in their network structure often result in computationally expensive models prone to overfitting as the model depth increases. In this regard, this paper proposes an integrated deep multi-scale 3D/2D convolutional network block (MiCB) for simultaneous low-level spectral and high-level spatial feature extraction, which can be trained optimally on limited sample data. The strength of the proposed MiCB model lies solely in the innovative arrangement of convolution layers, giving the network the ability (i) to simultaneously convolve low-level spectral with high-level spatial features; (ii) to use multiscale kernels to extract abundant contextual information; (iii) to apply residual connections to solve the degradation problem when the model depth increases beyond the threshold; and (iv) to utilize depthwise separable convolutions in its network structure to reduce the computational cost of the proposed MiCB model. We evaluate the efficacy of our proposed MiCB model using three publicly accessible HSI benchmark datasets: Salinas Scene (SA), Indian Pines (IP), and the University of Pavia (UP). When trained on small amounts of sample data, MiCB classifies better than the compared state-of-the-art methods. For instance, MiCB achieves a high overall classification accuracy of 97.35%, 98.29%, and 99.20% when trained on 5% IP, 1% UP, and 1% SA data, respectively.
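The computational saving from depthwise separable convolutions mentioned in point (iv) is easy to make concrete with a parameter count. This is a generic illustration of the standard depthwise-plus-pointwise factorization, under assumed layer sizes, not MiCB's actual layer configuration.

```python
# Hedged sketch: parameter counts of a standard k x k convolution versus a
# depthwise separable one (depthwise k x k + 1x1 pointwise). Bias terms
# are omitted; channel counts and kernel size are illustrative assumptions.

def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution over c_in -> c_out channels."""
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) followed by a
    1x1 pointwise conv that mixes channels."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

std = standard_conv_params(64, 128, 3)   # 64 * 128 * 9      = 73728
sep = separable_conv_params(64, 128, 3)  # 64 * 9 + 64 * 128 = 8768
saving = std / sep                       # roughly 8.4x fewer parameters
```

For these assumed sizes the factorized layer needs roughly an order of magnitude fewer weights, which is why such convolutions help keep deeper 3D/2D blocks trainable on small HSI datasets.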