The two-stream network architecture can capture temporal and spatial features from videos simultaneously and has achieved excellent performance on video action recognition tasks. However, videos contain a fair amount of redundant information in both the temporal and spatial dimensions, which increases the complexity of network learning. To address this problem, we propose the Residual Spatial-Temporal Attention Network (R-STAN), a feed-forward convolutional neural network for video action recognition that uses residual learning and a spatial-temporal attention mechanism to make the network focus on discriminative temporal and spatial features. In R-STAN, each stream is constructed by stacking Residual Spatial-Temporal Attention Blocks (R-STABs). The spatial-temporal attention modules integrated into the residual blocks generate attention-aware features along the temporal and spatial dimensions, which largely reduces redundant information. Combined with residual learning, this allows us to construct a very deep network for learning spatial-temporal information in videos. As the layers go deeper, the attention-aware features from the different R-STABs change adaptively. We validate R-STAN through extensive experiments on the UCF101 and HMDB51 datasets, which show that combining residual learning with a spatial-temporal attention mechanism contributes substantially to video action recognition performance.
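The abstract does not give the block's internal layout; the following is a minimal PyTorch-style sketch of how a residual block with an integrated spatial-temporal attention mask might be structured, assuming 5D video features of shape (batch, channels, time, height, width). The class name, layer sizes, and the sigmoid-mask formulation are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a residual spatial-temporal attention block (R-STAB).
# Layer choices here are assumptions for illustration only.
import torch
import torch.nn as nn

class RSTAB(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Trunk branch: an ordinary residual transformation over (C, T, H, W).
        self.trunk = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
        )
        # Attention branch: a soft mask over temporal and spatial positions.
        self.attention = nn.Sequential(
            nn.Conv3d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),  # values in (0, 1) act as soft attention weights
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        f = self.trunk(x)
        m = self.attention(x)
        # Residual connection keeps gradients flowing through deep stacks,
        # while the mask re-weights the trunk features.
        return self.relu(x + m * f)
```

Stacking several such blocks per stream, with downsampling between stages, would give the deep two-stream structure the abstract describes.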
Hard exudates are the main symptom of diabetic retinopathy. Early detection of hard exudates can help reduce the risk of blindness. However, hard exudate detection is challenging because the lesions vary in size and intensity, which leads to misdetection. To address the low accuracy and low efficiency of most existing hard exudate detection methods, we propose an Enhanced Multi-feature Fusion Network (EMFN) to lessen the burden on ophthalmologists and detect hard exudates more accurately and efficiently. Our network is a Convolutional Neural Network (CNN) that adopts and fuses multiple input features with enhanced structures and detailed information, which significantly improves the performance of EMFN. In addition, we introduce an attention mechanism and construct a Residual Attention Module (RAM), designed by integrating spatial and channel-wise attention modules after each residual block. With the RAM integrated into the network, EMFN can suppress redundancy, enhance target-related information, and exploit the correlations between different channels and their locations. Compared with previous methods, EMFN avoids many processing steps and reduces the impact of subjective factors. We evaluate EMFN on the MESSIDOR, HEI-MED, and E-Ophtha EX datasets, and the experimental results demonstrate that it achieves better performance than most existing methods.
INDEX TERMS Hard exudate detection, convolutional neural network, multiple feature fusion, attention mechanism.
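The abstract states only that spatial and channel-wise attention modules follow each residual block; the sketch below shows one plausible arrangement in PyTorch, with channel attention applied before spatial attention. All module names, the reduction ratio, the 7x7 spatial kernel, and the ordering are assumptions, not the paper's specification.

```python
# Illustrative sketch of a Residual Attention Module (RAM): a residual block
# followed by channel-wise and spatial attention. Details are assumptions.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global context per channel
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))  # re-weight each channel

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # Aggregate channel statistics at each location, then score positions.
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        mask = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * mask

class RAM(nn.Module):
    """Residual block followed by channel and spatial attention."""
    def __init__(self, channels):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        out = torch.relu(x + self.residual(x))  # standard residual connection
        return self.sa(self.ca(out))            # attention after the block
```

In this sequential arrangement, channel attention decides which feature maps matter before spatial attention decides where in the image to look; since the abstract does not fix the ordering, this is one of several reasonable designs.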