With the availability of numerous high-resolution remote sensing images, remote sensing image scene classification has been widely used in various fields. Compared with the field of natural images, the insufficient number of labeled remote sensing images limits the performance of supervised scene classification, while unsupervised methods struggle to meet practical application requirements. Therefore, this paper proposes a semi-supervised remote sensing image scene classification method using generative adversarial networks (GANs). The proposed method introduces dense residual blocks, a pre-trained Inception V3 network, a gating unit, pyramidal convolution, and spectral normalization into GANs to improve semi-supervised classification performance. Specifically, the pre-trained Inception V3 network is introduced to extract semantic features and enhance feature discriminability. The gating unit is utilized to capture the relationships among features. Pyramidal convolution is integrated into the dense residual blocks to capture different levels of detail and strengthen the feature representation capability. Spectral normalization is introduced to stabilize GAN training and thereby improve semi-supervised classification accuracy. Extensive experiments on the publicly available EuroSAT and UC Merced datasets show that the proposed method achieves the highest overall accuracy, especially when only a few labeled samples are available.
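Of the components above, spectral normalization is the most self-contained: it stabilizes GAN training by rescaling each weight matrix so its largest singular value is approximately 1, with the singular value estimated by power iteration. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the function name and the fixed iteration count are illustrative choices.

```python
import numpy as np

def spectral_normalize(W, n_iters=20):
    """Return W divided by an estimate of its largest singular value.

    The estimate uses power iteration, the standard trick behind
    spectral normalization for stabilizing GAN discriminators.
    """
    rng = np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12   # right singular vector estimate
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12   # left singular vector estimate
    sigma = u @ W @ v                    # estimated top singular value
    return W / sigma

# Example: a matrix with singular values 3 and 1 is rescaled so that
# its largest singular value becomes (approximately) 1.
W = np.array([[3.0, 0.0],
              [0.0, 1.0]])
W_sn = spectral_normalize(W)
```

In practice this normalization is applied to every layer of the discriminator at each training step, which bounds the network's Lipschitz constant and damps the oscillations that destabilize adversarial training.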
In the field of object detection, the feature pyramid network (FPN) can effectively extract multi-scale information. However, most FPN-based methods suffer from a semantic gap between features of different sizes before feature fusion, which can lead to feature maps with significant aliasing. In this paper, we present a novel multi-scale semantic enhancement feature pyramid network (MSE-FPN) that alleviates these problems through three effective modules: a semantic enhancement module, a semantic injection module, and a gated channel guidance module. Specifically, inspired by the strong context-modeling ability of the self-attention mechanism, we propose a semantic enhancement module that models global context to obtain global semantic information before feature fusion. We then propose a semantic injection module that divides and merges the global semantic information into feature maps at various scales, narrowing the semantic gap between scales and efficiently exploiting the semantic information of high-level features. Finally, to mitigate the feature aliasing caused by feature fusion, the gated channel guidance module selectively outputs crucial features via a gating unit. By replacing FPN with MSE-FPN in Faster R-CNN, our models achieve 39.4 and 41.2 average precision (AP) using ResNet-50 and ResNet-101 as the backbone network, respectively; with ResNet-101-64x4d as the backbone, MSE-FPN reaches 43.4 AP. Our results demonstrate that replacing FPN with MSE-FPN significantly enhances the detection performance of state-of-the-art FPN-based detectors.
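The gated channel guidance idea, selecting which channels of a fused feature map to pass through, can be sketched as a per-channel sigmoid gate. This is a simplified NumPy illustration under assumed shapes (a `(C, H, W)` feature map and one learned gate logit per channel); the paper's actual module may compute its gates differently, e.g. from pooled feature statistics.

```python
import numpy as np

def gated_channel_guidance(feat, gate_logits):
    """Scale each channel of a fused feature map by a learned gate.

    feat:        array of shape (C, H, W) -- fused multi-scale features
    gate_logits: array of shape (C,)      -- learned per-channel logits
                 (hypothetical parameterization for illustration)

    A sigmoid squashes each logit into (0, 1); channels with low gate
    values (e.g. aliased ones) are suppressed, important ones kept.
    """
    gate = 1.0 / (1.0 + np.exp(-gate_logits))        # sigmoid per channel
    return feat * gate[:, None, None]                # broadcast over H, W

# Example: channel 0 is half-suppressed, channel 1 passes through.
feat = np.ones((2, 4, 4))
out = gated_channel_guidance(feat, np.array([0.0, 100.0]))
```

In a trained network the logits (or the small subnetwork producing them) are learned end-to-end, so the gate acts as a soft channel selector rather than a hard mask.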
Image classification and recognition have a very wide range of applications in computer vision, spanning fields such as image retrieval, image analysis, and robot positioning. With the rise of brain science and cognitive science research, as well as the increasing diversification of imaging modalities, three-dimensional image data, primarily magnetic resonance images, play an increasingly important role in image classification and recognition, especially for medical images. However, the high dimensionality of human magnetic resonance images reduces their human readability, so the classification and recognition of three-dimensional images remains a challenge. To better extract local features from images and effectively use their spatial information, this paper improves the "bag of features" and "spatial pyramid matching" algorithms on the basis of a 3D feature extraction algorithm and proposes an image classification framework built on it. First, the multiresolution "3D spatial pyramid" algorithm, the multiscale image segmentation and image representation method, and the SVM classifier and feature fusion method are described. Second, the gender information contained in magnetic resonance images is classified and recognized on the three databases selected for the experiments. Experimental results show that this method can effectively utilize the spatial information of three-dimensional images and achieves satisfactory results in the classification and recognition of human magnetic resonance images.
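The core of a 3D spatial pyramid is straightforward: local features are quantized into visual words, and word histograms are computed over progressively finer 3D grids, then concatenated into one descriptor. The sketch below is a minimal NumPy illustration of that scheme, assuming normalized coordinates in [0, 1) and precomputed word indices; it is not the paper's implementation, and the level set and normalization are illustrative choices.

```python
import numpy as np

def spatial_pyramid_3d(codes, coords, n_words, levels=(1, 2)):
    """Build a 3D spatial-pyramid descriptor from quantized local features.

    codes:   (N,) int array, visual-word index of each local 3D feature
    coords:  (N, 3) float array, feature positions normalized to [0, 1)
    n_words: vocabulary size
    levels:  grid resolutions; level g splits the volume into g**3 cells

    Returns the concatenation of one L1-normalized word histogram
    per cell, over all pyramid levels.
    """
    hists = []
    for g in levels:
        # Cell index along each axis, clamped to the grid.
        cell = np.minimum((coords * g).astype(int), g - 1)
        flat = (cell[:, 0] * g + cell[:, 1]) * g + cell[:, 2]
        for c in range(g ** 3):
            h = np.bincount(codes[flat == c], minlength=n_words)
            hists.append(h / max(1, h.sum()))   # L1-normalize, empty cells stay 0
    return np.concatenate(hists)

# Example: two features in opposite corners of the volume.
codes = np.array([0, 1])
coords = np.array([[0.1, 0.1, 0.1],
                   [0.9, 0.9, 0.9]])
desc = spatial_pyramid_3d(codes, coords, n_words=2)
```

The resulting fixed-length descriptor can then be fed to an SVM, which is how spatial pyramid matching pipelines typically couple local features with a global classifier.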