When a traditional semantic segmentation model is used, the differing
importance of individual feature maps is ignored during the feature
extraction stage, which causes loss of detail and degrades the
segmentation result. In this paper, we propose a BiSeNet-oriented context
attention model for image semantic segmentation. In BiSeNet, the spatial
path is used to extract more low-level features, addressing the problem of
information loss in deep network layers. A context attention mechanism is
used to mine high-level implicit semantic features of images. Meanwhile,
focal loss is used as the loss function to improve the final segmentation
result by reducing the internal weighting. Finally, we conduct experiments
on open datasets, and the results show that pixel accuracy, average pixel
accuracy, and average Intersection-over-Union are greatly improved compared
with other state-of-the-art semantic segmentation models. The proposed
model effectively improves the accuracy of feature extraction, reduces the
loss of feature details, and improves the final segmentation result.
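As a rough illustration of the loss mentioned above, the sketch below assumes that the loss named in the abstract is the standard focal loss, FL(p_t) = -(1 - p_t)^gamma * log(p_t), which scales down the contribution of pixels that are already classified with high confidence. The function name `focal_loss`, the default `gamma=2.0`, and the list-based input layout are illustrative assumptions, not details taken from the paper.

```python
import math

def focal_loss(probs, targets, gamma=2.0):
    """Mean focal loss over a batch of pixels (illustrative sketch).

    probs:   list of per-pixel class probability lists (e.g. softmax output)
    targets: list of integer class labels, one per pixel
    gamma:   focusing parameter (assumed default); gamma=0 gives cross-entropy
    """
    total = 0.0
    for p_row, t in zip(probs, targets):
        # Probability assigned to the true class, clipped to avoid log(0).
        p_t = min(max(p_row[t], 1e-12), 1.0)
        # The factor (1 - p_t)^gamma shrinks the loss of easy pixels.
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(targets)

# Two pixels: one classified confidently, one less so.
probs = [[0.9, 0.1], [0.3, 0.7]]
targets = [0, 1]
cross_entropy = focal_loss(probs, targets, gamma=0.0)
focal = focal_loss(probs, targets, gamma=2.0)
```

With `gamma=0.0` the expression reduces to ordinary cross-entropy, so the two values can be compared directly to see how strongly confident pixels are down-weighted.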
Remote sensing images contain important information about targets such as airports, ports, and ships. By extracting remote sensing image features and learning the mapping between image features and text semantic features, remote sensing image content can be interpreted and described, which has wide application value in military and civil fields such as national defense security, land monitoring, urban planning, and disaster mitigation. To address the complex backgrounds of remote sensing images, the lack of interpretability of existing object detection models, and problems in feature extraction across different network structures and layers and in the accuracy of target classification, we propose an object detection and interpretation model based on gradient-weighted class activation mapping and reinforcement learning. First, ResNet is used as the backbone network to extract features from remote sensing images and generate feature maps. Then, we add a global average pooling layer to obtain the feature weight vector corresponding to the feature maps. The weighted feature maps are superimposed to output class activation maps. Reinforcement learning is used to optimize the region generation network. At the same time, we improve the reward function of reinforcement learning to improve the performance of the region generation network. Finally, network dissection analysis is used to obtain the interpretable semantic concepts in the model. In experiments, the average accuracy exceeds 85%. Experimental results on a public remote sensing image description dataset show that the proposed method achieves high detection accuracy and good description performance for remote sensing images in complex environments.
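The class activation map step described above can be sketched as follows. This is a minimal illustration that assumes the usual gradient-weighted formulation: each channel weight is the global average of that channel's gradient map, and the map is the ReLU of the weighted sum of feature maps. The gradient maps are passed in as plain inputs rather than computed by backpropagation, and the function names and nested-list layout are hypothetical, not taken from the paper.

```python
def global_average_pool(maps):
    """Return the spatial mean of each H x W map in a list of K maps."""
    return [
        sum(sum(row) for row in m) / (len(m) * len(m[0]))
        for m in maps
    ]

def class_activation_map(feature_maps, gradient_maps):
    """Weighted sum of feature maps followed by ReLU (illustrative sketch).

    feature_maps:  K maps of shape H x W from the last convolutional layer
    gradient_maps: K maps of shape H x W, assumed to hold the gradient of
                   the class score with respect to each feature map
    """
    weights = global_average_pool(gradient_maps)  # one weight per channel
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(feature_maps, weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += wk * fmap[i][j]
    # ReLU keeps only locations that contribute positively to the class.
    return [[max(0.0, v) for v in row] for row in cam]

# Toy input: two 2 x 2 channels with constant gradient maps.
feature_maps = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]]]
gradient_maps = [[[0.5, 0.5], [0.5, 0.5]], [[-1.0, -1.0], [-1.0, -1.0]]]
cam = class_activation_map(feature_maps, gradient_maps)
```

In a real pipeline the feature and gradient maps would come from the ResNet backbone, and the resulting map would typically be upsampled to the input image size before being overlaid on it.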