In remotely sensed images, high intra-class variance and inter-class similarity are ubiquitous because scenes are complex and objects have multivariate features, which makes semantic segmentation a challenging task. Deep convolutional neural networks can address this problem by modeling the context of features and improving their discriminability. However, current learning paradigms model feature affinity in the spatial dimension and the channel dimension separately and then fuse them in a sequential or parallel manner, leading to suboptimal performance. In this study, we first analyze this problem practically and summarize it as attention bias: when affinity is modeled only in the spatial or the channel domain, the network is less able to distinguish weak and discretely distributed objects from wide-range objects with internal connectivity. To jointly model spatial and channel affinity, we design a synergistic attention module (SAM), which extracts channel-wise affinity while preserving spatial details. In addition, we propose a synergistic attention perception neural network (SAPNet) for the semantic segmentation of remote sensing images. Its hierarchical-embedded synergistic attention perception module aggregates SAM-refined features and decoded features. As a result, SAPNet enriches inference clues with the desired spatial and channel details. Experiments on three benchmark datasets show that SAPNet is competitive in accuracy and adaptability with state-of-the-art methods. The experiments also validate the attention bias hypothesis and the efficiency of SAM.
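To make the distinction between the two affinity types concrete, the sketch below computes them separately on a toy feature map, which is the conventional setup the abstract contrasts with. It is not SAM or SAPNet: the helper names, the softmax normalization, and the toy values are all assumptions chosen for illustration. The features are laid out as C channels by N flattened pixels.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    # Plain nested-list matrix product; zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def channel_affinity(features):
    # (C x N) times (N x C) gives a C x C map: how similar each channel is to every other.
    return softmax_rows(matmul(features, transpose(features)))

def spatial_affinity(features):
    # (N x C) times (C x N) gives an N x N map: how similar each pixel is to every other.
    return softmax_rows(matmul(transpose(features), features))

# Hypothetical toy input: C = 2 channels, N = 4 flattened pixel positions.
features = [
    [1.0, 0.0, 2.0, 1.0],
    [0.5, 1.5, 0.0, 1.0],
]

channel_map = channel_affinity(features)
spatial_map = spatial_affinity(features)
```

The channel map has one row per channel and the spatial map has one row per pixel, so each one discards the axis that the other keeps. That is the separation the abstract argues against modeling in isolation.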
Locations and user information can be shared and exchanged in the IoV (Internet of Vehicles), which provides sufficient data for traffic deployment and behavior pattern analysis. However, privacy issues have become more severe because personal or sensitive information is prone to being revealed in a big data environment. In this work, a novel differential privacy-based algorithm named DPTD (Differentially Private Trajectory Database) is proposed for releasing trajectory databases. First, a 3-dimensional generalized trajectory dataset is established by taking the time factor into account. Then, the trajectory space is divided into several planes according to the timestamps, and the set of locations on each plane is clustered and generalized to form new trajectories, which are the trajectories to be released. This method is well suited to prefix-tree releasing because it captures the spatiotemporal characteristics of the trajectories and addresses the sparseness problem. In addition, a prediction method based on a Markov assumption is proposed to reduce the cost of adding noise. Unlike the traditional method, in which noise is added layer by layer, noise is added only to the odd layers, based on prediction through spatiotemporal correlation, saving approximately 50% of the privacy budget. Theoretical analysis and experimental results show that the proposed algorithm achieves better data availability than the compared algorithms while guaranteeing the expected privacy level.
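The budget saving can be illustrated with a small sketch. It is not the DPTD algorithm itself. It assumes a uniform split of the total budget across the noised layers, a standard Laplace mechanism with sensitivity 1, and hypothetical function names. It omits the Markov prediction step that would fill in the even layers and leaves those layers as `None`.

```python
import math
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def per_layer_epsilon(total_epsilon, height, odd_only):
    # Assumption: the total budget is divided evenly among the layers that receive noise.
    noisy_layers = math.ceil(height / 2) if odd_only else height
    return total_epsilon / noisy_layers

def noisy_layer_counts(layer_counts, total_epsilon, odd_only, sensitivity=1.0, seed=0):
    # layer_counts[d - 1] holds the node counts of prefix-tree layer d (1-indexed depth).
    rng = random.Random(seed)
    eps = per_layer_epsilon(total_epsilon, len(layer_counts), odd_only)
    scale = sensitivity / eps
    released = []
    for depth, counts in enumerate(layer_counts, start=1):
        if odd_only and depth % 2 == 0:
            # Placeholder: an even layer would be predicted from its neighbours, not noised.
            released.append(None)
        else:
            released.append([c + laplace_noise(scale, rng) for c in counts])
    return released

# Hypothetical 4-layer prefix tree with a total budget of 1.0.
layers = [[10], [6, 4], [3, 3, 4], [1, 2, 3, 4]]
eps_all = per_layer_epsilon(1.0, len(layers), odd_only=False)
eps_odd = per_layer_epsilon(1.0, len(layers), odd_only=True)
released = noisy_layer_counts(layers, 1.0, odd_only=True)
```

With four layers, noising only the odd ones means two layers share the budget instead of four. Each noised layer therefore gets twice the epsilon and half the Laplace scale under these assumptions.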
Contextual information plays a pivotal role in the semantic segmentation of remote sensing imagery (RSI) because of imbalanced class distributions and ubiquitous intra-class variation. The emergence of the transformer has revolutionized vision tasks through its impressive scalability in establishing long-range dependencies. However, local patterns, such as inherent structures and spatial details, are broken by the transformer's tokenization. ICTNet is therefore devised to address these deficiencies. ICTNet follows the encoder–decoder architecture. In the encoder, Swin Transformer blocks (STBs) and convolution blocks (CBs) are deployed and interlaced, accompanied by encoded feature aggregation modules (EFAs). This design allows the network to learn local patterns, distant dependencies, and their interactions simultaneously. The decoder of ICTNet consists of multiple DUpsamplings (DUPs), each followed by a decoded feature aggregation module (DFA), which reduces the transformation and upsampling losses while recovering features. With the devised encoder and decoder, well-rounded context is captured and contributes most to the inference. Extensive experiments are conducted on the ISPRS Vaihingen, ISPRS Potsdam, and DeepGlobe benchmarks. Quantitative and qualitative evaluations show that ICTNet performs competitively with mainstream and state-of-the-art methods. Additionally, an ablation study of DFA and DUP is conducted to validate their effects.