Abstract: Recently, deep hashing has dominated single-label image retrieval approaches. However, remote sensing images, which often carry multiple labels, hardly benefit from these approaches. To overcome the limitations of single-label image retrieval in the remote sensing domain, we propose a multi-label remote sensing image retrieval framework (MLRSIR-NET). Specifically, the proposed MLRSIR-NET is composed of two main sub-networks: multi-level feature extraction and deep hashing. The …
“…In order to verify the validity of the proposed model, we further compared the performance of our method with existing methods. The methods chosen for comparison include ResNet-50, Transformer [49], Swin Transformer [50], FAH [34], FDRL [47] and MLRSIR-NET [48]. Swin Transformer adopts a hierarchical structure to adapt to images of different scales and implements linear-complexity attention computation using a shifted-window approach to optimize the Transformer.…”
Section: Comparison Experiments With State-of-the-Art Methods
Observing clouds to understand the weather is a crucial method for people to forecast upcoming conditions. Utilizing content-based satellite cloud image retrieval allows for the swift discovery of comparable historical cloud images, significantly aiding meteorologists in their advanced investigations. Nevertheless, satellite cloud images often present complexities due to their inclusion of diverse cloud types, leading to inadequate retrieval outcomes when relying on conventionally employed single-label retrieval techniques. Despite notable accomplishments in cloud image retrieval applications utilizing deep neural networks, concerns regarding network interpretability undermine confidence in the model's deductive outcomes. This paper introduces the interpretable cloud image hash retrieval network (ICIHRN), a framework that employs a singular object-level global unit alongside multiple local feature units for the purpose of generating hash codes tailored to cloud image retrieval. Furthermore, an attention branching network is incorporated to enhance the model's focus on discriminative regions within the image. Additionally, a suppression module is implemented to progressively uncover complementary regions through the suppression of prominent areas in preceding layers and the amalgamation of relationships among activated regions. This ensures that each feature unit is endowed with distinctive semantic information, thereby imparting a level of interpretability to the retrieval outcomes. On this foundation, multi-label supervision is seamlessly integrated into the deep hash learning framework. This integration not only enhances the depiction of intricate semantic contents within cloud images but also boosts retrieval efficiency. Comprehensive experimental outcomes, grounded in the publicly accessible satellite cloud map dataset LSCIDMR-V2, demonstrate superior performance relative to other methods.
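The hash-code generation the abstract describes, one global feature unit plus several local feature units, each contributing bits to the final code, can be sketched roughly as follows. This is a minimal illustration using sign-threshold binarization of linear projections; the unit count, dimensions, and function names are illustrative assumptions, not ICIHRN's actual architecture.

```python
import random

random.seed(0)

FEAT_DIM = 8       # illustrative feature dimension per unit
BITS_PER_UNIT = 4  # illustrative hash bits contributed by each unit

def project(feature, weights):
    """Linear projection of one feature unit to its hash logits."""
    return [sum(f * w for f, w in zip(feature, row)) for row in weights]

def unit_hash(feature, weights):
    """Binarize projected logits with a sign threshold: >0 -> 1, else 0."""
    return [1 if v > 0 else 0 for v in project(feature, weights)]

def image_hash(global_unit, local_units, weight_sets):
    """Concatenate the global unit's bits with each local unit's bits."""
    units = [global_unit] + local_units
    code = []
    for feat, w in zip(units, weight_sets):
        code.extend(unit_hash(feat, w))
    return code

# Toy example: one global unit and two local units.
def rand_feat():
    return [random.uniform(-1, 1) for _ in range(FEAT_DIM)]

def rand_weights():
    return [[random.uniform(-1, 1) for _ in range(FEAT_DIM)]
            for _ in range(BITS_PER_UNIT)]

g = rand_feat()
local_feats = [rand_feat(), rand_feat()]
weight_sets = [rand_weights() for _ in range(3)]
code = image_hash(g, local_feats, weight_sets)
print(len(code), set(code) <= {0, 1})
```

Because each unit contributes a distinct slice of the final code, a retrieved match can be traced back to the unit (global or local region) whose bits agreed, which is the kind of per-unit attribution the interpretability claim rests on.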
“…As it is identified that single-label RSIR methods cannot meet the demand for flexibility and accuracy, and an image is often associated with several real-world concepts, multi-label hashing methods are experimented with [14,17]. A kernel-based feature fusion using supervised hashing is designed to express the highly complex geometrical structures and spatial patterns of high-resolution RSIs [29]. A hash retrieval strategy that combines hash learning with proxy-based metric learning in a convolutional neural network is presented.…”
Section: Related Work
“…This issue is rectified by the exceptional breakthroughs of deep learning (DL) frameworks, with their powerful ability to represent extracted features. As a result of the advancement of DL approaches, deep features are utilized in numerous RSIR tasks [14–18].…”
Section: Introduction
“…As a result of the advancement of DL approaches, deep features are utilized in numerous RSIR tasks [14–18]. Exhaustive matching and retrieval of similar images based on a visual query is computationally demanding for a huge-volume RSI collection [1] when a computationally expensive similarity function is used. Also, the rapid growth of RSI volume poses a significant challenge for fast and efficient retrieval from massive RSI archives.…”
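The speed argument in the snippet above, exhaustive search with an expensive similarity function versus compact binary codes, can be illustrated with a small sketch. The 64-bit codes and archive size below are made-up values for illustration; the point is that a Hamming-distance comparison on packed codes is a single XOR plus a popcount, far cheaper than a high-dimensional float similarity.

```python
import random

random.seed(1)
N_BITS = 64

# A hypothetical archive of binary hash codes, each packed into one
# Python int so distance computation uses machine-word bit operations.
archive = [random.getrandbits(N_BITS) for _ in range(10_000)]
query = random.getrandbits(N_BITS)

def hamming(a, b):
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")

# Still a linear scan, but each comparison is a couple of integer ops
# rather than an expensive similarity over high-dimensional floats.
best = min(archive, key=lambda code: hamming(query, code))
print(hamming(query, best))
```

In practice deep-hashing systems also exploit the fact that codes within a small Hamming radius can be enumerated or bucketed, avoiding even the linear scan for very large archives.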
Remote sensing image retrieval (RSIR) frameworks encounter several issues in practical scenarios: (1) repetitive features dominating the final image representation, (2) inter- and intra-class variability across objects, and (3) the time complexity of exhaustively searching large-scale remote sensing image archives. Motivated by these facts, we propose a deep feature-splitting approach that enhances a localized hashing model (DFS-LHash) for RSIR. The DFS strategy splits the fully connected (FC) layer features into equally sized blocks for split-based localized hash learning, keeping VGG-16 as the baseline network. We incorporate effective deep clustering to improve classification performance by reducing the center-cluster loss. A block-wise softmax is applied to each block to pay more attention to the essential features at reduced dimension for object localization, which enhances retrieval performance. The center-cluster loss strengthens the class-discriminant information and greatly minimizes the distance between feature descriptors belonging to similar classes. The classification error from the DFS strategy is reduced to learn highly discriminant and localized binary codes effectively. The proposed model provides effective search with better retrieval time and achieves state-of-the-art performance with mean average precision values of 97.42%, 96.06%, and 93.47% on the University of California, Merced, PatternNet, and aerial image datasets, respectively.
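The split-and-localize step described above, dividing the FC feature vector into equally sized blocks and applying a softmax per block, might look like the following sketch. The block count and feature dimensions are illustrative assumptions, not the paper's settings, and the numerically stable softmax is a standard formulation rather than the authors' exact implementation.

```python
import math

def split_blocks(features, n_blocks):
    """Split a flat FC feature vector into n_blocks equally sized blocks."""
    assert len(features) % n_blocks == 0, "feature length must divide evenly"
    size = len(features) // n_blocks
    return [features[i * size:(i + 1) * size] for i in range(n_blocks)]

def softmax(xs):
    """Numerically stable softmax over one block."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 8-D "FC feature" split into 2 blocks of 4. Each block gets its
# own softmax, so attention is normalized within that block alone,
# letting locally strong features stand out block by block.
fc_feature = [0.2, -1.0, 0.5, 3.0, 1.5, -0.5, 0.0, 2.0]
blocks = [softmax(b) for b in split_blocks(fc_feature, 2)]
for b in blocks:
    print([round(v, 3) for v in b], "sums to", round(sum(b), 3))
```

Normalizing per block rather than over the whole vector means a feature only competes with its neighbors in the same block, which is one plausible reading of how the block-wise softmax emphasizes localized essential features.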
“…The availability of satellite instruments, the enormous amount of data acquired, and the availability of computational power have enabled deeper neural networks, introducing new challenges in the earth science domain [15,16]. Recent advances in DL have demonstrated state-of-the-art results in pattern recognition tasks, mainly in image processing and speech recognition [17,18]. Modern convolutional neural network (CNN) architectures [19–21] tend to contain enormous numbers of hidden layers and millions of neurons, allowing them to concurrently learn hierarchical features for a broad class of patterns from data and achieve well-tailored models for the targeted application.…”