Exploring the relevance between images and their natural language descriptions is, owing to its importance, regarded as one of the next frontiers in the general computer vision literature. Several recent works have therefore attempted to map visual attributes onto their corresponding textual content with some success. However, this line of research has not yet become widespread in the remote sensing community. On this point, our contribution is three-pronged. First, we construct a new dataset for text-image matching tasks, termed TextRS, by collecting images from four well-known scene datasets, namely AID, Merced, PatternNet, and NWPU. Each image is annotated with five sentences, each written by a different person to ensure diversity. Second, we put forth a novel Deep Bidirectional Triplet Network (DBTN) for text-to-image matching. Unlike traditional remote sensing image-to-image retrieval, our paradigm performs retrieval by matching text representations to image representations. To achieve this, we propose to learn a bidirectional triplet network composed of a Long Short-Term Memory network (LSTM) and pre-trained Convolutional Neural Networks (CNNs) based on EfficientNet-B2, ResNet-50, Inception-v3, and VGG16. Third, we top the proposed architecture with an average fusion strategy that fuses the features pertaining to the five image sentences, which enables learning of a more robust embedding. The performance of the method, expressed in terms of Recall@K (the presence of the relevant image among the top K images retrieved for the query text), is promising, yielding 17.20%, 51.39%, and 73.02% for K = 1, 5, and 10, respectively.
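The ingredients of the pipeline above (average fusion of the five sentence embeddings, a triplet objective, and Recall@K evaluation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the Euclidean distance, and the hinge-style margin formulation are assumptions for the sake of the example.

```python
import numpy as np

def average_fusion(sentence_embeddings):
    """Fuse the five per-sentence embeddings of one image by averaging."""
    return np.mean(sentence_embeddings, axis=0)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: pull the positive closer than the
    negative by at least `margin` in Euclidean distance."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def recall_at_k(query_emb, gallery_embs, true_index, k):
    """1 if the relevant image is among the k nearest gallery items, else 0."""
    dists = np.linalg.norm(gallery_embs - query_emb, axis=1)
    top_k = np.argsort(dists)[:k]
    return int(true_index in top_k)
```

Averaging Recall@K over all query texts gives the percentage figures reported in the abstract.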
A variety of feature selection methods have been developed in the literature; they can be classified into three main categories: filter, wrapper, and hybrid approaches. Filter methods apply an independent test without involving any learning algorithm, while wrapper methods require a predetermined learning algorithm for feature subset evaluation. Filter and wrapper methods have their respective drawbacks and are complementary to each other: filter approaches have low computational cost but limited reliability in classification, while wrapper methods tend to achieve superior classification accuracy at great computational expense. The methods proposed in this paper are bi-level dimensionality reduction methods that integrate a filter method with a feature extraction method, with the aim of improving the classification performance of the selected features. In both proposed approaches, level 1 selects features based on mutual correlation, and level 2 applies PCA or LPP to extract features from the selected subset. To evaluate the performance of the proposed methods, several experiments were conducted on standard datasets; the results show the superiority of the proposed methods over single-level dimensionality reduction techniques (feature selection based on mutual correlation, PCA, and LPP).
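The two levels can be sketched as below. This is a hedged illustration only: it assumes a mean-absolute-correlation redundancy score for level 1 and a plain SVD-based PCA for level 2 (LPP is omitted), and all function names are hypothetical rather than taken from the paper.

```python
import numpy as np

def select_by_mutual_correlation(X, n_keep):
    """Level 1 (filter): keep the n_keep features with the lowest mean
    absolute correlation to all other features (least redundant)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(corr, 0.0)          # ignore self-correlation
    scores = corr.mean(axis=1)           # redundancy score per feature
    keep = np.argsort(scores)[:n_keep]
    return np.sort(keep)

def pca_transform(X, n_components):
    """Level 2 (extraction): project centered data onto the top
    principal directions obtained via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

Chaining them, `pca_transform(X[:, select_by_mutual_correlation(X, m)], k)`, yields the bi-level reduction: the filter prunes redundant features cheaply, then PCA extracts a compact representation from the survivors.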
Local invariant keypoint extraction has recently emerged as an attractive approach for detecting near-duplicate images. Near-duplicate images can be (i) perceptually identical images (e.g., allowing for changes in color balance, brightness, compression artifacts, contrast adjustment, rotation, cropping, filtering, scaling, etc.) or (ii) images of the same 3D scene taken from different viewpoints. The requirements for identifying near-duplicate images vary with the application. In this paper we focus on an image matching strategy that assists in the detection of forged (copy-paste forgery) images. So far, no image matching strategy specific to this application exists, and state-of-the-art methodologies tend to generate many false positives. We introduce a novel strategy for matching keypoint distributions. Experiments conducted on real-world images demonstrate success in near-duplicate image retrieval for digital image forensics. The proposed method outperforms several existing methods and is computationally efficient.
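In the copy-paste (copy-move) setting, matches must be found within a single image: two keypoints with near-identical descriptors but distant image locations hint at a copied region. A brute-force sketch of this idea is shown below; the thresholds, function name, and the simple pairwise test are illustrative assumptions, not the matching strategy proposed in the paper.

```python
import numpy as np

def copy_move_matches(keypoints, descriptors, desc_thresh=0.1, min_offset=10.0):
    """Find keypoint pairs within ONE image whose descriptors are very
    similar but whose locations are far apart (copy-move candidates)."""
    kp = np.asarray(keypoints, dtype=float)
    desc = np.asarray(descriptors, dtype=float)
    matches = []
    n = len(kp)
    for i in range(n):
        for j in range(i + 1, n):
            similar = np.linalg.norm(desc[i] - desc[j]) < desc_thresh
            distant = np.linalg.norm(kp[i] - kp[j]) > min_offset
            if similar and distant:
                matches.append((i, j))
    return matches
```

The spatial-offset condition suppresses the trivial false positives that arise when neighboring keypoints in a textured region happen to share similar descriptors.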