Remote Sensing Image Retrieval remains a challenging topic due to the special nature of Remote Sensing imagery. Such images contain various different semantic objects, which clearly complicates the retrieval task. In this paper, we present an image retrieval pipeline that uses attentive, local convolutional features and aggregates them using the Vector of Locally Aggregated Descriptors (VLAD) to produce a global descriptor. We study various system parameters such as the multiplicative and additive attention mechanisms and descriptor dimensionality. We propose a query expansion method that requires no external inputs. Experiments demonstrate that even without training, the local convolutional features and global representation outperform other systems. After system tuning, we can achieve state-of-the-art or competitive results. Furthermore, we observe that our query expansion method increases overall system performance by about 3%, using only the top-three retrieved images. Finally, we show how dimensionality reduction produces compact descriptors with increased retrieval performance and fast retrieval computation times, e.g. 50% faster than the current systems.
Retrieving specific vehicle tracks by Natural Language (NL)-based descriptions is a convenient way to monitor vehicle movement patterns and traffic-related events. NLbased image retrieval has several applications in smart cities, traffic control, etc. In this work, we propose TIED, a text-to-image encoder-decoder model for the simultaneous extraction of visual and textual information for vehicle track retrieval. The model consists of an encoder network that enforces the two modalities into a common latent space and a decoder network that performs an inverse mapping to the text descriptions. The method exploits visual semantic attributes of a target vehicle along with a cycle-consistency loss. The proposed method employs both intra-modal and inter-modal relationships to improve retrieval performance.Our system yields competitive performance achieving the 7th position in the Natural Language-Based Vehicle Retrieval public track of the 2021 NVIDIA AI City Challenge. We demonstrate that the proposed TIED model obtains six times higher Mean Reciprocal Rank (MRR) than the baseline, achieving an MRR of 15.48. The code and models will be made publicly available.
published version features the final layout of the paper including the volume, issue and page numbers.
Link to publication
General rightsCopyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.• Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal.If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the "Taverne" license above, please follow below link for the End User
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.