2015
DOI: 10.48550/arxiv.1511.05879
Preprint
Particular object retrieval with integral max-pooling of CNN activations

Abstract: Recently, image representation built upon Convolutional Neural Network (CNN) has been shown to provide effective descriptors for image search, outperforming pre-CNN features as short-vector representations. Yet such models are not compatible with geometry-aware re-ranking methods and still outperformed, on some particular object retrieval benchmarks, by traditional image search systems relying on precise descriptor matching, geometric re-ranking, or query expansion. This work revisits both retrieval stages, na…
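The "max-pooling of CNN activations" in the title can be illustrated with a minimal global max-pooled (MAC-style) descriptor. This is a sketch under the assumption of a (channels, height, width) activation tensor; the function name `mac_descriptor` is illustrative, not from the paper, whose integral max-pooling refines this idea to pool over many image regions efficiently.

```python
import numpy as np

def mac_descriptor(activations):
    """Global max-pooling over the spatial dimensions of a CNN
    activation tensor of shape (channels, height, width), followed
    by L2 normalization. Illustrative sketch only."""
    pooled = activations.max(axis=(1, 2))   # strongest response per channel
    return pooled / np.linalg.norm(pooled)  # unit-norm global descriptor

# toy activation map: 4 channels on a 3x3 spatial grid
acts = np.arange(36, dtype=float).reshape(4, 3, 3)
desc = mac_descriptor(acts)  # 4-dimensional unit-length vector
```

Because each channel contributes its single strongest spatial response, the resulting short vector is invariant to where in the image the object appears, which is what makes it usable as a compact search descriptor.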

Cited by 122 publications (220 citation statements)
References: 39 publications (66 reference statements)
“…global) features is particularly favoured in practice due to its run-time efficiency. In the era of deep learning, global features can be generated by aggregating CNNs features (Babenko and Lempitsky 2015;Tolias, Sicre, and Jégou 2015;Arandjelovic et al 2016;Gordo et al 2016;Radenović, Tolias, and Chum 2016;Tolias, Avrithis, and Jégou 2016;Noh et al 2017;Radenović, Tolias, and Chum 2018). Apart from global features, local features are also used to perform spatial verification (Philbin et al 2007;Noh et al 2017;Cao, Araujo, and Sim 2020), which incorporates the geometric information of objects and results in a more reliable matching.…”
Section: Related Work (mentioning)
confidence: 99%
“…Much like other applications [18] [19], the use of convolutional neural networks (CNNs), convolution auto-encoders (CAEs) and deep/shallow neural nets has demonstrated superior results for VPR than those achieved by handcrafted feature-based approaches. The use of neural networks was studied in [20] where a pre-trained CNN was used to extract features from layers of an input image.…”
Section: Literature Review, A. VPR Techniques (mentioning)
confidence: 99%
“…The design of CNN layer activations is also something that has been extensively studied, specifically pooling approaches that are employed on convolution layers. Such pooling approaches include: Max-Pooling [18], Cross-Pooling [19], Sum-Pooling [25] and Spatial Max-Pooling [26].…”
Section: Literature Review, A. VPR Techniques (mentioning)
confidence: 99%
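The pooling variants listed in the snippet above differ in how they collapse a CNN activation map into one value per channel. A minimal sketch of the two most common choices, max- and sum-pooling, assuming a (channels, H, W) tensor (the helper `global_pool` is hypothetical, for illustration only):

```python
import numpy as np

def global_pool(activations, mode="max"):
    """Pool a (channels, H, W) CNN activation tensor into one value per
    channel. 'max' keeps the strongest response per channel (as in
    max-pooled descriptors); 'sum' aggregates all responses (as in
    sum-pooled descriptors). Hypothetical helper for illustration."""
    if mode == "max":
        return activations.max(axis=(1, 2))
    if mode == "sum":
        return activations.sum(axis=(1, 2))
    raise ValueError(f"unknown mode: {mode}")

# Toy tensor: two channels of ones on a 4x4 grid, with one strong
# activation in channel 0 to show how the two poolings differ.
acts = np.ones((2, 4, 4))
acts[0, 1, 2] = 5.0
print(global_pool(acts, "max"))  # -> [5. 1.]
print(global_pool(acts, "sum"))  # -> [20. 16.]
```

The example shows the trade-off discussed in the retrieval literature: max-pooling responds to a single distinctive activation regardless of its location, while sum-pooling reflects how much of the map is active overall.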
“…The traditional hand-crafted features suffer large memory size consumption and will lower the searching efficiency [2,4]. Many compact image representations [5,6,7,8,9] are thus proposed to address the above problem. Among these studies, the deep features can produce better performance than the traditional Fisher compact features (such as work [5] and [6]), and they have become the predominant stream of features used for CBIR.…”
Section: Introduction (mentioning)
confidence: 99%
“…Generally, the Convolutional Neural Networks (CNN) layer activations are directly adopted as the off-the-shelf deep features [7,8,9]. However, these features are usually trained for image classification tasks.…”
Section: Introduction (mentioning)
confidence: 99%