2016
DOI: 10.3169/mta.4.251

[Paper] Visual Instance Retrieval with Deep Convolutional Networks

Abstract: This paper provides an extensive study on the availability of image representations based on convolutional networks (ConvNets) for the task of visual instance retrieval. Besides the choice of convolutional layers, we present an efficient pipeline exploiting multi-scale schemes to extract local features, in particular, by taking geometric invariance into explicit account, i.e., positions, scales and spatial consistency. In our experiments using five standard image retrieval datasets, we demonstrate that generic …
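The abstract mentions multi-scale schemes that take the position and scale of local features into account. One simple way to enumerate such regions is sketched below, assuming a non-overlapping l-by-l grid at every level l = 1..L. The function name and the grid layout are hypothetical illustrations; the paper's exact scheme (number of scales, overlap, region shape) is not given in this excerpt.

```python
from typing import List, Tuple

# Hypothetical region format: (x0, y0, x1, y1), a half-open pixel box.
Region = Tuple[int, int, int, int]


def multiscale_regions(width: int, height: int, levels: int) -> List[Region]:
    """Tile the image with an l-by-l grid for every level l = 1..levels.

    Level 1 is the whole image; level l yields l*l boxes of roughly
    (width / l) x (height / l) pixels, so each box records both a
    position (its grid cell) and a scale (its level).
    """
    regions: List[Region] = []
    for l in range(1, levels + 1):
        for i in range(l):          # grid row
            for j in range(l):      # grid column
                x0 = j * width // l
                x1 = (j + 1) * width // l
                y0 = i * height // l
                y1 = (i + 1) * height // l
                regions.append((x0, y0, x1, y1))
    return regions


regions = multiscale_regions(640, 480, 3)
print(len(regions))  # 1 + 4 + 9 = 14
print(regions[:2])   # [(0, 0, 640, 480), (0, 0, 320, 240)]
```

Each box would then be cropped from the image (or mapped onto a convolutional feature map) and pooled into one local descriptor per region.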

Cited by 306 publications (314 citation statements)
References 30 publications (27 reference statements)

“…The regional MAC descriptors are subsequently sum-pooled along with a series of normalization and PCA-whitening operations [53]. We also note in this survey that several other works [140], [133], [134] employ ideas similar to [10], applying max or average pooling to the intermediate feature maps, and that Razavian et al. [133] were the first to do so. It has been observed that the last convolutional layer (e.g., pool5 in VGGNet), after pooling, usually yields higher accuracy than the FC descriptors and the other convolutional layers [134].…”
Section: Feature Encoding and Pooling; citation type: mentioning (confidence: 99%)
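The pooling ideas in the passage above (max or average pooling of a convolutional feature map, and regional MAC descriptors that are normalized and sum-pooled) can be sketched in pure Python as follows. This is a simplified illustration, not the method of [53] or [10]: it assumes a non-overlapping multi-scale grid instead of the overlapping regions typically used for R-MAC, and it omits the PCA-whitening step entirely. The feature-map layout and all function names are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical layout: feature map indexed as [channel][row][col],
# e.g. non-negative (post-ReLU) activations of the last conv layer.
FeatureMap = List[List[List[float]]]


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def pool_region(fmap: FeatureMap, r0: int, c0: int, r1: int, c1: int,
                mode: str = "max") -> List[float]:
    """Pool rows r0..r1-1 and cols c0..c1-1 of every channel into one value.

    mode="max" gives MAC-style max pooling; mode="avg" gives average pooling.
    """
    out = []
    for channel in fmap:
        vals = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        out.append(max(vals) if mode == "max" else sum(vals) / len(vals))
    return out


def rmac(fmap: FeatureMap, levels: int = 2) -> List[float]:
    """Simplified regional MAC over a non-overlapping multi-scale grid.

    Each region is max-pooled and L2-normalized; the regional vectors are
    sum-pooled and the sum is L2-normalized again. PCA-whitening is omitted.
    """
    h, w = len(fmap[0]), len(fmap[0][0])
    total = [0.0] * len(fmap)
    for l in range(1, levels + 1):
        for i in range(l):
            for j in range(l):
                r0, r1 = i * h // l, (i + 1) * h // l
                c0, c1 = j * w // l, (j + 1) * w // l
                if r1 <= r0 or c1 <= c0:
                    continue  # grid cell is empty on a very small map
                region = l2_normalize(pool_region(fmap, r0, c0, r1, c1, "max"))
                total = [t + x for t, x in zip(total, region)]
    return l2_normalize(total)


# Toy 2-channel, 2x2 feature map.
fmap = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 3.0]],
]
print(pool_region(fmap, 0, 0, 2, 2, "max"))  # MAC: [1.0, 3.0]
print(pool_region(fmap, 0, 0, 2, 2, "avg"))  # average: [0.25, 0.75]
print(rmac(fmap))                            # unit-norm regional descriptor
```

Normalizing each regional vector before summing keeps a single strongly activated region from dominating the final descriptor; the final normalization makes descriptors comparable with a dot product.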
“…Again, standard techniques from SIFT-based methods, such as Hamming embedding (HE), are employed [156]. Apart from the above-mentioned strategies, we notice that several works [7], [133], [152] extract multiple region descriptors per image to perform many-to-many matching, called "spatial search" [7]. This method improves the translation and scale invariance of the retrieval system but may run into efficiency problems.…”
Section: Feature Encoding and Indexing; citation type: mentioning (confidence: 99%)
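The "spatial search" idea described above, matching many region descriptors of one image against many of another, can be sketched as follows. The scoring rule here (each query region keeps its best cosine similarity among the database image's regions, and those best scores are averaged) is an assumption chosen for illustration; the exact rule used in [7] is not stated in this excerpt, and all names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def spatial_search_score(query_regions: List[Vector],
                         db_regions: List[Vector]) -> float:
    """Many-to-many matching between the regions of two images.

    Every query region is compared with every database region; each query
    region keeps its best match, and the best matches are averaged.
    Cost is O(len(query_regions) * len(db_regions) * dim) per image pair.
    """
    if not query_regions or not db_regions:
        return 0.0
    best = [max(cosine(q, d) for d in db_regions) for q in query_regions]
    return sum(best) / len(best)


def rank_database(query_regions: List[Vector],
                  database: List[List[Vector]]) -> List[int]:
    """Return database indices sorted from most to least similar."""
    scores = [spatial_search_score(query_regions, regs) for regs in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)


query = [[1.0, 0.0], [0.0, 1.0]]
image_b = [[1.0, 1.0]]                 # one region, halfway between both
image_a = [[0.0, 1.0], [1.0, 0.0]]     # contains both query regions
print(rank_database(query, [image_b, image_a]))  # [1, 0]
```

Because a query object can match a region at any position or scale in the database image, the score tolerates translation and scale changes; the nested comparison over all region pairs is also why the cost grows quickly with the number of regions, which is the efficiency concern raised in the passage.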