Two-stage Discriminative Re-ranking for Large-scale Landmark Retrieval

Preprint, 2020
DOI: 10.48550/arxiv.2003.11211

Abstract: We propose an efficient pipeline for large-scale landmark image retrieval that addresses the diversity of the dataset through two-stage discriminative re-ranking. Our approach is based on embedding the images in a feature-space using a convolutional neural network trained with a cosine softmax loss. Due to the variance of the images, which include extreme viewpoint changes such as having to retrieve images of the exterior of a landmark from images of the interior, this is very challenging for approaches based …
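The cosine softmax loss mentioned in the abstract can be sketched as follows. This is a minimal sketch, not the authors' implementation: the class name, the scale value, and the absence of an angular margin are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineSoftmaxLoss(nn.Module):
    """Sketch of a cosine softmax classifier head.

    Assumptions: a fixed scale of 30.0 and no angular margin; the paper
    may use different hyper-parameters.
    """

    def __init__(self, embedding_dim: int, num_classes: int, scale: float = 30.0):
        super().__init__()
        # One learnable weight vector per class.
        self.weight = nn.Parameter(torch.randn(num_classes, embedding_dim))
        self.scale = scale

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between L2-normalized embeddings and class weights,
        # scaled and fed to the standard softmax cross-entropy.
        logits = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        return F.cross_entropy(self.scale * logits, labels)
```

Normalizing both the embeddings and the class weights makes the logits pure cosine similarities, so training shapes the angular layout of the embedding space, which is what a nearest-neighbor retrieval step later exploits.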

Cited by 2 publications (7 citation statements)
References 51 publications
“…We also compare with models trained on the Landmarks-full Table 5: Baseline results (% mAP@100) for the GLDv2 retrieval task. The bottom three results were reported in [63].…”
Section: Comparing Training Datasets
confidence: 99%
“…The Google Landmarks Dataset v2 training set presents a realistic crowdsourced setting with diverse types of images for each landmark: e.g., for a specific museum there may be outdoor images showing the building facade, but also indoor images of paintings and sculptures that are on display. Such diversity within a class may pose challenges to the training process, so we consider the pre-processing steps proposed in [63] in order to make each class more visually coherent. Within each class, each image is queried against all others by global descriptor similarity, followed by geometric verification of the top-100 most similar images using local features.…”
Section: Training Set Pre-processing
confidence: 99%
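The intra-class cleaning step quoted above (query each image against the rest of its class by global-descriptor similarity, then geometrically verify the top-100 neighbors) might look roughly like this. The function name and the `verify` callback are hypothetical, and the geometric-verification step itself is abstracted into that callback.

```python
import numpy as np

def select_coherent_images(descriptors, top_k=100, verify=None):
    """Sketch of intra-class cleaning by descriptor similarity.

    Assumptions: `descriptors` are L2-normalized global features for the
    images of one class; `verify(i, j)` is a hypothetical local-feature
    geometric-verification callback returning True for a confirmed match.
    Returns, per image, the indices of its verified in-class neighbors.
    """
    d = np.asarray(descriptors, dtype=np.float32)
    sims = d @ d.T                   # cosine similarity (rows L2-normalized)
    np.fill_diagonal(sims, -np.inf)  # exclude self-matches
    kept = []
    for i in range(len(d)):
        # top_k most similar images within the class for query image i
        neighbors = np.argsort(-sims[i])[:top_k]
        matches = [int(j) for j in neighbors
                   if verify is None or verify(i, int(j))]
        kept.append(matches)
    return kept
```

In practice the verified matches would then decide which images stay in the class, making each class more visually coherent before training.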
“…All models are pre-trained on ImageNet [39] and implemented in PyTorch [31]. For fair comparison, we set a training environment similar to those of the compared studies [56,53,28,35]. We employ ResNet101 [18] as the backbone model.…”
Section: Implementation Details
confidence: 99%
“…We adopt the batch sampling of Yokoo et al [56] where mini-batch samples with similar aspect ratios are resized to a particular size. Here, we use a batch size of 64.…”
Section: Implementation Details
confidence: 99%
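The aspect-ratio-grouped batch sampling attributed to Yokoo et al. [56] could be sketched as below: sort samples by aspect ratio, chunk into batches, and shuffle only the batch order so each batch keeps similar ratios and can be resized to one common size. The function name and the `(index, width, height)` sample format are illustrative assumptions, not the cited code.

```python
import random

def aspect_ratio_batches(samples, batch_size=64, shuffle=True, seed=0):
    """Sketch of aspect-ratio-grouped mini-batch sampling.

    Assumptions: `samples` is a list of (index, width, height) tuples.
    Sorting by width/height before chunking keeps each batch's aspect
    ratios similar, so one resize target per batch distorts images less.
    """
    order = sorted(samples, key=lambda s: s[1] / s[2])
    batches = [order[i:i + batch_size]
               for i in range(0, len(order), batch_size)]
    if shuffle:
        # Shuffle batch order only; within-batch ratio grouping is preserved.
        random.Random(seed).shuffle(batches)
    return batches
```

The quoted setup uses a batch size of 64; any per-batch target size would then be chosen to match that batch's shared aspect ratio.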