Two-stage Discriminative Re-ranking for Large-scale Landmark Retrieval

Preprint, 2020
DOI: 10.48550/arxiv.2003.11211

Abstract: We propose an efficient pipeline for large-scale landmark image retrieval that addresses the diversity of the dataset through two-stage discriminative re-ranking. Our approach is based on embedding the images in a feature-space using a convolutional neural network trained with a cosine softmax loss. Due to the variance of the images, which include extreme viewpoint changes such as having to retrieve images of the exterior of a landmark from images of the interior, this is very challenging for approaches based …
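The cosine softmax loss mentioned in the abstract can be sketched as follows. This is a minimal sketch, not the authors' implementation: the class name, the scale value, and the absence of an angular margin are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineSoftmaxLoss(nn.Module):
    """Sketch of a cosine softmax classifier head.

    Assumptions: a fixed scale of 30.0 and no angular margin; the paper
    may use different hyper-parameters.
    """

    def __init__(self, embedding_dim: int, num_classes: int, scale: float = 30.0):
        super().__init__()
        # One learnable weight vector per class.
        self.weight = nn.Parameter(torch.randn(num_classes, embedding_dim))
        self.scale = scale

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between L2-normalized embeddings and class weights,
        # scaled and fed to the standard softmax cross-entropy.
        logits = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        return F.cross_entropy(self.scale * logits, labels)
```

Normalizing both the embeddings and the class weights makes the logits pure cosine similarities, so training shapes the angular layout of the embedding space, which is what a nearest-neighbor retrieval step later exploits.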

Cited by 2 publications (7 citation statements)
References 51 publications
“…We also compare with models trained on the Landmarks-full Table 5: Baseline results (% mAP@100) for the GLDv2 retrieval task. The bottom three results were reported in [63].…”
Section: Comparing Training Datasets
confidence: 99%
“…The Google Landmarks Dataset v2 training set presents a realistic crowdsourced setting with diverse types of images for each landmark: e.g., for a specific museum there may be outdoor images showing the building facade, but also indoor images of paintings and sculptures that are on display. Such diversity within a class may pose challenges to the training process, so we consider the pre-processing steps proposed in [63] in order to make each class more visually coherent. Within each class, each image is queried against all others by global descriptor similarity, followed by geometric verification of the top-100 most similar images using local features.…”
Section: Training Set Pre-processing
confidence: 99%
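The intra-class cleaning step quoted above (query each image against the rest of its class by global-descriptor similarity, then geometrically verify the top-100 neighbors) might look roughly like this. The function name and the `verify` callback are hypothetical, and the geometric-verification step itself is abstracted into that callback.

```python
import numpy as np

def select_coherent_images(descriptors, top_k=100, verify=None):
    """Sketch of intra-class cleaning by descriptor similarity.

    Assumptions: `descriptors` are L2-normalized global features for the
    images of one class; `verify(i, j)` is a hypothetical local-feature
    geometric-verification callback returning True for a confirmed match.
    Returns, per image, the indices of its verified in-class neighbors.
    """
    d = np.asarray(descriptors, dtype=np.float32)
    sims = d @ d.T                   # cosine similarity (rows L2-normalized)
    np.fill_diagonal(sims, -np.inf)  # exclude self-matches
    kept = []
    for i in range(len(d)):
        # top_k most similar images within the class for query image i
        neighbors = np.argsort(-sims[i])[:top_k]
        matches = [int(j) for j in neighbors
                   if verify is None or verify(i, int(j))]
        kept.append(matches)
    return kept
```

In practice the verified matches would then decide which images stay in the class, making each class more visually coherent before training.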
“…All models are pre-trained on ImageNet [39] and implemented in PyTorch [31]. For fair comparison, we set a training environment similar to those of the compared studies [56,53,28,35]. We employ ResNet101 [18] as the backbone model.…”
Section: Implementation Details
confidence: 99%
“…We adopt the batch sampling of Yokoo et al [56] where mini-batch samples with similar aspect ratios are resized to a particular size. Here, we use a batch size of 64.…”
Section: Implementation Details
confidence: 99%
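The aspect-ratio-grouped batch sampling attributed to Yokoo et al. [56] could be sketched as below: sort samples by aspect ratio, chunk into batches, and shuffle only the batch order so each batch keeps similar ratios and can be resized to one common size. The function name and the `(index, width, height)` sample format are illustrative assumptions, not the cited code.

```python
import random

def aspect_ratio_batches(samples, batch_size=64, shuffle=True, seed=0):
    """Sketch of aspect-ratio-grouped mini-batch sampling.

    Assumptions: `samples` is a list of (index, width, height) tuples.
    Sorting by width/height before chunking keeps each batch's aspect
    ratios similar, so one resize target per batch distorts images less.
    """
    order = sorted(samples, key=lambda s: s[1] / s[2])
    batches = [order[i:i + batch_size]
               for i in range(0, len(order), batch_size)]
    if shuffle:
        # Shuffle batch order only; within-batch ratio grouping is preserved.
        random.Random(seed).shuffle(batches)
    return batches
```

The quoted setup uses a batch size of 64; any per-batch target size would then be chosen to match that batch's shared aspect ratio.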