Jack Sim scite author profile

Large-Scale Image Retrieval with Attentive Deep Local Features

We propose an attentive local feature descriptor suitable for large-scale image retrieval, referred to as DELF (DEep Local Feature). The new feature is based on convolutional neural networks, which are trained only with image-level annotations on a landmark image dataset. To identify semantically useful local features for image retrieval, we also propose an attention mechanism for keypoint selection, which shares most network layers with the descriptor. This framework can be used for image retrieval as a drop-in replacement for other keypoint detectors and descriptors, enabling more accurate feature matching and geometric verification. Our system produces reliable confidence scores to reject false positives; in particular, it is robust against queries that have no correct match in the database. To evaluate the proposed descriptor, we introduce a new large-scale dataset, referred to as the Google-Landmarks dataset, which involves challenges in both database and query such as background clutter, partial occlusion, multiple landmarks, objects in variable scales, etc. We show that DELF outperforms the state-of-the-art global and local descriptors in the large-scale setting by significant margins. Code and dataset can be found at the project webpage: https://github.com/tensorflow/models/tree/master/research/delf.
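The attention-based keypoint selection described in this abstract can be pictured with a small sketch. It is an illustration only, not the DELF implementation: the function name `select_keypoints`, the nested-list feature grid, the top-k rule, and the L2 normalization step are all assumptions made for this example.

```python
import math

def select_keypoints(descriptors, attention, k):
    """Keep the k grid locations with the highest attention score.

    descriptors: grid of feature vectors, descriptors[r][c] is a list of floats
    attention:   grid of scalar scores with the same height and width
    Both inputs and the top-k rule are hypothetical stand-ins for network outputs.
    """
    scored = [
        (attention[r][c], (r, c))
        for r in range(len(attention))
        for c in range(len(attention[r]))
    ]
    # Highest attention first; Python's sort is stable, so ties keep grid order.
    scored.sort(key=lambda item: item[0], reverse=True)

    selected = []
    for score, (r, c) in scored[:k]:
        vec = descriptors[r][c]
        # L2-normalize the descriptor (an assumption for this sketch).
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        selected.append(((r, c), score, [v / norm for v in vec]))
    return selected

# Toy 2x2 feature grid with 2-dimensional descriptors.
attention = [[0.1, 0.9], [0.5, 0.3]]
descriptors = [[[1.0, 0.0], [3.0, 4.0]], [[0.0, 2.0], [1.0, 1.0]]]
keypoints = select_keypoints(descriptors, attention, k=2)
locations = [loc for loc, _, _ in keypoints]
```

In this toy grid the two retained locations are the ones with attention 0.9 and 0.5; the descriptors at the remaining locations are discarded before matching.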

Unifying Deep Local and Global Features for Image Search

Cao, 2020

Google Landmarks Dataset v2 – A Large-Scale Benchmark for Instance-Level Recognition and Retrieval

et al. 2020

Computing Receptive Fields of Convolutional Neural Networks

Araujo, Norris, Sim, 2019, Distill

Detect-To-Retrieve: Efficient Regional Aggregation for Image Search

et al. 2019

Retrieving object instances among cluttered scenes efficiently requires compact yet comprehensive regional image representations. Intuitively, object semantics can help build the index that focuses on the most relevant regions. However, due to the lack of bounding-box datasets for objects of interest among retrieval benchmarks, most recent work on regional representations has focused on either uniform or class-agnostic region selection. In this paper, we first fill the void by providing a new dataset of landmark bounding boxes, based on the Google Landmarks dataset, that includes 86k images with manually curated boxes from 15k unique landmarks. Then, we demonstrate how a trained landmark detector, using our new dataset, can be leveraged to index image regions and improve retrieval accuracy while being much more efficient than existing regional methods. In addition, we introduce a novel regional aggregated selective match kernel (R-ASMK) to effectively combine information from detected regions into an improved holistic image representation. R-ASMK boosts image retrieval accuracy substantially with no dimensionality increase, while even outperforming systems that index image regions independently. Our complete image retrieval system improves upon the previous state-of-the-art by significant margins on the Revisited Oxford and Paris datasets. Code and data available at the project webpage: https

BranchOut: Regularization for Online Ensemble Tracking with Convolutional Neural Networks

2017

CPlaNet: Enhancing Image Geolocalization by Combinatorial Partitioning of Maps

Seo, Weyand, Sim, et al. 2018

Image geolocalization is the task of identifying the location depicted in a photo based only on its visual information. This task is inherently challenging since many photos have only a few, possibly ambiguous cues to their geolocation. Recent work has cast this task as a classification problem by partitioning the earth into a set of discrete cells that correspond to geographic regions. The granularity of this partitioning presents a critical trade-off: using fewer but larger cells results in lower location accuracy, while using more but smaller cells reduces the number of training examples per class and increases model size, making the model prone to overfitting. To tackle this issue, we propose a simple but effective algorithm, combinatorial partitioning, which generates a large number of fine-grained output classes by intersecting multiple coarse-grained partitionings of the earth. Each classifier votes for the fine-grained classes that overlap with its respective coarse-grained ones. This technique allows us to predict locations at a fine scale while maintaining sufficient training examples per class. Our algorithm achieves state-of-the-art performance in location recognition on multiple benchmark datasets.
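The intersect-and-vote idea in this abstract can be sketched on a toy one-dimensional "map". This is a simplified illustration under stated assumptions, not the CPlaNet implementation: the interval [0, 12), the two partitionings, the classifier scores, the helper names, and the choice to combine votes by summing scores are all hypothetical.

```python
from bisect import bisect_right

def coarse_cell(x, boundaries):
    """Index of the coarse cell containing x, given sorted interior boundaries."""
    return bisect_right(boundaries, x)

def fine_cells(partitionings, lo, hi):
    """Intersect all partitionings: fine cells lie between consecutive boundaries."""
    cuts = sorted({lo, hi, *(b for p in partitionings for b in p)})
    return list(zip(cuts[:-1], cuts[1:]))

def vote(partitionings, classifier_scores, lo, hi):
    """For every fine cell, add up the scores of the coarse cells that contain it.

    Summing scores is an assumption of this sketch.
    """
    totals = {}
    for start, end in fine_cells(partitionings, lo, hi):
        mid = (start + end) / 2.0  # any point inside the fine cell works
        totals[(start, end)] = sum(
            scores[coarse_cell(mid, bounds)]
            for bounds, scores in zip(partitionings, classifier_scores)
        )
    return totals

# Two coarse partitionings of [0, 12): 3 cells and 4 cells respectively.
partitionings = [[4, 8], [2, 6, 10]]
# One hypothetical score per coarse cell, per classifier.
classifier_scores = [[0.1, 0.7, 0.2], [0.1, 0.2, 0.6, 0.1]]

totals = vote(partitionings, classifier_scores, 0, 12)
best = max(totals, key=totals.get)
```

Here two partitionings with 3 and 4 coarse cells yield 6 fine cells, and the predicted cell is the one where the highest-scoring coarse cells of both classifiers overlap, even though neither classifier was trained on a class that small.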

Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food

Thames, Karpur, Norris, et al. 2021
