Typical retrieval systems have three requirements: (a) accurate retrieval, i.e., the method should have high precision; (b) diverse retrieval, i.e., the retrieved set of points should be diverse; and (c) short retrieval time. However, most existing methods address only one or two of these requirements. In this work, we present a method based on randomized locality sensitive hashing that tries to address all three simultaneously. While earlier hashing approaches treated approximate retrieval as acceptable only for the sake of efficiency, we argue that approximate retrieval can be exploited further to provide impressive trade-offs between accuracy and diversity. We extend our method to the problem of multi-label prediction, where the goal is to output a diverse and accurate set of labels for a given document in real time. Moreover, we introduce a new notion to evaluate a method's performance on the precision and diversity measures simultaneously. Finally, we present empirical results on several different retrieval tasks and show that our method retrieves diverse and accurate images/labels while ensuring a 100x speed-up over existing diverse retrieval approaches.
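To make the hashing idea concrete, the following is a minimal sketch of approximate retrieval with random-hyperplane locality sensitive hashing. It is not the method proposed in the abstract: the choice of hyperplane hashing, the helper names (`make_hyperplanes`, `hash_code`, `build_index`, `query`), the toy data, and the cosine re-ranking are all assumptions made for illustration, and the sketch does not model the accuracy/diversity trade-off itself.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(vec, hyperplanes):
    """One bit per hyperplane: the side of the plane the vector falls on."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(points, hyperplanes):
    """Group point indices into buckets keyed by their hash code."""
    index = defaultdict(list)
    for i, p in enumerate(points):
        index[hash_code(p, hyperplanes)].append(i)
    return index


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def query(q, points, index, hyperplanes, k):
    """Look only at the query's bucket, then rank those candidates exactly."""
    candidates = index.get(hash_code(q, hyperplanes), [])
    ranked = sorted(candidates, key=lambda i: cosine(q, points[i]), reverse=True)
    return ranked[:k]


# Toy data (assumed): four 2-D points and a query parallel to the first one.
points = [[1.0, 0.2], [0.9, 0.1], [-1.0, 0.5], [0.1, -1.0]]
planes = make_hyperplanes(num_bits=4, dim=2)
index = build_index(points, planes)
neighbours = query([2.0, 0.4], points, index, planes, k=2)
```

Because the query scans a single bucket rather than the whole collection, retrieval is fast but approximate: a true neighbour that falls on the other side of any hyperplane is missed. Using fewer bits yields larger buckets, which trades speed for recall.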
Instance retrieval (IR) is the problem of retrieving specific instances of a particular object, such as a monument, from a collection of images. Currently, the most popular methods for IR use bag-of-words (BoW) features for retrieval. However, a prominent problem for IR remains the tendency of BoW-based methods to retrieve near-identical images as the most relevant results. In this paper, we define diversity in IR as variation in physical properties among the most relevant results retrieved for a query image. To achieve this, we propose both an ITML algorithm that reshapes the BoW feature space into one that better captures diversity, and a measure to evaluate diversity in retrieval results for IR applications. Additionally, we generate 200 hand-labeled images from the Paris dataset for use in further research in this area. Experiments on the popular Paris dataset show that our method outperforms the standard BoW model in many cases.
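For orientation, and assuming that ITML here refers to information-theoretic metric learning, reshaping the feature space amounts to replacing the Euclidean distance between BoW vectors with a learned Mahalanobis distance. The symbols $x$, $y$, $A$ and $L$ below are notation introduced for this illustration and do not appear in the abstract:

```latex
d_A(x, y) = (x - y)^{\top} A \, (x - y), \qquad A \succeq 0 .
```

Since $A$ is positive semidefinite it can be factored as $A = L^{\top} L$, so $d_A(x, y) = \lVert Lx - Ly \rVert_2^2$: the learned distance is the squared Euclidean distance after the linear map $L$ is applied to the features. Setting $A = I$ recovers the standard squared Euclidean distance.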
Traditional clustering algorithms use a predefined metric and no supervision to identify the partition. Existing semi-supervised clustering approaches either learn a metric from randomly chosen constraints or actively select informative constraints using a generic distance measure such as the Euclidean norm. We tackle the problem of identifying constraints that are informative for learning an appropriate metric for semi-supervised clustering. We propose an approach that simultaneously identifies appropriate constraints and learns a metric to boost clustering performance. We evaluate the clustering quality of our approach, using the learned metric, on the MNIST handwritten digits, Caltech-256 and MSRC2 object image datasets. Our results on these datasets show significant improvements over baseline methods such as MPCK-MEANS.