Content-based copy retrieval (CBCR) aims at retrieving in a database all the modified versions or the previous versions of a given candidate object. In this paper, we present a copy retrieval scheme based on local features that can deal with very large databases both in terms of quality and speed. We first propose a new approximate similarity search technique in which the probabilistic selection of the feature space regions is not based on the distribution in the database but on the distribution of the features distortion. Since our CBCR framework is based on local features, the approximation can be strong and reduce drastically the amount of data to explore. Furthermore, we show how the discrimination of the global retrieval can be enhanced during its post-processing step, by considering only the geometrically consistent matches. This framework is applied to robust video copy retrieval and extensive experiments are presented to study the interactions between the approximate search and the retrieval efficiency. Largest used database contains more than one billion local features corresponding to 30, 000 hours of video.
BackgroundHundreds of herbarium collections have accumulated a valuable heritage and knowledge of plants over several centuries. Recent initiatives started ambitious preservation plans to digitize this information and make it available to botanists and the general public through web portals. However, thousands of sheets are still unidentified at the species level while numerous sheets should be reviewed and updated following more recent taxonomic knowledge. These annotations and revisions require an unrealistic amount of work for botanists to carry out in a reasonable time. Computer vision and machine learning approaches applied to herbarium sheets are promising but are still not well studied compared to automated species identification from leaf scans or pictures of plants in the field.ResultsIn this work, we propose to study and evaluate the accuracy with which herbarium images can be potentially exploited for species identification with deep learning technology. In addition, we propose to study if the combination of herbarium sheets with photos of plants in the field is relevant in terms of accuracy, and finally, we explore if herbarium images from one region that has one specific flora can be used to do transfer learning to another region with other species; for example, on a region under-represented in terms of collected data.ConclusionsThis is, to our knowledge, the first study that uses deep learning to analyze a big dataset with thousands of species from herbaria. Results show the potential of Deep Learning on herbarium species identification, particularly by training and testing across different datasets from different herbaria. This could potentially lead to the creation of a semi, or even fully automated system to help taxonomists and experts with their annotation, classification, and revision works.
This paper presents a new content-based retrieval framework applied to logo retrieval in large natural image collections 1 . The first contribution is a new challenging dataset, called BelgaLogos, which was created in collaboration with professionals of a press agency, in order to evaluate logo retrieval technologies in real-world scenarios. The second and main contribution is a new visual query expansion method using an a contrario thresholding strategy in order to improve the accuracy of expanded query images. Whereas previous methods based on the same paradigm used a purely hand tuned fixed threshold, we provide a fully adaptive method enhancing both genericity and effectiveness. This new technique is evaluated on both OxfordBuilding dataset and our new BelgaLogos dataset.
Abstract. The ImageCLEF's plant identification task provides a testbed for the system-oriented evaluation of plant identification, more precisely on the 126 tree species identification based on leaf images. Three types of image content are considered: Scan, Scan-like (leaf photographs with a white uniform background), and Photograph (unconstrained leaf with natural background). The main originality of this data is that it was specifically built through a citizen sciences initiative conducted by Tela Botanica, a French social network of amateur and expert botanists. This makes the task closer to the conditions of a real-world application. This overview presents more precisely the resources and assessments of task, summarizes the retrieval approaches employed by the participating groups, and provides an analysis of the main evaluation results. With a total of eleven groups from eight countries and with a total of 30 runs submitted, involving distinct and original methods, this second year pilot task confirms Image Retrieval community interest for biodiversity and botany, and highlights further challenging studies in plant identification.
Efficient high-dimensional similarity search structures are essential for building scalable content-based search systems on feature-rich multimedia data. In the last decade, Locality Sensitive Hashing (LSH) has been proposed as indexing technique for approximate similarity search. Among the most recent variations of LSH, multiprobe LSH techniques have been proved to overcome the overlinear space cost drawback of common LSH. Multi-probe LSH is built on the well-known LSH technique, but it intelligently probes multiple buckets that are likely to contain query results in a hash table. Our method is inspired by previous work on probabilistic similarity search structures and improves upon recent theoretical work on multi-probe and query adaptive LSH. Whereas these methods are based on likelihood criteria that a given bucket contains query results, we define a more reliable a posteriori model taking account some prior about the queries and the searched objects. This prior knowledge allows a better quality control of the search and a more accurate selection of the most probable buckets. We implemented a nearest neighbors search based on this paradigm and performed experiments on different real visual features datasets. We show that our a posteriori scheme outperforms other multi-probe LSH while offering a better quality control. Comparisons to the basic LSH technique show that our method allows consistent improvements both in space and time efficiency.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.