Fine-Tuning CNN Image Retrieval with No Human Annotation

Radenović, Filip; Tolias, Giorgos; Chum, Ondřej

doi:10.1109/tpami.2018.2846566

Cited by 1,040 publications

(1,177 citation statements)

References 65 publications

Supporting

Mentioning

1,090

Contrasting

Unclassified

“…They also propose the regional version RMAC, by sampling windows at different scales and describing them separately. Radenović et al [19] generalize the preceding approaches with a generalized mean pooling (GeM) including a learnable parameter.…”

Section: Global Methods (mentioning)

confidence: 99%

Challenging Deep Image Descriptors for Retrieval in Heterogeneous Iconographic Collections

Gominski

Porêba

Gouet-Brunet

et al. 2019

Proceedings of the 1st Workshop on Structuring and Understanding of Multimedia heritAge Contents

This article proposes to study the behavior of recent and efficient state-of-the-art deep-learning-based image descriptors for content-based image retrieval when they face a panel of complex variations appearing in heterogeneous image datasets, in particular in cultural collections that may involve multi-source, multi-date and multi-view contents. For this purpose, we introduce a novel dataset, the Alegoria dataset, consisting of 12,952 iconographic contents representing landscapes of the French territory and encapsulating a large range of finely labelled intra-class variations of appearance. Six deep features (DELF, NetVLAD, GeM, MAC, RMAC, SPoC) and a hand-crafted local descriptor (ORB) are evaluated against these variations. Their performance is discussed, with the objective of providing the reader with research directions for improving image description techniques dedicated to complex heterogeneous datasets, which are now increasingly present in topical applications targeting heritage valorization.
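As a rough illustration of the generalized mean (GeM) pooling that the citation statement above attributes to Radenović et al., the sketch below pools each channel of a feature map as the p-th root of the mean of its activations raised to the power p. It is a minimal sketch, not the authors' implementation: the function name `gem_pool`, the list-of-lists input layout, and the `eps` clamp are assumptions made for this example, and the exponent `p` is a fixed float here rather than a learned parameter.

```python
def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pool each channel of a feature map.

    feature_map: list of channels, each a flat list of non-negative
                 activations (hypothetical layout chosen for this sketch).
    p:           pooling exponent; learnable in the cited work, a plain
                 float here.
    eps:         lower clamp so zero activations do not break the power.
    """
    pooled = []
    for channel in feature_map:
        # Raise every (clamped) activation to the power p ...
        powered = [max(x, eps) ** p for x in channel]
        # ... then average and take the p-th root.
        pooled.append((sum(powered) / len(powered)) ** (1.0 / p))
    return pooled


channels = [[0.0, 1.0, 2.0, 3.0], [4.0, 4.0, 4.0, 4.0]]
avg_like = gem_pool(channels, p=1.0)    # p = 1 reduces to average pooling
max_like = gem_pool(channels, p=100.0)  # large p approaches max pooling
```

With p = 1 the result matches average pooling (the SPoC-style aggregation), and as p grows it approaches max pooling (MAC), which is the sense in which GeM generalizes the two.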

“…expectation maximization (EM) [33], curriculum learning [34], self-paced learning [35], etc.) are widely used in the weakly-supervised tasks [9, 36–41]. For example, [36] adopts the expectation maximization (EM) algorithm to dynamically predict semantic foreground and background pixels by using an alternative training procedure.…”

Section: Iterative Learning Methods (mentioning)

confidence: 99%

Weakly-supervised object detection via mining pseudo ground truth bounding-boxes

Zhang

Bai

Ding

et al. 2018

Pattern Recognition

Recently, weakly-supervised object detection has attracted much attention, since it does not require expensive bounding-box annotations while training the network. Although significant progress has been made, there is still a large performance gap between weakly-supervised and fully-supervised object detection. To mitigate this gap, some works try to use the pseudo ground truths generated by a weakly-supervised detector to train a supervised detector. However, such approaches tend to find the most representative parts instead of the whole body of an object, and only seek one ground truth bounding-box per class even though many same-class instances exist in an image. To address these issues, we propose a weakly-supervised to fully-supervised framework (W2F), where a weakly-supervised detector is implemented using multiple instance learning. We then propose a pseudo ground-truth excavation (PGE) algorithm to find the accurate pseudo ground truth bounding-box for each instance. Moreover, the pseudo ground-truth adaptation (PGA) algorithm is designed to further refine the pseudo ground truths mined by the PGE algorithm. Finally, the mined pseudo ground truths are used as supervision to train a fully-supervised detector. Additionally, we propose an iterative ground-truth learning (IGL) approach, which enhances the quality of the pseudo ground truths by iteratively using the predictions of the fully-supervised detector. Extensive experiments on the challenging PASCAL VOC 2007 and 2012 benchmarks strongly demonstrate the effectiveness of our method. We obtain 53.1% and 49.4% mAP on VOC2007 and VOC2012 respectively, which is a significant improvement over previous state-of-the-art methods.

“…In Table 7, we present the results in ROxford and RParis datasets of state-of-the-art methods which uses VGG as feature extractor. In the pre-trained single pass category we improve the state-of-the-art performance with ChCO-SC T + SpT D based in the linear aggregation of co-occurrences against well known image retrieval methods like crow [7], SPoC [5], MAC and R-MAC [6] and GeM [31]. Moreover, with bilinear pooling we can obtain a final vector representation with higher dimensions than the number of channels of the last VGG layer.…”

Section: Comparison With State-of-the-art Results (mentioning)

confidence: 99%

“…In this section is evaluated the co-occurrence representation after the co-occurrence filter training process in a CoOcNET pipeline. The evaluation is performed similar to [31], with its same whitening procedure, alpha query expansion method αQE, and testing also each query in multiscale, ms, (1, 1…”

Section: Discussion (mentioning)

confidence: 99%
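
The alpha query expansion (αQE) named in the statement above is not defined on this page. The sketch below shows one common reading of it, stated here as an assumption: the query descriptor is replaced by a weighted sum of itself (weight 1) and its top-ranked database descriptors, each weighted by its similarity to the query raised to the power α, and the result is L2-normalized. All names (`alpha_query_expansion`, `n_top`, `alpha`) and the toy descriptors are hypothetical and chosen only for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale v to unit L2 norm (returns a copy if the norm is zero)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def alpha_query_expansion(query, database, n_top=2, alpha=3.0):
    """Return an expanded query descriptor (assumed form of alpha-QE).

    query:    L2-normalized query descriptor.
    database: list of L2-normalized database descriptors.
    n_top:    number of top-ranked neighbours to aggregate (assumption).
    alpha:    exponent applied to each neighbour's similarity (assumption).
    """
    sims = [dot(query, d) for d in database]
    # Indices of the n_top most similar database descriptors.
    ranked = sorted(range(len(database)), key=lambda i: sims[i], reverse=True)
    expanded = list(query)  # the query itself contributes with weight 1
    for i in ranked[:n_top]:
        # Negative similarities are clamped to zero before the power.
        weight = max(sims[i], 0.0) ** alpha
        expanded = [e + weight * x for e, x in zip(expanded, database[i])]
    return l2_normalize(expanded)


query = [1.0, 0.0]
database = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
expanded_query = alpha_query_expansion(query, database, n_top=2, alpha=3.0)
```

Raising the similarity to a power greater than 1 shrinks the contribution of weaker matches, so the expanded query stays close to the original while still drawing on its nearest neighbours.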