Cross-modal domain adaptation for text-based regularization of image semantics in image retrieval systems

Pereira, J. C. F.; Vasconcelos, Nuno

doi:10.1016/j.cviu.2014.03.003

Cited by 26 publications

(7 citation statements)

References 76 publications

“…It should be noted that, the above work is restricted to the case of the combination of two different media types. Pereira's work [25] follows crossmedia idea, and improves the content-based image retrieval by using the contextual information of images. In this work, the training images and texts are first mapped to a common semantic space and then a regularization operator is learned for each concept in the semantic vocabulary.…”

Section: Related Work (mentioning)

confidence: 98%

Semi-Supervised Cross-Media Feature Learning With Unified Patch Graph Regularization

Peng

Zhai

Zhao

et al. 2016

IEEE Trans. Circuits Syst. Video Technol.

127

With the rapid growth of multimedia data such as text, images, video, audio, and 3D models, cross-media retrieval has become increasingly important, because users can retrieve results of various media types by submitting a query of any media type. Compared with single-media retrieval such as image retrieval and text retrieval, cross-media retrieval is better because it provides retrieval results across all kinds of media at the same time. In this paper, we focus on how to learn cross-media features for different media types, which is a key challenge for cross-media retrieval. Existing methods either model different media types separately or only exploit the labeled multimedia data. In fact, data from different media types with the same semantic category are complementary to each other, and jointly modeling them can improve the accuracy of cross-media retrieval. In addition, although labeled data are accurate, they require a lot of human labor and thus are very scarce. To address the above problems, we propose a semi-supervised cross-media feature learning algorithm with unified patch graph regularization (S²UPG). Our motivation and contributions mainly lie in the following three aspects: (1) Existing methods only model different media types in different graphs, while we employ one joint graph to simultaneously model all the media types. The joint graph is able to fully exploit the semantic correlations among various media types, which are complementary and provide rich hints for cross-media correlation. (2) Existing methods only consider the original media instances (such as images, videos, texts, audio, and 3D models) but ignore their patches, while we make full use of both the media instances and their patches in one graph. Cross-media patches can emphasize the important parts and make cross-media correlations more precise. (3) Traditional semi-supervised learning methods only exploit single-media unlabeled instances, while our approach fully exploits cross-media unlabeled instances and their patches, which can increase the diversity of training data and boost the accuracy of cross-media retrieval. Compared with the current state-of-the-art methods on 3 datasets, including the challenging XMedia dataset with 5 media types, the comprehensive experimental results show that our proposed approach performs better.

“…We can consider swap retrieval as a special case of domain adaptation, where the domain shift is actually a change in the object category that we are looking for. Pereira and Vasconcelos [22] for instance apply a cross-modal domain adaptation to the task of image retrieval by considering image and text as source and target domains. In our task, we consider the features corresponding to the object category in the query image and the swapped category as source and target domains.…”

Section: Domain Adaptation (mentioning)

confidence: 99%

Swap Retrieval

Ghodrati

Jia

Pedersoli

et al. 2015

Proceedings of the 5th ACM on International Conference on Multimedia Retrieval

Query-by-example remains popular in image retrieval because it can exploit contextual information encoded in the image that is difficult to express in a traditional textual query. Textual queries, on the other hand, give more flexibility in that it's easy to reformulate and refine a text query based on initial results. In this work we make a first step towards getting the best of both worlds: we use an image to specify the context, but let the user specify a related category as the main search criterion. For instance, starting from an image of a dog in a certain situation/context, the goal is to find images of cats in a similar situation/context. We present an evaluation scheme for this new and challenging task, which we call swap retrieval, and use it to compare various methods. Results show that standard query-by-example techniques do not adapt well to the new task. Instead, techniques based on semantic knowledge extracted from textual descriptions available at training time perform reasonably well, although they are still far from the performance needed for practical use.

“…a phone connection [20,24]. In computer vision, adaptation has been proposed to bridge gap between camera views [8], data modalities [6,16,17], image conditions [22], or even object classes [10]. Recently, model adaptation has been used in the deep learning literature, to adapt a model learned from the Imagenet corpus [9] to other tasks [7].…”

Section: Related Work (mentioning)

confidence: 99%

Bayesian Model Adaptation for Crowd Counts

Liu

Vasconcelos

2015

2015 IEEE International Conference on Computer Vision (ICCV)

Self Cite

The problem of transfer learning is considered in the domain of crowd counting. A solution based on Bayesian model adaptation of Gaussian processes is proposed. This is shown to produce intuitive model updates, which are tractable, and lead to an adapted model (predictive distribution) that accounts for all information in both training and adaptation data. The new adaptation procedure achieves significant gains over previous approaches, based on multi-task learning, while requiring much less computation to deploy. This makes it particularly suited for the problem of expanding the capacity of crowd counting camera networks. A large video dataset for the evaluation of adaptation approaches to crowd counting is also introduced. This contains a number of adaptation tasks, involving information transfer across video collected by 1) a single camera under different scene conditions (different times of the day) and 2) different cameras. Evaluation of the proposed model adaptation procedure on this dataset shows good performance in realistic operating conditions.
