Multi-modal Mutual Topic Reinforce Modeling for Cross-media Retrieval

Wang, Yanfei; Wu, Fei; Song, Jun; Li, Xi; Zhuang, Yueting

doi:10.1145/2647868.2654901

Cited by 45 publications (33 citation statements)

References 21 publications

“…For example, Roller and Schulte im Walde (2013) integrated visual features into latent Dirichlet allocation (LDA) and proposed a multimodal LDA model to learn representations for textual and visual data. Wang Y et al (2014) proposed a scheme called the multimodal mutual topic reinforce model (M3R), which seeks to discover mutually consistent semantic topics via appropriate interactions between model factors. These schemes represent data as topic distributions, and similarities are measured by the likelihood of observed data in terms of latent topics.…”

Section: Theory and Model For Cross-media Uniform Representation (mentioning)

confidence: 99%

Cross-media analysis and reasoning: advances and directions

Peng, Zhu, Zhao, et al. 2017

Frontiers Inf Technol Electronic Eng

Abstract:Cross-media analysis and reasoning is an active research area in computer science, and a promising direction for artificial intelligence. However, to the best of our knowledge, no existing work has summarized the state-of-the-art methods for cross-media analysis and reasoning or presented advances, challenges, and future directions for the field. To address these issues, we provide an overview as follows: (1) theory and model for cross-media uniform representation; (2) cross-media correlation understanding and deep mining; (3) cross-media knowledge graph construction and learning methodologies; (4) cross-media knowledge evolution and reasoning; (5) cross-media description and generation; (6) cross-media intelligent engines; and (7) cross-media intelligent applications. By presenting approaches, advances, and future directions in cross-media analysis and reasoning, our goal is not only to draw more attention to the state-of-the-art advances in the field, but also to provide technical insights by discussing the challenges and research directions in these areas.

“…An example is the well-known Latent Dirichlet Allocation (LDA). In [32], a supervised multimodal mutual topic reinforce modeling approach for cross-media retrieval, called M3R, is proposed. Some other methodologies are Partial Least Squares (PLS) and correlation matching.…”

Section: Related Work (mentioning)

confidence: 99%

Multimedia retrieval based on non-linear graph-based fusion and partial least squares regression

Gialampoukidis, Moumtzidou, Liparas, et al. 2017

Multimed Tools Appl

Abstract:Heterogeneous sources of information, such as images, videos, text and metadata are often used to describe different or complementary views of the same multimedia object, especially in the online news domain and in large annotated image collections. The retrieval of multimedia objects, given a multimodal query, requires the combination of several sources of information in an efficient and scalable way. Towards this direction, we provide a novel unsupervised framework for multimodal fusion of visual and textual similarities, which are based on visual features, visual concepts and textual metadata, integrating non-linear graph-based fusion and Partial Least Squares Regression. The fusion strategy is based on the construction of a multimodal contextual similarity matrix and the non-linear combination of relevance scores from query-based similarity vectors. Our framework can employ more than two modalities and high-level information, without increase in memory complexity, when compared to state-of-the-art baseline methods. The experimental comparison is done in three public multimedia collections in the multimedia retrieval task. The results have shown that the proposed method outperforms the baseline methods, in terms of Mean Average Precision and Precision@20.

“…However, CCA-based methods lack probabilistic interpretation on the intra-modal similarities. Topic models [3,12,25] learn latent topics to describe the intrinsic semantic correlations in multi-modal data. Based on Latent Dirichlet Allocation (LDA) [3], a variety of constraints are imposed.…”

Section: Introduction (mentioning)

confidence: 99%

“…For example, CCA-based models [1,11,19,20] assume that the inter-modal relation is expressed by co-occurrence of multi-modal data objects. The inter-modal relation is also encoded as binary observation matrix to be fit by the correlation models [3,12,25,27]. By contrast, we directly impose two kinds of inter-modal relations (i.e., semantic similarity and dissimilarity) as smooth priors on the output of multi-modal GPLVM.…”

Section: Introduction (mentioning)

confidence: 99%

Similarity Gaussian Process Latent Variable Model for Multi-modal Data Analysis

Song, Wang, Huang 2015

2015 IEEE International Conference on Computer Vision (ICCV)

Abstract:Data from real applications involve multiple modalities representing content with the same semantics and deliver rich information from complementary aspects. However, relations among heterogeneous modalities are simply treated as observation-to-fit by existing work, and the parameterized cross-modal mapping functions lack flexibility in directly adapting to the content divergence and semantic complicacy of multi-modal data. In this paper, we build our work based on Gaussian process latent variable model (GPLVM) to learn the non-linear non-parametric mapping functions and transform heterogeneous data into a shared latent space. We propose multi-modal Similarity Gaussian Process latent variable model (m-SimGP), which learns the nonlinear mapping functions between the intra-modal similarities and latent representation. We further propose multimodal regularized similarity GPLVM (m-RSimGP) by encouraging similar/dissimilar points to be similar/dissimilar in the output space. The overall objective functions are solved by simple and scalable gradient descent techniques. The proposed models are robust to content divergence and high-dimensionality in multi-modal representation. They can be applied to various tasks to discover the non-linear correlations and obtain the comparable low-dimensional representation for heterogeneous modalities. On two widely used real-world datasets, we outperform previous approaches for cross-modal content retrieval and cross-modal classification.
