“…For the MS-COCO dataset, some of the textual descriptions are too ambiguous, which may make the model insensitive to the corresponding features and lead to poorer results. Even so, the results are comparable to those of other models such as CPGN [15], CAMP [10], and PVSE [12].…”
“…Although this approach can extract high-level semantic information, it places no emphasis on the information-mixing process and does not work well for local matching. Subsequently, [5]–[23] focus on extracting local features from images and text and combine them with attention mechanisms to achieve local alignment.…”
Section: Introduction (mentioning, confidence: 99%)
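The attention-based local alignment mentioned in the snippet above typically matches image regions against caption words. Below is a minimal sketch of one common variant (text-to-image cross attention in the style of SCAN-like methods); the tensor shapes, the temperature value, and the function name are illustrative assumptions, not the cited papers' exact formulation:

```python
import torch
import torch.nn.functional as F

def local_alignment_similarity(regions, words, temperature=9.0):
    """Cross-attention similarity between image regions and caption words.

    regions: (n_regions, d) L2-normalized region features
    words:   (n_words, d)   L2-normalized word features
    Returns a scalar image-text similarity (illustrative sketch).
    """
    # Word-region affinity matrix: (n_words, n_regions).
    attn = torch.matmul(words, regions.t())
    # Each word attends to the regions it matches best.
    attn = F.softmax(temperature * attn, dim=1)
    # Attended visual context per word: (n_words, d).
    context = torch.matmul(attn, regions)
    # Cosine similarity between each word and its visual context,
    # averaged over words to give one image-text score.
    word_scores = F.cosine_similarity(words, context, dim=1)
    return word_scores.mean()
```

The image-to-text direction works symmetrically, with regions attending over words instead.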
“…However, region features extracted from the image are relatively independent, with no contextual semantic information to anchor them, so their semantics inevitably drift during the subsequent extraction of region features. Therefore, [11,15] and other approaches propose fusing the local and global information of the data. [19] uses global features in another form and divides the similarity calculation into three levels (local, global, and relationship) to measure data similarity at multiple levels.…”
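The three-level scheme attributed to [19] can be pictured as computing one score per level and fusing them. The sketch below is a guess at how such a fusion might look; the weighting scheme, feature layout, and per-level scoring are hypothetical, not taken from [19]:

```python
import torch
import torch.nn.functional as F

def multi_level_similarity(img, txt, weights=(0.4, 0.4, 0.2)):
    """Fuse local-, global-, and relationship-level similarity scores.

    img / txt are dicts of precomputed features (shapes are assumptions):
      'local':  (n, d) region / word features
      'global': (d,)   pooled whole-image / whole-sentence feature
      'rel':    (m, d) relation features, assumed aligned one-to-one
    """
    # Local level: each word's best-matching region, averaged over words.
    local = torch.matmul(
        F.normalize(txt['local'], dim=-1),
        F.normalize(img['local'], dim=-1).t(),
    ).max(dim=1).values.mean()

    # Global level: cosine similarity of the pooled representations.
    glob = F.cosine_similarity(img['global'], txt['global'], dim=0)

    # Relationship level: mean similarity over aligned relation pairs.
    rel = F.cosine_similarity(img['rel'], txt['rel'], dim=-1).mean()

    w_l, w_g, w_r = weights
    return w_l * local + w_g * glob + w_r * rel
```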
Image-text retrieval has made great progress, but it remains challenging because of the heterogeneity between images and text. Enhancing the interaction between the two modalities by exploring their relationship can mitigate this problem to some extent, so how to explore and use that relationship is a critical question. In this paper, we design an asymmetric-structure network (RGN) to represent images and text. First, we mine the relationship between image and text and extract the relevant textual information. Then we exploit this relationship to guide the generation of text embeddings, yielding rich and representative embeddings. Results on two datasets, Flickr30K and MS-COCO, show that our model achieves competitive results.
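The abstract does not spell out RGN's architecture, but one plausible reading of "exploit this relationship to guide the generation of text embeddings" is an image-conditioned gating over word features before pooling. The sketch below is our own guess at such a mechanism; the class, its dimensions, and the gating form are all hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationGuidedTextEncoder(nn.Module):
    """Hypothetical sketch: use image-word affinities to re-weight
    word features before pooling them into a sentence embedding."""

    def __init__(self, dim=1024):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, words, image_global):
        # words: (n_words, dim), image_global: (dim,)
        # Relation mining: affinity of each word to the image.
        affinity = torch.mv(self.proj(words), image_global)  # (n_words,)
        gate = torch.sigmoid(affinity).unsqueeze(1)          # (n_words, 1)
        # Relation-guided embedding: image-relevant words dominate.
        sentence = (gate * words).sum(dim=0) / gate.sum()
        return F.normalize(sentence, dim=0)
```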
“…The core idea of deep-semantics-based cross-media search methods is to learn complex high-level features with deep learning in order to improve feature learning. Zhang et al. [22] embed images and text into a latent common space using a Residual Network (ResNet) to learn global features for sentence-generation learning. Peng and Qi [23] used bidirectional translation training to directly convert bidirectional pairs between visual and textual descriptions, capturing cross-media correlations.…”
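The common-space approach mentioned for Zhang et al. [22] amounts to projecting both modalities into one embedding space where similarity can be compared directly. A minimal, generic sketch follows; the feature dimensions and module names are assumptions, not [22]'s actual design:

```python
import torch.nn as nn
import torch.nn.functional as F

class CommonSpaceProjector(nn.Module):
    """Map image features (e.g., from a ResNet) and text features
    into one shared latent space."""

    def __init__(self, img_dim=2048, txt_dim=768, latent_dim=512):
        super().__init__()
        self.img_fc = nn.Linear(img_dim, latent_dim)
        self.txt_fc = nn.Linear(txt_dim, latent_dim)

    def forward(self, img_feat, txt_feat):
        # L2-normalize so cosine similarity reduces to a dot product.
        z_img = F.normalize(self.img_fc(img_feat), dim=-1)
        z_txt = F.normalize(self.txt_fc(txt_feat), dim=-1)
        return z_img, z_txt
```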
The rapid development of social networks has brought great convenience to people's lives, and a large amount of cross-media big data, such as text, images, and video, has accumulated. Cross-media search enables quick querying of this information so that users can obtain helpful content from social networks. However, cross-media data in social networks suffer from semantic gaps and sparsity, which makes cross-media search challenging. To alleviate the semantic gaps and sparsity, we propose a cross-media search method based on complementary attention and generative adversarial networks (CAGS). To obtain high-quality feature representations, we build a complementary attention mechanism containing the focused and unfocused features of images to realize the consistent association of cross-media data in social networks. By designing the cross-media…
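The "focused and unfocused" image features in the CAGS abstract suggest a pair of attention maps over regions, one emphasizing what attention selects and one its complement. The sketch below is one way such a mechanism could be realized; it is our assumption, not the paper's published design:

```python
import torch.nn as nn
import torch.nn.functional as F

class ComplementaryAttention(nn.Module):
    """Hypothetical sketch: summarize region features into a 'focused'
    vector and an 'unfocused' complement of what attention ignored."""

    def __init__(self, dim=2048):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, regions):
        # regions: (n_regions, dim)
        s = self.score(regions)                  # (n_regions, 1)
        alpha = F.softmax(s, dim=0)              # attended regions
        focused = (alpha * regions).sum(dim=0)   # (dim,)
        # Complementary weights emphasize low-scoring regions.
        beta = F.softmax(-s, dim=0)
        unfocused = (beta * regions).sum(dim=0)
        return focused, unfocused
```

Combining both summaries lets downstream matching see image content that a single attention pass would suppress.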