2020
DOI: 10.1007/s11042-020-09983-3

CAESAR: concept augmentation based semantic representation for cross-modal retrieval

Cited by 8 publications (4 citation statements)
References 58 publications
“…For example, [34] is the first to integrate the traditional method with a deep CNN model to improve visual feature learning. Our previous work [49] extended the CCA model by using two branches of CNN models to boost the cross-modal semantic representation. [20] investigates a cross-modal correlation learning (CCL) method that uses a deep model to mine intra- and inter-modality correlation simultaneously.…”
Section: Related Work
confidence: 99%
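The statement above describes the general "two-branch" idea: each modality gets its own network that projects into a common embedding space where matched image/text pairs score higher than mismatched ones. Below is a minimal, hypothetical sketch of that idea in PyTorch; the class name, layer sizes, and hinge-style matching loss are illustrative assumptions, not the actual method of [49] or of CAESAR.

```python
# Illustrative two-branch embedding model (hypothetical, PyTorch-style):
# each modality is projected into a shared d-dimensional space, and a
# simple hinge-style loss pulls matched image/text pairs together.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchEmbedder(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, shared_dim=256):
        super().__init__()
        # Image branch: assumes features already extracted by a CNN backbone.
        self.img_proj = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU(),
                                      nn.Linear(512, shared_dim))
        # Text branch: assumes sentence/tag features (e.g., averaged word vectors).
        self.txt_proj = nn.Sequential(nn.Linear(txt_dim, 512), nn.ReLU(),
                                      nn.Linear(512, shared_dim))

    def forward(self, img_feat, txt_feat):
        # L2-normalise so that dot products are cosine similarities.
        v = F.normalize(self.img_proj(img_feat), dim=-1)
        t = F.normalize(self.txt_proj(txt_feat), dim=-1)
        return v, t

def pairwise_matching_loss(v, t, margin=0.2):
    # Matched pairs (the diagonal of the similarity matrix) should score
    # higher than mismatched pairs by at least `margin`.
    sim = v @ t.T                              # (B, B) cosine similarities
    pos = sim.diag().unsqueeze(1)              # similarity of matched pairs
    cost = (margin + sim - pos).clamp(min=0)   # margin violations against negatives
    cost.fill_diagonal_(0)
    return cost.mean()

if __name__ == "__main__":
    model = TwoBranchEmbedder()
    img = torch.randn(8, 2048)   # toy batch of image features
    txt = torch.randn(8, 300)    # toy batch of text features
    v, t = model(img, txt)
    print(pairwise_matching_loss(v, t).item())
```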
“…Compared with these traditional techniques, deep neural networks, such as CNNs [10] and RNNs, have stronger semantic representation capability and larger parameter capacity, and are able to capture more high-level semantics from multi-modal content. Other advanced techniques, such as attention mechanisms and semantic augmentation [11], are used to realize cross-modal semantic alignment.…”
Section: Introduction
confidence: 99%
“…As stated in the existing literature [4,28,35,40], the major roadblocks of cross-modal retrieval are twofold, namely (1) the cross-modal heterogeneity that hinders similarity measurement between samples from different modalities, and (2) the semantic gap between multi-modal data and human understanding, which obstructs cross-modal semantic alignment. According to the type of representation, cross-modal retrieval techniques can be grouped into two categories, i.e., real value-based methods [38,48] and binary value-based (hashing) approaches [15,18,43]. Owing to their higher retrieval efficiency and lower storage cost, more attention has been paid to cross-modal hashing for big data applications.…”
Section: Introduction
confidence: 99%
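To make the distinction drawn in the last statement concrete, here is a toy, self-contained sketch on random data (not the methods cited in [15,18,38,43,48]): real value-based retrieval ranks items by cosine similarity over float embeddings, while a hashing approach binarises the same embeddings and ranks by Hamming distance, trading some accuracy for much cheaper storage and comparison.

```python
# Toy contrast between real value-based and binary (hashing) retrieval.
import numpy as np

rng = np.random.default_rng(0)
db_emb = rng.standard_normal((1000, 64))   # toy database embeddings (e.g., images)
q_emb = rng.standard_normal(64)            # toy query embedding (e.g., a sentence)

# Real value-based retrieval: cosine similarity over float vectors
# (64 * 32 bits of storage per item).
db_norm = db_emb / np.linalg.norm(db_emb, axis=1, keepdims=True)
q_norm = q_emb / np.linalg.norm(q_emb)
real_ranking = np.argsort(-db_norm @ q_norm)

# Hashing retrieval: thresholding at zero gives 64-bit binary codes,
# ranked by Hamming distance (64 bits of storage per item).
db_codes = db_emb > 0
q_code = q_emb > 0
hamming = np.count_nonzero(db_codes != q_code, axis=1)
hash_ranking = np.argsort(hamming)

print("top-5 (real-valued):", real_ranking[:5])
print("top-5 (hashing):   ", hash_ranking[:5])
```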