Supervised Hierarchical Deep Hashing for Cross-Modal Retrieval

Zhan, Yu-Wei; Luo, Xin; Wang, Yongxin; Xu, Xin-Shun

doi:10.1145/3394171.3413962

Cited by 38 publications

(12 citation statements)

References 34 publications

“…The Input module, Generalization module, Output module, and Response module can use any existing algorithm in the field of machine learning, such as SVM and random forest [25]. The working process of each of the four modules is introduced, respectively.…”

Section: Memory Network (mentioning)

confidence: 99%

A Hierarchical Network with User Memory Matrix for Long Sequence Recommendation

Dong, Sun, et al. 2022

Wireless Communications and Mobile Computing

In many recommendation scenarios, the interactions between users and items are divided into a series of sessions according to the time interval. The traditional Recurrent Neural Network has some shortcomings, such as limited memory ability, inflexible access to memory data, and an obvious deficiency in feature capture for long sequences. To deal with these issues, we propose a hierarchical network with user memory matrix, named HNUM2, which utilizes the memory network to store users' long-term and short-term interests. The memory network offers more flexible access to memory data, which can solve the problem of insufficient capture of long sequence features. The proposed model is a hierarchical recommendation algorithm, which consists of two layers. The first layer is the session-level GRU model, which obtains the sequence characteristics of the current session to predict the next item. The second layer is the user-level memory network model, which exploits the attention mechanism and incorporates the write module and read module. The experimental results on two publicly available datasets show that HNUM2 achieves a significant performance improvement compared to the state-of-the-art methods.

“…Therefore, some people use text or other modality queries to express the search intention and the task of cross-modal retrieval (especially using text to retrieve images) has emerged. Cross-modal retrieval focuses on mapping different modalities to the same semantic space, and uses supervision information to guide the alignment of images and texts [6] [17] [26] [33] [15] [34] [37]. However, the information conveyed by the text is very abstract and sparse, which makes cross-modal retrieval very difficult and the application scenarios are not extensive.…”

Section: Related Work, 2.1 Image Retrieval (mentioning)

confidence: 99%

Cross-modal Joint Prediction and Alignment for Composed Query Image Retrieval

Yang, Wang, Zhou, et al. 2021

Proceedings of the 29th ACM International Conference on Multimedia

In this paper, we focus on the composed query image retrieval task, namely retrieving the target images that are similar to a composed query, in which a modification text is combined with a query image to describe a user's accurate search intention. Previous methods usually focus on learning the joint image-text representations, but rarely consider the intrinsic relationship among the query image, the target image and the modification text. To address this problem, we propose a new cross-modal joint prediction and alignment framework for composed query image retrieval. In our framework, the modification text is regarded as an implicit transformation between the query image and the target image. Motivated by this, not only should the combination of the query image and modification text be similar to the target image, but the modification text should also be predictable from the query image and the target image. We align this relationship with a novel Joint Prediction Module (JPM). Our proposed framework can seamlessly incorporate the JPM into existing methods to effectively improve the discrimination and robustness of visual and textual representations. The experiments on three public datasets demonstrate the effectiveness of our proposed framework, showing that the proposed JPM can be simply incorporated into existing methods while effectively improving performance.

“…Deep Saliency Hashing (DSaH) [46] is a two-step end-to-end model, which mines salient regions and learns semantic-preserving hash codes simultaneously. Supervised Hierarchical Deep Cross-modal Hashing (SHDCH) [47] learns the hash codes by explicitly delving into the hierarchical labels. Deep Semantic cross-modal hashing with Correlation Alignment (DSCA) [48] designs two deep neural networks for image and text modality separately, and learns two hash functions.…”

Section: Deep Hashing (mentioning)

confidence: 99%

Discrete Semantics-Guided Asymmetric Hashing for Large-Scale Multimedia Retrieval

et al. 2021

Cross-modal hashing is a key technology for real-time retrieval of large-scale multimedia data in real-world applications. Although the existing cross-modal hashing methods have achieved impressive results, there are still some limitations: (1) some cross-modal hashing methods do not fully consider the rich semantic information and noise information in labels, resulting in a large semantic gap, and (2) some cross-modal hashing methods adopt the relaxation-based or discrete cyclic coordinate descent algorithm to solve the discrete constraint problem, resulting in a large quantization error or time consumption. To address these limitations, in this paper, we propose a novel method, named Discrete Semantics-Guided Asymmetric Hashing (DSAH). Specifically, our proposed DSAH leverages both label information and the similarity matrix to enhance the semantic information of the learned hash codes, and the ℓ2,1 norm is used to increase the sparsity of the matrix to address the inevitable noise and subjective factors in labels. Meanwhile, an asymmetric hash learning scheme is proposed to efficiently perform hash learning. In addition, a discrete optimization algorithm is proposed to solve for the hash codes quickly, directly, and discretely. During the optimization process, the hash code learning and the hash function learning interact, i.e., the learned hash codes can guide the learning process of the hash function and the hash function can also guide the hash code generation simultaneously. Extensive experiments performed on two benchmark datasets highlight the superiority of DSAH over several state-of-the-art methods.
