Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval

Yan, Hua; Yang, Yingyun; Du, Jianhe

doi:10.3390/electronics9030466

Cited by 3 publications (1 citation statement)

References: 30 publications

“…The parameters of the first convolutional layers are frozen, and the rest of the parameters should be fine-tuned on our self-built dataset. Finally, for the problem of small differences between subclasses and large differences within classes, a loss function based on metric learning [9] is introduced, which is suitable for multidimensional targets. It can target diverse dimensions and enrich the feature information of surface targets to make the neural network converge better and faster.…”

Section: Introduction; classified as mentioning (confidence: 99%)

Fine-Grained Recognition of Surface Targets with Limited Data

et al. 2020

Recognition of surface targets has a vital influence on the development of military and civilian applications such as maritime rescue patrols, illegal-vessel screening, and maritime operation monitoring. However, owing to visual similarity between targets, environmental variations, and the lack of high-quality datasets, accurate recognition of surface targets has always been a challenging task. In this paper, we introduce a multi-attention residual model based on deep learning, in which channel and spatial attention modules are applied for feature fusion. In addition, we use transfer learning to improve the model's feature-expression capability when data are limited. A loss function based on metric learning is adopted to increase the distance between different classes. Finally, a dataset with eight types of surface targets is established. Comparative experiments on our self-built dataset show that the proposed method focuses more on discriminative regions, avoids problems such as vanishing gradients, and achieves better classification results than B-CNN, RA-CNN, MAMC, MA-CNN, and DFL-CNN.
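
Both the citation statement and this abstract mention a loss function based on metric learning that increases the distance between different classes, but neither gives its form. The sketch below is illustration only. It uses a generic triplet margin loss, a common metric-learning objective, as an assumed stand-in, not necessarily the loss used in this paper or in the cited work. The function names and the `margin` default are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector,
                        margin: float = 1.0) -> float:
    """Generic triplet hinge loss (an assumed stand-in, not the papers' loss).
    It is zero once the negative is at least `margin` farther from the anchor
    than the positive is."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def batch_triplet_loss(embeddings: List[Vector], labels: List[int],
                       margin: float = 1.0) -> float:
    """Mean loss over every valid (anchor, positive, negative) triplet in a batch."""
    losses = []
    n = len(embeddings)
    for i in range(n):
        for j in range(n):
            if i == j or labels[i] != labels[j]:
                continue  # positive: a different sample of the anchor's class
            for k in range(n):
                if labels[k] == labels[i]:
                    continue  # negative: any sample of another class
                losses.append(triplet_margin_loss(
                    embeddings[i], embeddings[j], embeddings[k], margin))
    return sum(losses) / len(losses) if losses else 0.0
```

For example, `triplet_margin_loss([0, 0], [0, 1], [0, 1.5])` returns `0.5`. The negative is only 0.5 farther from the anchor than the positive, which falls short of the default margin of 1.0. Enumerating every triplet is cubic in the batch size, so practical pipelines usually mine hard or semi-hard triplets instead.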

Bi-directional Image–Text Matching Deep Learning-Based Approaches: Concepts, Methodologies, Benchmarks and Challenges

Ebaid; Madbouly; El-Zoghabi

2023, Int J Comput Intell Syst

Nowadays, image–text matching (retrieval) attracts frequent attention owing to the growth of multimodal data. Given a textual query describing a visual scene, the task returns the relevant images, and vice versa. The core challenge is how to precisely compute the similarity between a text and an image, which requires understanding both modalities and accurately extracting the related information from each. Although many deep learning (DL) approaches have been established for matching textual data and visual content, few reviews of DL-based image–text matching are available. In this review, we present and clarify modern DL-based techniques for the image–text matching problem by providing an extensive study of existing matching models, current architectures, benchmark datasets, and evaluation methods. First, we explain the matching task and illustrate frequently used architectures. Second, we classify present approaches according to two important concepts: the alignment between image and text, and the learning approach. Third, we report standard datasets and evaluation techniques. Finally, we outline current challenges to serve as an inspiration to new researchers in this field.
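
The review frames the core challenge as computing a similarity between an image and a text. Below is a minimal sketch of that idea. It assumes that images and texts have already been encoded into a shared embedding space. It scores every pair by cosine similarity and evaluates retrieval in both directions with Recall@K, a metric commonly reported on image–text benchmarks. The encoders are omitted, and the helper names are hypothetical. Pairing one caption per image by index is a simplification: Flickr30K and MS-COCO, for instance, provide five captions per image.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(images: Sequence[Sequence[float]],
                      texts: Sequence[Sequence[float]]) -> Matrix:
    """S[i][t] is the cosine similarity between image embedding i and text embedding t."""
    return [[cosine(img, txt) for txt in texts] for img in images]


def recall_at_k(sim: Matrix, k: int) -> float:
    """Each row is a query and each column a candidate. Assumes (hypothetically)
    one ground-truth match per query at the same index (row i <-> column i)."""
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)


def bidirectional_recall(images, texts, k: int = 1) -> Tuple[float, float]:
    """Returns (image-to-text R@k, text-to-image R@k)."""
    sim = similarity_matrix(images, texts)
    image_to_text = recall_at_k(sim, k)
    # Transposing makes each text a row query ranked against all images.
    text_to_image = recall_at_k([list(col) for col in zip(*sim)], k)
    return image_to_text, text_to_image
```

For example, `bidirectional_recall([[1, 0], [0, 1]], [[0.9, 0.1], [0.2, 0.8]], k=1)` returns `(1.0, 1.0)`, because each image ranks its own caption first and each caption ranks its own image first.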
