2022
DOI: 10.3390/s22186816

A Review of Multi-Modal Learning from the Text-Guided Visual Processing Viewpoint

Abstract: For decades, co-relating different data domains to attain the maximum potential of machines has driven research, especially in neural networks. Similarly, text and visual data (images and videos) are two distinct data domains with extensive research in the past. Recently, using natural language to process 2D or 3D images and videos with the immense power of neural nets has witnessed a promising future. Despite the diverse range of remarkable work in this field, notably in the past few years, rapid improvements…

Cited by 4 publications (4 citation statements)
References 451 publications (462 reference statements)
“…Evaluation methods and metrics are needed to determine the validity of auto-generated captions [63,67]. Popular evaluation metrics are shown in Table 3, but more extensive reviews currently exist in the literature [63,87].…”
Section: Text Evaluation Methods (mentioning)
confidence: 99%
“…Evaluation methods and metrics are needed to determine the validity of auto-generated captions [63,67]. Popular evaluation metrics are shown in Table 3, but more extensive reviews currently exist in the literature [63,87]. The MS COCO Dataset Challenge uses BLEU, ROUGE, METEOR, CIDEr, and SPICE to evaluate performance, so these have become the status quo for evaluating the similarity between texts [74].…”
Section: Text Evaluation Methods (mentioning)
confidence: 99%
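
To make the idea of scoring similarity between texts concrete, the following is a minimal sketch of clipped n-gram precision, the building block that BLEU is based on. It is not the full BLEU metric (it has no brevity penalty and does not combine several n-gram orders), and it is not the scoring code used by the MS COCO challenge. The function names `ngrams` and `clipped_precision`, the whitespace tokenization, and the example sentences are assumptions made here for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n=1):
    """Clipped (modified) n-gram precision of a candidate caption.

    Each candidate n-gram is credited at most as many times as it
    appears in the single reference where it is most frequent.
    Tokenization is plain whitespace splitting (an assumption).
    """
    cand_counts = Counter(ngrams(candidate.split(), n))
    if not cand_counts:
        return 0.0  # candidate is shorter than n tokens

    # For every n-gram, keep its maximum count over all references.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)

    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical captions, for illustration only.
refs = ["a cat is on the mat"]
unigram_score = clipped_precision("a cat sits on the mat", refs, n=1)
bigram_score = clipped_precision("a cat sits on the mat", refs, n=2)
```

Clipping is what keeps a degenerate caption that repeats one common word from scoring well: the repeated word is only credited up to the number of times a reference actually contains it.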