2022
DOI: 10.3390/electronics11192999

MFVC: Urban Traffic Scene Video Caption Based on Multimodal Fusion

Abstract: With the development of electronic technology, intelligent cars can gradually realize more complex artificial intelligence algorithms. The video caption algorithm is one of them. However, current video caption algorithms only consider single-visual information when applied to urban traffic scenes, which leads to the failure to generate accurate captions of complex sets. The multimodal fusion algorithm based on Transformer is one of the solutions to this problem. However, the existing algorithms have the diffic…

Citations: cited by 2 publications (1 citation statement)
References: 25 publications (27 reference statements)
“…Therefore, we propose converting traffic scene keyframes into natural language captions and using richer semantic information can replace detecting individual entities. This approach shows promise for assisting visually impaired individuals [12,23], driving safety [1], and describing traffic accidents [18].…”
Section: Introduction
Citation type: mentioning
confidence: 99%